Great write-up. I enjoyed reading the explanations for each piece and found them to be clear and quite thorough.
I did make the mistake though of clicking "+ expand source", and after seeing the (remarkable) abomination I can sympathize with ChatGPT's "SQL is not suitable for implementing large language model..." :)
Please correct me if I am mistaken; also R2 is not a CDN, and more like s3 in terms of delivering from the edge
No issue with your comment specifically, just wondering if you know. currently using s3+cloudfront for mp4 storage+delivery, and would like to move to something better if possible.
Unless you are an Enterprise customer, Cloudflare offers specific Paid Services (e.g., the Developer Platform, Images, and Stream) that you must use in order to serve video and other large files via the CDN
The Cloudflare Developer Platform consists of the following Services: (i) Cloudflare Workers, a Service that permits developers to deploy and run encapsulated versions of their proprietary software source code (each a “Workers Script”) on Cloudflare’s edge servers; (ii) Cloudflare Pages, a JAMstack platform for frontend developers to collaborate and deploy websites; (iii) Cloudflare Queues, a managed message queuing service; (iv) Workers KV, D1, Durable Objects, Vectorize, Hyperdrive, and R2, storage offerings used to serve HTML and non-HTML content; and (v) Workers AI, a Service that allows customers to use Cloudflare’s inference infrastructure to invoke select third-party machine learning models (subject to applicable open source licenses or third-party terms of use for such models).
Based on their terms, serving video in R2 should be fine.
> Please correct me if I am mistaken; also R2 is not a CDN, and more like s3 in terms of delivering from the edge
Yes, it uses Cloudflare's edge network and caching infrastructure when you use a custom domain (really the only option because usage of the default domain is very limited). So yes, it's a CDN :-)
> currently using s3+cloudfront for mp4 storage+delivery, and would like to move to something better if possible.
What I have found is that (A) S3's egress costs are extreme compared to R2's (remember, R2 does charge per read/write though) but (B) storage is cheaper on S3. You should do a study on your storage and egress, but unless you are storing far more data than you serve each month, R2 is probably a good deal.
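If it helps with that study, here is a rough sketch of the arithmetic. The prices below are made-up placeholders chosen only to match the shape of the claim above (one provider cheaper on storage, the other cheaper on egress); they are not quoted from either provider's price list, and the names `Pricing`, `monthly_cost`, and `breakeven_storage_ratio` are just mine. Plug in the current numbers from the pricing pages before drawing any conclusion.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pricing:
    storage_per_gb_month: float  # USD per GB stored per month
    egress_per_gb: float         # USD per GB served to the internet
    per_million_reads: float     # USD per million read requests


# ASSUMPTION: placeholder numbers for illustration only, not real prices.
S3_LIKE = Pricing(storage_per_gb_month=0.010, egress_per_gb=0.08, per_million_reads=0.40)
R2_LIKE = Pricing(storage_per_gb_month=0.015, egress_per_gb=0.00, per_million_reads=0.36)


def monthly_cost(p: Pricing, stored_gb: float, egress_gb: float,
                 million_reads: float = 0.0) -> float:
    """Total monthly bill for one provider under the simple three-part model."""
    return (p.storage_per_gb_month * stored_gb
            + p.egress_per_gb * egress_gb
            + p.per_million_reads * million_reads)


def breakeven_storage_ratio(cheap_storage: Pricing,
                            cheap_egress: Pricing) -> Optional[float]:
    """GB stored per GB served at which both providers cost the same.

    Request fees are ignored. Returns None when one provider is cheaper
    (or equal) on both axes, because then there is no break-even point.
    """
    storage_gap = cheap_egress.storage_per_gb_month - cheap_storage.storage_per_gb_month
    egress_gap = cheap_storage.egress_per_gb - cheap_egress.egress_per_gb
    if storage_gap <= 0 or egress_gap <= 0:
        return None
    return egress_gap / storage_gap
```

With these placeholder numbers, `breakeven_storage_ratio(S3_LIKE, R2_LIKE)` comes out at roughly 16, meaning you would need to store about 16 GB for every GB you serve per month before the cheaper-storage option wins. Real prices will move that number, and request charges can matter for lots of small reads.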
Cloudflare also has a mechanism for transparently pulling content from an upstream S3-compatible store into R2. It's called Super Slurper, iirc.
I think given the cost advantage for s3 for storage, it seems almost better to pull from R2 into s3 for long-term storage (some inverse slurper).
It's good to hear though that R2 can singlehandedly match s3+cloudfront; that being said, video delivery is a bit different I'd imagine, even s3+cloudfront is finicky with range requests etc.
I know they have team(s) of very smart people dedicated to solving this issue (at least at the individual level).
So assuming they care, I can think of two main reasons as to why it is not solved yet, both related to scale as Marques mentioned:
1.) Scale of the problem - It might be that they are already catching 99% of the stuff and we just see what falls through the cracks
2.) Scale of the solving - It could be that the teams and infrastructure are so large that they can't make the rapid adjustments needed to compete in such an arms race
On a separate note, I imagine a higher quality comment section would increase engagement more than any "appealing" scam.
> I know they have team(s) of very smart people dedicated to solving this issue (at least at the individual level).
Do you actually know, or are you being generous and still trying to assume good faith from a company that disproved it several times?
I don't see a business reason for them to take action. The spam comments don't open them to any legal liability (they already get away with much worse), YouTube has a monopoly so no amount of spam will drive users away, the spam contributes to engagement numbers, and the advertisers don't seem to mind.
I happen to know someone in this case, and am not assuming good faith from the company by any means. I trust and respect the individual.
I'm also generally interested in the comment moderation problem, and have been working on it myself for some time. I guess my judgement is clouded by my hope that there is a reasonable excuse for the team(s) at Google to not have solved it by now.
Perhaps it is naive of me to think this way; if it really is as simple as "this does not affect advertising revenue" then that would be quite nearsighted of Google. And, as I mentioned earlier, I am of the opinion that quality comment sections would increase engagement (and revenue as a result), so it doesn't make sense to me.
The spam comments usually contain hooks and symbols so that the other bots can latch onto them much more easily. Querying for those signs in order to spot possible spam comment threads, with high probability, is trivial, especially considering the already existing libraries on the topic, for instance within Bayesian probability and statistics.
Sure, the most hardcore spammers would most likely change tack if thus attacked, but many would also quit entirely as it becomes unprofitable to spam. If they also were to train one of their AIs or neural networks, they could catch even more spam by simply looking for post and sentence patterns. For instance it's very common that a spam thread contains multiple references to a name; the name of the brand or investor, or whoever they are shilling. They're always giving some sort of advice in conjunction with that name. And at some point the posts most certainly contain weird symbols to reference the WhatsApp number or Telegram channel. So no, I don't buy that this is hard to do. I think most of it is trivial.
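To make the Bayesian idea concrete, here is a toy sketch of a naive Bayes scorer with add-one smoothing. Everything in it is an assumption for illustration: the training comments are invented, "Mr Example" is a made-up name, and the `<SYMBOL>` token is just my stand-in for the "weird symbols" signal. It says nothing about how YouTube actually filters comments, and a real system would need far more data and would have to deal with false positives.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens, plus a marker if the text has non-ASCII symbols."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if any(ord(ch) > 127 for ch in text):
        tokens.append("<SYMBOL>")  # hypothetical feature for odd symbols
    return tokens


class NaiveBayes:
    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def spam_probability(self, text):
        """Posterior P(spam | text); needs at least one example per label."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text)
        log_scores = {}
        for label in self.LABELS:
            counts = self.word_counts[label]
            total = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)  # class prior
            for tok in tokens:
                # Add-one smoothing; the extra +1 leaves room for unseen tokens.
                score += math.log((counts[tok] + 1) / (total + len(vocab) + 1))
            log_scores[label] = score
        # Normalise in log space to avoid underflow on long comments.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / sum(weights.values())


# Invented toy data, for illustration only.
model = NaiveBayes()
model.train("Thanks to Mr Example I earned so much, message him on WhatsApp ➕1 5 5 5", "spam")
model.train("Mr Example changed my life, reach him on Telegram ✆", "spam")
model.train("Great video, the explanation at the end was really clear", "ham")
model.train("I disagree with the second point but nice editing", "ham")
```

On this toy data, a comment that repeats the shilled name and a contact symbol scores well above 0.5, while an ordinary remark about the video scores below it. The threshold you pick is where the false positive tradeoff shows up.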
So why aren't they fixing it? Well, I seriously doubt it's due to incompetence. The more likely scenario is that they already know, from their earnings and statistics, that it's not losing them any paying customers. As such it's simply a matter of priority for them. And you're not it. Because you're the product, not the customer.
> Well, I seriously doubt it's due to incompetence.
I agree with you on that, as well as taking an ML approach.
Querying the hooks and symbols directly can lead to the false positive vs spam tradeoff that TheDong is referring to elsewhere in this comment section (to be fair, so can the ML approach, but it's more avoidable).
It is possible that the scale of it makes the minor shortcomings not so minor.
I used to watch some of Zizek's videos, though I hadn't seen the one that was linked until now. I'm no student of philosophy though, just a casual viewer so I don't think I can help much in answering your questions. I will say that he is definitely using uncommon definitions of the words "love" and "evil" as far as I can tell. As to who is this guy, I personally would recommend this video if you are trying to get a better sense of his thoughts as it is relatively clear and covers a large number of stances (the analogy at 7:15 is quite amusing): https://www.youtube.com/watch?v=_x0eyNkNpL0
On a side note, I am pleased though that you find him entertaining to watch regardless. I find it disappointing when people take speakers like Zizek too seriously; I think even if 80% of what is said is not insightful/intelligible, the occasional grain of truth can be very much worth it.
I hadn't heard of uTok before, I'll add that to my reading list as well!
Sounds like they were doing something similar in style to the lyrics site Genius's attempt mentioned elsewhere in this thread.
We're committed to maintaining civil discussion, and you can get a sense of what we're on watch for by checking out our content policy: https://www.netvyne.com/content-policy
The platform has no favorites, which you'll see for yourself in time.