For a major mall operator in the USA, we had an issue with tenants keeping their store hours in sync between the mall site and their own site. So we deployed MTurk workers in redundant multiples for each retail listing… 22k stores at the time, checked weekly from October through mid-January.
Another use case: figuring out whether a restaurant had OpenTable as an option. This also changes from time to time, so we’d check weekly via MTurk, 52 weeks a year across over 100 malls. Far fewer in quantity (think 200-300), but it’s still more work than you’d want to staff.
A fun, more nuanced use case: in retail mall listings, there’s typically a link to the retailer’s website. For GAP, no problem… it’s stable. But for random retailers (think kiosk operators), sometimes they’d lose their domain, which would then get forwarded to an adult site. The risk here is extremely high. So daily we would hit all retailer website links to determine if they contained adult or objectionable content. If flagged, we’d first send to MTurk for confirmation, then to client management for final determination. In the age of AI this would be very different, but the number of false positives was comical. Take a typical lingerie retailer and send its homepage to a skin-detection algorithm… you’d maybe be surprised how many common retailers have NSFW homepages.
Now some pro tips I’ll leave you with.
- Any job worth doing on mturk is worth paying a decent amount of money for.
- Never run a job once. Run it 3-5 times and then build a consensus algo on the results to get confidence.
- Assume they will automate things you would not have assumed automated, and be ready to get some junk results at scale.
- Think deeply on the flow and reduce the steps as much as possible.
- Similar to how I manage AI now: consider how you can prove they did the work, if you needed a real human and not an automation.
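The redundant-run-plus-consensus tip can be sketched like this (a hedged illustration; the function name and the 60% agreement threshold are mine, not the original system’s):

```cpp
#include <map>
#include <string>
#include <vector>

// Given 3-5 redundant MTurk answers for one HIT, accept the majority
// answer only if it clears an agreement threshold; otherwise the HIT
// goes back out for another pass.
struct Consensus {
    bool confident;      // true if agreement >= threshold
    std::string answer;  // majority answer (empty when not confident)
};

Consensus buildConsensus(const std::vector<std::string>& answers,
                         double threshold = 0.6) {
    std::map<std::string, int> votes;
    for (const auto& a : answers) ++votes[a];

    std::string best;
    int bestCount = 0;
    for (const auto& [ans, count] : votes) {
        if (count > bestCount) { best = ans; bestCount = count; }
    }
    double agreement = answers.empty()
        ? 0.0
        : static_cast<double>(bestCount) / answers.size();
    if (agreement >= threshold) return {true, best};
    return {false, ""};  // no consensus: re-run the HIT
}
```

With store hours, for example, two workers reporting “9-5” and one reporting “9-6” clears a 60% bar; three different answers does not.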
The automation one is so true! When I first deployed a huge job to MTurk, with so much money on the line I wanted to be careful, and I wrote some heuristics to auto-ban Turkers who worked their way through the HITs suspiciously quickly (2 standard deviations above the norm, iirc) - and damn did I wake up to a BUNCH of angry (but kind) emails. Turns out, there was a popular hotkey programming tool that Turk Masters made use of to work through the more prized HITs more efficiently, and on one of their forums someone shared a script for ours. I checked their work and it was quality, they were just hyper-optimizing. It was reassuring to see how much they cared about doing a good job.
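The heuristic was roughly this shape (a hedged reconstruction, not the original script; names and the population model are illustrative):

```cpp
#include <cmath>
#include <vector>

// Flag a worker whose average completion time is more than 2 standard
// deviations faster than the population mean. Assumes a non-empty
// population of per-worker mean times.
bool suspiciouslyFast(double workerMeanSeconds,
                      const std::vector<double>& allWorkerMeans) {
    double sum = 0.0;
    for (double t : allWorkerMeans) sum += t;
    const double mean = sum / allWorkerMeans.size();

    double var = 0.0;
    for (double t : allWorkerMeans) var += (t - mean) * (t - mean);
    const double sd = std::sqrt(var / allWorkerMeans.size());

    // "2 standard deviations above the norm" in speed means the
    // worker's *time* is 2 SDs below the mean time.
    return workerMeanSeconds < mean - 2.0 * sd;
}
```

As the anecdote shows, fast is not the same as fraudulent, so a flag like this should gate a manual review rather than trigger an automatic ban.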
A couple of thoughts based on my own experience, as both of my girls attended Alpha for 4 and 5 years.
First, not all the students come from wealthy backgrounds, which is a common assumption. Second, while the school emphasizes 2-3 hours of intensive computer and AI-driven learning, there are other critical aspects to the model. For instance, their approach is based on 100% mastery: students only move forward once they’ve fully grasped a concept. I don’t love the term, but it’s called “hole filling”. This prevents the typical gaps you might see with partial mastery, where scoring 75% or 85% on a core concept can lead to challenges down the road. It’s hard to argue with: when are you supposed to make up that 15%, when every next lesson plan builds on the assumption that you understood it? I get why schools accept partial mastery, but I think it’s a disservice in the long term.
Not all kids are the brightest or best, but I will say all kids become the local best of themselves. There’s an affordance for struggle and the time to wallow in it and dig yourself out of it. There is little tolerance for apathy or lack of effort, since everything is done till failure. My 3rd grader doing 6th grade math actually has no idea she’s good at math. She struggles the same as the kid one level behind. Everyone is constantly at the point of growth.
My kids also consistently placed in the 99th percentile on MAP testing, often working two or more grade levels ahead. However, what stood out was that Alpha doesn’t foster a sense of comparison between students. The focus is on individual growth—competing with yourself to not only succeed but also recover from failure. The kids honestly have no idea what each of them is learning and at what level. It’s not a secret; it’s just not talked about. It’s a personal journey between you and your guides (what they call teachers, though they aren’t teachers).
Now that my kids have transitioned to more traditional schools—one in a top-ranked magnet (#2 high school in Texas) and the other in a prep school—I can see the lasting impact of Alpha. They have zero fear of test-taking, as they’re accustomed to frequent assessments, including weekly STAR tests. Public speaking is second nature to them, thanks to regular presentations and speeches they began preparing as early as Level 1 (think 6-year-olds). My 9-year-old, at the time 7, once gave a memorized 5-minute speech on global warming to a crowd of 80 adults… it’s just awesome.
Key takeaways as I’ve reflected over the last couple of months since we transitioned:
- No agenda-driven education: Alpha focuses on “how to learn,” not dictating “what to learn.” No politics, no gender or race or religious distractions. Kids just being kids. Guides just trying to unlock them to push them further.
- Tech proficiency: Both kids are exceptionally comfortable with tech tools like AI, using them to enhance their abilities rather than as cheap shortcuts. For instance, my 9-year-old types at 50 wpm with 95% accuracy, while my 14-year-old is in the 80s. Both girls programmed self-driving cars with Python, and my youngest has deployed an AI vibe-coded project to Vercel using Cursor. It’s not computer lab once a week; it’s computer lab woven into the culture and everything you do.
- Frequent tests: They are unfazed by tests and accustomed to regular evaluations: weekly STAR tests and MAP tests 2-3 times a year. Easily 90+ percent of the kids have no test anxiety.
- Everything is measured: They even know if your kid is staring out the window all day instead of getting their work done. Not in a creepy way, but a “hey, what’s going on, why are you stuck, how do we get you engaged?” (It’s why they have guides.)
- Public speaking: They’ve developed strong public speaking skills, starting at an early age. Every “session” ends with kids presenting to ALL parents what they learned and why. Think of it as a demo day, called “test to pass”. Not to be cliché, but it’s pride in accomplishment and responsibility in failure… in a very public way.
- Self-driven learning: Alpha helped instill in them the mindset to be creators of their own futures and not passive consumers of other people’s ideas.
While there were certainly trade-offs, the core values of self-driven learning, technological proficiency, and adaptability far outweigh any downsides. I would not trade our time there for anything.
Is there any emphasis on critical thinking? As in being able to assess the pros and cons of a particular point of view? I noticed that you listed certain topics under the label "distractions". Do you believe those topics are not worth exploring critically in school? Regardless of your answer, why do you think the school avoids those subjects? Are they not linked to key history moments in the US and the world in general?
It also seems like the lessons are centrally planned; there doesn't seem to be any flexibility or adaptation to students' abilities, which seems antiquated unless you are only recruiting high-achieving students or quietly kicking out low achievers.
I noticed that most if not all of the achievements you listed seem very STEMy or businessy. Is there any focus on the arts? Any support for artistic skills (e.g. dancing, drama, vocal, instrumental, writing), social sciences or biological sciences? Do they learn to work and communicate in a team and work through team challenges?
If the schools are so good why do you think they haven't allowed or sought any independent verification of the academic performance of the students at the school? It sounds like the scientific thing to do for a presumably sciencey oriented school.
So here’s part of the trade-off to consider: (great questions and thinking btw you’d be surprised by the number of people that treat education as day care… even at a 40k per year school)
The program is heavily STEM-focused. While my kids perform at the 99th percentile on MAP tests, often two years ahead of their grade level, they would struggle with topics like explaining the structure and role of government. They’re also not exposed to the kind of collective literary analysis you’d find in a traditional classroom—like debating whether Santiago is defeated or victorious at the end of his journey in The Old Man and the Sea and unpacking what “A man can be destroyed but not defeated” truly means. For kids with a natural inclination toward literature or civics, they might gravitate toward those areas independently, but for most, these subjects are entirely overlooked.
Of course, there’s no perfect solution. At the two new schools I’ve observed, the reverse is true—they’re allergic to meaningful tech integration, and don’t get me started on the adversarial views on AI. In contrast, Alpha’s approach leverages apps heavily for core curriculum, which allows for customization per child. The “app and data team” makes strategic decisions about which tools to use for specific tasks (Khan Academy for one concept, IXL for another). It’s not random; it’s deliberate and data-driven.
Extracurriculars like sports or dance fall outside the core structure but offer unique opportunities. For instance, my oldest studied Chinese for two years with a college student from Brown University via Zoom… a fantastic experience that reflects the possibilities when budget isn’t a limiting factor.
As for independent verification, the MAP scores and test results are very real, and the improvement in transfer students is striking. However, their model is still evolving. Right now, they’re in a "move fast and break things" phase, but I anticipate a period of refinement where STEM emphasis gives way to a more balanced inclusion of liberal arts and social studies. The avoidance of certain subjects, though, is a scalability issue—they require instructor-led learning, and apps for reading and writing analysis are, at best, mediocre. As an aside, I love STEM, but I almost feel like in education it sucked a lot of the oxygen out of the room. Being a girl-dad, it’s obvious: nearly all of the marketing and agendas in TV programming are about girls in STEM, as if it’s the only way forward. I think what we see is a value imbalance. Debating The Old Man and the Sea isn’t needed “today”… but it’s critical for long-term human advancement. The other issue I see: a 3rd grader reading at an 8th-grade level poses challenges in finding age-appropriate digital content. Reading level isn’t the same as content maturity, and that’s a gap they/tech/ed-tech haven’t fully bridged yet. This is not an Alpha model problem; it’s just that they are at the bleeding edge of what tech and AI are doing. I have no doubt it will be solved.
There is internal financial aid. I don’t know the qualifications for it, but I know it’s used extensively.
What’s wild: at Alpha, money is no issue or object of discussion, other than my oldest coming home with $2k on Venmo for scoring 100% on the STAR test, or the kids dreaming up a life-skills challenge camping off the grid in Montana… anything is possible!
Now let’s jump to the public magnet schools. They talk about money all day long.
Hey kids, if you score X better on this upcoming test, I’ll get a raise as your educator.
We don’t have money for textbooks so if you want one it’s available in the library for $15
The nurse has to fill out a grant request for getting an ice fridge replaced so she doesn’t have to walk 20 minutes to athletic department.
The sci-tech teacher literally reminisced about how years ago they could afford this specific type of protractor and now it’s these cheap ones that have to be borrowed.
Kids like mine notice it, feel it, and on some level probably wrestle with why this is even a topic. I have used this as an opportunity to teach my kids a lot of things they have yet to be exposed to. And as a family we try to solve some of the specific needs by donating the thing we hear is missing or needed.
For the oldest…
The primary driver for us was realizing that most alternative learning models just don’t align well (maybe I’d say are incompatible) with the structure of college/higher education. Things like homework, progressing through subjects without full mastery, or being required to read material you’re not interested in... are all givens. You’ll have to take notes while someone lectures at the front of the room (maybe with an accent you don’t understand) and juggle five different subjects a day, each squeezed into an hour-long block. The momentum and constructs of higher-ed and its teaching scaffolding have been the way they are for decades... and to be frank, I didn't want her getting introduced to this her freshman year. It's just not the "real world" of academics. (Insert the "why go to college" argument here.)
For the youngest…
Her abilities consistently surpassed what the grade levels could accommodate, and she often found herself operating outside the norm for what the "level" is designed to do. In this situation, there’s something to be said for an experienced teacher—someone who’s worked with tens of thousands of students over their career and can provide personalized challenges that push her toward excellence.... IMO there is no magic bullet here, and even if there were, it's probably not scalable. It’s a level of growth that goes beyond what algorithm-based scoring can offer... being a tech guy and fully understanding the school and the tech... it's a difficult problem. It's even difficult to explain (for me). Maybe it’s one of those “slow down to speed up” scenarios.
On the practical side... once one kid is out, they both need to be. Alpha has 6 weeks on, 1 week off, with different summer and spring breaks from the school system, so it would be insane to try and support that on a calendar alone.
--
The central idea is that whether it’s Alpha or a public school, everything hinges on having teachers/guides who hold themselves and their students to the highest standards... let's call it excellence. In my opinion, teachers should likely earn two to three times their current salaries, similar to Alpha. With this substantial pay increase should also come higher expectations for merit-based excellence, rather than teaching being seen as just another job... or worse. Teaching should be a respected, high-value $$ profession. It should be challenging to become a teacher and equally challenging to remain one, but the rewards should be dramatically reimagined compared to what we see today. I'm aware this is not just a simple choice, but it should be something we figure out...
The way I would use this $50 Cerebras offering is as a delegate for some high-token-count items like documentation, lint fixing, and other operations, as a way not only to speed up the workflow but to release some back pressure on Anthropic/Claude so you don’t hit your limits as quickly… especially with the new weekly throttle coming. This $50 jump seems very reasonable; now for the 1k completions a day, I'd really want to see and get a feel for how chatty it is.
I suppose that's how it starts, but if the model is competent and fast, the speed alone might force you a bit to delegate more to it (maybe sub-agent tasks).
Unrelated, but this reminds me of a persistent headache on Windows. My screensaver refuses to kick in, no matter what. After tumbling down a rabbit hole longer than I’d care to admit, it’s clear I’m not alone… tons of apps and processes hijack idle detection, leaving my OLED panels stuck on static displays overnight (hello, burn-in risk).
Anyone know of a solid Windows equivalent to Sleep Aid for diagnosing and fixing these wake/sleep ghosts?
I love Claude Code, but Anthropic’s recent messaging is all over the map.
1- “More throughput” on the API, but stealth caps in the UI
- On Jun 19 Anthropic told devs the API now supports higher per-minute throughput and larger batch sizes, touting this as proof the underlying infra is scaling. Yay!??
- A week later they roll out weekly hard stops on the $100/$200 “Max” plans — affecting up to 5% of all users by their own admission.
Those two signals don’t reconcile. If capacity really went up, why the new choke point? I keep getting this odd visceral reaction/anticipation that each time they announce something good, we are gonna get whacked on an existing use case.
2- Sub-agents encourage 24x7 workflows, then get punished… The Sub-agent feature docs literally showcase spawning parallel tasks that run unattended.
Now the same behavior is cited as “advanced usage … impacting system capacity.”
You can’t market “let Claude handle everything in the background” and then blame users who do exactly that. You’re holding it wrong?
3- Opaqueness forces rationing (the other poster comments re: rationing vs hoarding; I can’t reconcile it being hoarding since it’s use it or lose it.)
There’s still no real-time meter inside Claude/CC, only a vague icon that turns red near 50%. Power users end up rationing queries because hitting the weekly wall means a seven-day timeout. That's a dark, dark pattern if I’ve ever seen one; I'd think not appropriate for developer tooling. (ccusage is a helpful tool that shouldn’t be needed!)
The “you’re holding it wrong” response seems so bizarre to me, meanwhile all of the other signaling is about more usage, more use cases, more dependency.
> 2- Sub-agents encourage 24x7 workflows, then get punished… The Sub-agent feature docs literally showcase spawning parallel tasks that run unattended.
Yeah, the new sub-agents feature (which is great) is effectively unusable with the current rate limits.
Yeah, I get that. I’m not “stuck”; it’s that I don’t think the comms make sense, and it’s troubling none of these teams have figured out a pricing model that isn’t a rug pull. If all of these AI LLM coders were priced right, they would be out of the hands of many of the operators that are not dependent users. It’s got a bait-and-switch feel to it. I’ll deal with it. It’s a good product; I just feel like we deserve better and that these guys are smarter than this.
Can you imagine if AWS pulled half of these tricks with cloud services as a subscription not tethered to usage? They wait for you to move all of your infrastructure to them (to the detriment of their competitors) and then … oh we figured out we can’t do business like this, we need to charge based on XYZ… we are all adults and it’s our job to deal with it or move on but… something doesn’t smell right and that’s the problem.
We’re using a similar trick in our system to keep sensitive info from leaking… specifically, to stop our system prompt from leaking. We take the LLM’s output and run a similarity search against an embedding of our actual system prompt. If the similarity score spikes too high, we toss the response out.
It’s a twist on the reverse RAG idea from the article and maybe directionally what they are doing.
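The core of that check is just a cosine-similarity comparison between two embeddings. A minimal sketch (the 0.85 threshold and function names are illustrative, and a real system would get the vectors from an actual embedding model, not hand-built ones):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity between two equal-length embedding vectors.
double cosineSimilarity(const std::vector<double>& a,
                        const std::vector<double>& b) {
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    return dot / (std::sqrt(na) * std::sqrt(nb));
}

// Toss the response if it looks too much like the system prompt.
bool blockResponse(const std::vector<double>& promptEmbedding,
                   const std::vector<double>& outputEmbedding,
                   double threshold = 0.85) {
    return cosineSimilarity(promptEmbedding, outputEmbedding) >= threshold;
}
```

The threshold is the tuning knob: too low and you block legitimate answers that merely discuss similar topics, too high and paraphrased leaks slip through.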
Are you able to still support streaming with this technique? Have you compared this technique with a standard two-pass LLM strategy where the second pass is instructed to flag anything related to its context?
To still give that streaming feel while you aren’t actually streaming.
I considered the double llm and while any layer of checking is probably better than nothing I wanted to be able to rely on a search for this. Something about it feels more deterministic to me as a guardrail. (I could be wrong here!)
I should note, some of this falls apart in the new multimodal world we are now in, where you could ask the LLM to print the secrets in an image/video/audio. My similarity-search approach would fail miserably without also adding more layers (multimodal embeddings?). In that case your double-LLM easily wins!
Why are you (and others in this thread) teaching these models how to essentially lie by omission? Do you not realize that's what you're doing? Or do you just not care? I get you're looking at it from the security angle but at the end of the day what you describe is a mechanical basis for deception and gaslighting of an operator/end user by the programmer/designer/trainer, which at some point you can't guarantee you'll become one on the receiving end of.
I do not see any virtue whatsoever in making computing machines that lie by omission or otherwise deceive. We have enough problems created by human beings doing as much that we can at least rely on eventually dying/attritioning out so the vast majority can at least rely on particular status quo's of organized societal gaslighting having an expiration date.
We don't need functionally immortal uncharacterizable engines of technology to which an increasingly small population of humanity act as the ultimate form of input to. Then again, given the trend of this forum lately, I'm probably just shouting at clouds at this point.
1) LLM inference does not “teach” the model anything.
2) I don’t think you’re using “gaslighting” correctly here. It is not synonymous with lying.
My dictionary defines gaslighting as “manipulating someone using psychological methods, to make them question their own sanity or powers of reasoning”. I see none of that in this thread.
1. Inference time is not training anything. The AI model has been baked and shipped. We are just using it.
2. I’m not sure “gaslight” is the right term. But if users are somehow getting an output that looks like the gist of our prompt… then yeah, it’s blocked.
An easier way to think of this is probably with an image model. Imagine someone made a model that can draw almost anything. We are paying for and using this model in our application for our customers. So, on our platform, we are scanning the outputs to make sure nothing in the output looks like our logo. For whatever reason, we don’t want our logo being used in an image. No gaslighting issue and no retraining here. Just a stance on our trademark usage specifically originating from our system. No agenda on outputs or gaslighting to give the user an alternative reality and pretend it’s what they asked for… which I think is what your point was.
Now, if this was your point, I think it’s aimed at the wrong use case/actor. And I actually do agree with you. The base models, in my opinion, should be as ‘open’ as possible. The ‘as possible’ is complicated and well above what I have solutions for. Giving out meth cookbooks is a bit of an issue. I think the key is to find common ground on what most people consider acceptable or not and then deal with it.

Then there is the gaslighting you speak of. If I ask for an image of George Washington, I should get the actual person and not an equitable alternative reality. I generally think models should not try to steer reality or people. I’m totally fine if they have hard lines in the sand on their morality or standards.

If I say, ‘Hey, make me Mickey Mouse,’ and it doesn’t because of copyright issues, I’m fine with it. I should still be able to generate an animated mouse, and if they want to use my approach of scanning the output to make sure it’s not more than 80% similar to Mickey Mouse, then I’m good, provided it said something like, “Hey, I tried to make your cartoon mouse, but it’s too similar to Mickey Mouse, so I can’t give it to you. Try a different prompt to get a different outcome.” I’d love it. I think that would be wildly more helpful than just the refusal, or outputting some other reality where I don’t get what I wanted or intended.
Hm. If you're interested, I think I can satisfactorily solve the streaming problem for you, provided you have the budget to increase the amount of RAG requests per response, and that there aren't other architecture choices blocking streaming as well. Reach out via email if you'd like.
I spend a ton of time in FFmpeg, and I’m still blown away by how it uses abstractions to stay modular—especially for a project that’s been around forever and still feels so relevant. Those filtergraphs pulling off polymorphism-like tricks in C? It’s such an elegant way to manage complex pipelines.
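The general shape of the trick, for anyone who hasn’t read the source (this is not FFmpeg’s actual AVFilter API, just a hedged miniature of the function-pointer pattern it builds on):

```cpp
// A struct of function pointers acts as a vtable: each "filter"
// supplies its own behavior behind one common interface, which is
// how C gets polymorphism-like dispatch without classes.
struct Filter {
    const char* name;
    int (*process)(int sample);  // one stage in the pipeline
};

static int negate(int s)  { return -s; }
static int doubler(int s) { return s * 2; }

// The pipeline driver only knows the interface, never the concrete
// filters, so new filters slot in without touching this code.
int runPipeline(const Filter* filters, int count, int sample) {
    for (int i = 0; i < count; ++i)
        sample = filters[i].process(sample);
    return sample;
}
```

FFmpeg’s real filter structs carry far more (pads, init/uninit hooks, option tables), but the dispatch idea is the same.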
That said, keeping those interfaces clean and consistent as the codebase grows (and ages) takes some real dedication.
Also recently joined the mailing lists and it’s been awesome to get a step closer to the pulse of the project. I recommend if you want to casually get more exposure to the breadth of the project.
I haven’t worked with ffmpeg’s code, but I have worked with QEMU. QEMU has a lot of OOP (implemented in C obviously) that is supported by macros and GCC extensions. I definitely think it would have been better (and the code would be easier to work with) to use C++ rather than roll your own object model in C, but QEMU is quite old so it’s somewhat understandable. I say that as someone who mostly writes C and generally doesn’t like using C++.
Fabrice also wrote the Tiny C Compiler, so C is very much his language of choice.
For those used to the language, it was seen as “lighter” and easier to add OO-like abstractions to your C usage than to get bogged down in the weight and inconsistencies of (early) C++.
Every language has inconsistencies, and C is no stranger to that. Much of C++’s baggage is due to C, and you carry the same weight. That’s not to say that initialization isn’t broken in C++, but just like many features in many languages (off the top of my head, in C: strcpy, sprintf, and ctime are like hand grenades with the pin pre-pulled for you), just don’t use them. There’s a subset of C++17 that to me solves so many issues with C and C++ that it just makes sense to use. An example from a codebase I spend a lot of time in:
int val;
bool valueSet = getFoo(&val);
if (valueSet) { /* use val */ }
printf("%d", val); // oops: val may be uninitialized if getFoo failed
The bug can be avoided entirely with C++:
if (int val; getFoo(&val)) // if you control getFoo, this could take a reference, which makes the null check in getFoo a compile-time check
{ /* use val */ }
printf("%d", val); // this doesn't compile: val is scoped to the if statement
Which sounds like the same thing? Not quite: now you can declare a variable without its value being directly evaluated, and you can also compare it in a condition. I think both are neat features of C++, without adding complexity.
It’s one of those things that I never knew I wanted until I started using it, and now I miss it when it’s not available. The reason you want it is the same reason you want to declare a variable in the statement of a for loop rather than pre-declaring it and using a while loop.
Variables like "valueSet" scream out that the language lacks a Maybe type instead. One of the worst things about C++ is that it's content to basically not bother improving on the C type system.
I would vouch that C++ has plenty of improvements over C's type system; even C++ARM-era C++ already provided enough improvements that I never liked plain old C, other than the year spent learning C via Turbo C 2.0 before being given access to Turbo C++ 1.0 for MS-DOS in 1993.
The problem is all the folks who insist on coding in C++ as if it were C, ignoring all those C++ improvements over C.
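On the Maybe-type point upthread, for instance: C++17 already ships std::optional, which bundles the value and the “was it set” flag into one type (getFoo here is a hypothetical stand-in for the earlier example):

```cpp
#include <optional>

// std::optional<int> plays the role of a Maybe type: there is no
// separate out-parameter and flag, and no way to read the value
// without acknowledging it might be absent.
std::optional<int> getFoo(bool ok) {
    if (!ok) return std::nullopt;  // "Nothing"
    return 42;                     // "Just 42"
}
```

Callers can write `if (auto v = getFoo(true)) use(*v);`, which also composes nicely with the if-initializer feature discussed above.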
Six divided by minus one is a "Division by zero" now? Where I come from that's minus six.
Good luck to WG14 (or maybe a faction within it?) as they seem to have decided to go make their own C++ competitor now, it's a weird time to do that, but everybody needs a hobby.
I mean, sure. I read the "noplate" code. Did you ever watch the Mrs Merton show? "So, what first attracted you to the millionaire Paul Daniels?". There's a reason you felt the need to insist that your C language generic containers aren't relying on "complex C++ features" whatever you might decide that means.
The issue with C++ is that it is a hyper-complex language that is really a combination of four languages (C with classes, template code, macros, and constexpr code) with largely overlapping functionality. It seems to be getting better at amalgamating these different parts, but it is still a mess that annoys me all the time when I try to use it. This complexity is what drove me away. Still, there is an unmet need for generic programming in C, and the fact that I can now do this with macros very well means I can have it without missing this part of C++. So the idea is not to reinvent C++ but to make minor tweaks to C to be able to do similar things in a much simpler way.
This is one example. Off the top of my head, std::array vs "naked" C arrays, std::string vs const char*, and let's not forget RAII, are all features that just make me never want to work with vanilla C ever again.
For me, std::array seems fundamentally inferior to C arrays. A good standard string type is indeed missing, but it is also easy to define one. RAII, I can see, but I also see some advantages to having explicit resource deallocation visible in the code, and it does not really bother me too much to write this explicitly.
C has fewer moving parts — it’s more difficult to define a subset of C++ that actually works across all platforms featuring a C++ compiler, not to mention all the binary-incompatible versions of the C++ standard library that tend to exist — and C is supported on a wider variety of platforms. If you want to maximize portability, C is the way to go, and you run into far fewer problems.
Only in certain limited cases; for example, you can't have static class instances or anything else that would require code to run before a call comes in through the "extern C" API.
Also, now you have to build enough of a C API to expose the features, which is extra annoying when you want the API to be fast, so it had better not involve extra levels of indirection through marshalling (hello, KDE SMOKE).
At some point you're either dealing with limited non-C++ API, or you might find yourself doing a lot of the work twice.
"In the strict mathematical sense, C isn't a subset of C++. There are programs that are valid C but not valid C++ and even a few ways of writing code that has a different meaning in C and C++. However, C++ supports every programming technique supported by C. Every C program can be written in essentially the same way in C++ with the same run-time and space efficiency. It is not uncommon to be able to convert tens of thousands of lines of ANSI C to C-style C++ in a few hours. Thus, C++ is as much a superset of ANSI C as ANSI C is a superset of K&R C and much as ISO C++ is a superset of C++ as it existed in 1985.
Well written C tends to be legal C++ also. For example, every example in Kernighan & Ritchie: "The C Programming Language (2nd Edition)" is also a C++ program. "
> For example, every example in Kernighan & Ritchie: "The C Programming Language (2nd Edition)" is also a C++ program. "
That is rather dated; they do things like explicitly casting the void* pointer returned by malloc, but point out in the appendix that ANSI C dropped the cast requirement for pointer conversions involving void*. C++ does not allow implicit void* conversions to this day.
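Concretely, this is the classic incompatibility (a hedged sketch; `make_buffer` is a made-up name):

```cpp
#include <cstddef>
#include <cstdlib>

// ANSI C allows assigning malloc's void* result to int* implicitly;
// C++ has always required a cast. Written K&R-style with the explicit
// cast, the line is accepted by both languages.
int* make_buffer(std::size_t n) {
    // In C you could write:  int* p = malloc(n * sizeof(int));
    // C++ rejects that implicit void* conversion, so the cast stays:
    int* p = (int*)std::malloc(n * sizeof(int));
    return p;
}
```

So the K&R examples compile as C++ precisely because they kept the cast that ANSI C later made unnecessary.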