Author here. It's fair enough. I didn't give real-world examples; that's partially down to what I typically work on. I usually work in brownfield backend logic in closed-source applications that don't showcase well.
Two recent production features:
1. *Quota crossing detection system*
- Complex business logic for billing infrastructure
- Detects when usage crosses configurable thresholds across multiple metric types
- Time: 4 days parallel work vs ~10 days focused without AI
The 3-attempt pattern was clear here:
- Attempt 1: DB trigger approach - wouldn't scale for our requirements
- Attempt 2: SQL detection but wrong interfaces; misunderstood counter vs gauge metrics (see the sketch below)
- Attempt 3: Correct abstraction after explaining how values are stored and consumed
2. *Sentry monitoring wrapper for cron jobs*
- Reusable component wrapping all cron jobs with monitoring
- Time: 1 day parallel vs 2 days focused
Nothing glamorous, but they are real-world examples of changes I've deployed to production quicker because of Claude.
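To make the counter vs gauge confusion from attempt 2 concrete, here's a minimal sketch of how crossing detection differs between the two kinds of metric. All names are hypothetical; this is the shape of the idea, not our production code:

```python
def crossed_up(previous: float, current: float, threshold: float) -> bool:
    """True when a value moves from below the threshold to at/above it."""
    return previous < threshold <= current


def counter_crossed(prev_total: float, curr_total: float, threshold: float) -> bool:
    # Counters are monotonically increasing totals (e.g. API calls this
    # billing period). If the current reading is lower than the previous
    # one, the counter was reset (process restart, new period), so compare
    # against zero rather than reporting a phantom decrease.
    if curr_total < prev_total:
        prev_total = 0.0
    return crossed_up(prev_total, curr_total, threshold)


def gauge_crossed(prev_value: float, curr_value: float, threshold: float) -> tuple[bool, bool]:
    # Gauges are point-in-time readings (e.g. active seats) that move in
    # both directions, so upward and downward crossings both matter.
    up = crossed_up(prev_value, curr_value, threshold)
    down = crossed_up(curr_value, prev_value, threshold)  # mirror of the up check
    return up, down
```

Getting that distinction wrong is exactly the kind of thing attempt 2 tripped over.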
Author here, quick clarification on pricing: the $1000-1500/month is for Teams/Enterprise with higher rate limits, not the consumer MAX plans. Consumer MAX ($200/month) works for lighter usage but hits limits quickly with parallel agents and large codebases.
For context: that's 1-2% of a senior engineer's fully loaded cost. The ROI is clear if it delivers even 10% productivity gain (we're seeing 2-3x on specific tasks).
You're right that many devs can start with MAX plans. The higher tier becomes necessary when running multiple parallel contexts and doing systematic exploration (the "3-attempt pattern" burns tokens fast).
I wouldn't be doing it if I didn't think it was value for money. I've always been a cost-conscious engineer who weighs cost/value, and with Claude, I am seeing the return.
(see the link in the article to a study in which developers believed AI gave them a ~20% productivity gain, while measurement showed they actually had a ~20% loss)
Hi Ale, author here. Skepticism is understandable, but trust me, I'm not just writing React boilerplate or refactoring.
I find it difficult to include examples because a lot of my work is boring backend work on existing closed-source applications that's hard to share, but I'll give it a go :)
----
First example: Our quota detection system (shipped last month) handles configurable threshold detection across billing metrics. The business logic is non-trivial: distinguishing counter vs gauge metrics, handling multiple consumers, and writing efficient SQL queries across time windows.
Claude's evolution:
- First pass: Completely wrong approach (DB triggers)
- Second pass: Right direction, wrong abstraction
- Third pass: A working implementation we could iterate on
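To give a flavour of the time-window detection, here's a heavily simplified sketch with a made-up schema (Postgres assumed; not our actual queries):

```python
import psycopg2  # assumption: Postgres via psycopg2; any DB-API driver works similarly

# Hypothetical schema: usage_samples(account_id, metric, recorded_at, value).
# LAG() pairs each reading with the previous one per account and metric,
# so an upward crossing is "previous below threshold, current at/above".
CROSSINGS_SQL = """
SELECT account_id, metric, recorded_at, value
FROM (
    SELECT account_id, metric, recorded_at, value,
           LAG(value) OVER (
               PARTITION BY account_id, metric
               ORDER BY recorded_at
           ) AS prev_value
    FROM usage_samples
    WHERE recorded_at >= %(window_start)s
) paired
WHERE prev_value < %(threshold)s
  AND value >= %(threshold)s;
"""


def find_crossings(conn, window_start, threshold):
    """Return (account_id, metric, recorded_at, value) rows that crossed up."""
    with conn.cursor() as cur:
        cur.execute(CROSSINGS_SQL, {"window_start": window_start, "threshold": threshold})
        return cur.fetchall()
```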
----
Second example: Sentry monitoring wrapper for cron jobs, a reusable component to help us observe our cron job usage.
Claude's evolution:
- First pass: Hard-coded the integration into each cron job, a maintainability nightmare.
- Second pass: Used a wrapper, but got the config all wrong
- Third pass: Again, an OK implementation we could iterate on
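For the curious, the final shape was roughly a decorator built on sentry_sdk's cron check-ins. The sentry_sdk.crons.monitor API is real; the slugs, job names, and DSN below are placeholders:

```python
import functools

import sentry_sdk
from sentry_sdk.crons import monitor

sentry_sdk.init(dsn="https://examplePublicKey@o0.ingest.sentry.io/0")  # placeholder DSN


def monitored_cron(slug: str):
    """Wrap a cron job so Sentry records check-ins (in-progress/ok/error).

    Taking the slug as a parameter (the second-pass lesson) means one
    decorator serves every job, instead of hard-coding the integration
    into each one (the first-pass mistake).
    """
    def decorator(fn):
        @monitor(monitor_slug=slug)
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            return fn(*args, **kwargs)
        return wrapper
    return decorator


@monitored_cron("nightly-usage-rollup")  # hypothetical job slug
def rollup_usage():
    ...  # aggregate yesterday's usage samples
```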
----
The "80%" isn't about line count; it's about Claude handling the exploration space while I focus on architectural decisions. I still own every line that ships, but I'm reviewing and directing rather than typing.
This isn't writing boilerplate; it's core billing infrastructure. The difference is that I treat Claude like a very fast junior who needs clear boundaries, rather than expecting senior-level architecture decisions from it.
I've considered live-streaming my work a few times, but all my work is on closed-source backend applications with sensitive code and data. If I ever get to work on an open-source product, I'll ask about live-streaming it. I think it would be a fun experience.
Although I cannot show the live stream or the code, I am writing and deploying production code for a brownfield project.
Two recent production features:
1. Quota crossing detection system for billable metrics
- Complex business logic for billing infrastructure
- Detects when usage crosses configurable thresholds across multiple metric types
- Time: 4 days while working on other smaller tasks in parallel vs probably 10 days focused without AI
2. Sentry monitoring wrapper for metering cron jobs
- Reusable component wrapping all cron jobs with Sentry monitoring capabilities
- Time: 1 day in parallel with other tasks vs 2 days focused
As you can probably tell, my work is not glamorous :D. It's all the head-scratching backend work: extending the existing system with more capabilities or making it more robust.
I agree there is a lot of hand-holding required, but I'm betting on the systems getting better as time goes on. We are only two years into this AI journey, and the capabilities will most likely improve over the next few years.