> But now that most code is written by LLMs, it's as "hard" for the LLM to write Python as it is to write Rust/Go
The LLM still benefits from the abstraction provided by Python (fewer tokens and less cognitive load). I could see a pipeline working where one model writes in Python or so, then another model is tasked to compile it into a more performant language
It's very good (in our experience, YMMV of course) to have the LLM write the prototype in Python and then port it automatically 1:1 to Rust for perf. We write prototypes in JS and Python and they then get auto-ported to Rust; we have been doing this for about a year for all our projects where it makes sense. In the past months it has been incredibly good with Claude Code; it is absolutely automatic: we run it in a loop until all tests (many handwritten in the original language) succeed.
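For anyone wondering what that loop looks like in practice, here is a minimal sketch. It assumes the Claude Code CLI (`claude -p`) and a Rust crate with a test suite; the prompt, paths, and attempt cap are illustrative, not the commenter's actual setup:

```python
import subprocess

# Hypothetical port-until-green loop: ask Claude Code to port the prototype,
# then run the Rust tests; repeat until they pass or we give up.
PROMPT = (
    "Port src/prototype.py to Rust 1:1 in rust/src/lib.rs. "
    "Do not change behavior; make `cargo test` pass."
)

for attempt in range(10):  # cap the loop so it can't run forever
    subprocess.run(["claude", "-p", PROMPT], check=True)
    tests = subprocess.run(["cargo", "test"], cwd="rust")
    if tests.returncode == 0:
        print(f"all tests green after {attempt + 1} attempt(s)")
        break
else:
    raise SystemExit("port did not converge; needs human review")
```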
Sorry, so basically you're saying there are two separate guidelines, one for Python and one for Rust, and you have the LLM write it first in Python and then in Rust. But I still don't understand why that would be any better than writing the code in Rust in one go. Why would "priming" it in Python improve the result in any way?
Also, what happens when bug fixes are needed? Again first in Py and then in Rs?
Sorry, yes. LLMs write code that's then checked by human reviewers. Maybe it will be checked less in the future. But I'm not seeing fully-autonomous AI on the horizon.
At that point, the legibility and prevalence of humans who can read the code becomes almost more important than which language the machine "prefers."
Well, verification is easier than creation (cf. P ≠ NP). I think humans who can quickly verify that something works will be in more demand than those who know how to write it. Even better: since LLMs aren't as creative as humans (in-distribution thinking), test-writers will be in more demand (out-of-distribution thinkers). Both of these mean that humans will still be needed, but for other reasons.
The experienced generalists with verification-testing techniques are the winners [0] in this.
But one thing you cannot do is openly admit, or be found out saying, something like "I don't know a single line of Rust/Go/TypeScript/$LANG code, but I used an AI to do all of it" when the system breaks down and you can't fix it.
It would be quite difficult to take a SWE seriously who prides themselves on having zero understanding of or experience in building production systems and runs the risk of losing the company time and money.
This is fair, but this seems like the only way to test this type of thing while avoiding the risk of harassing tons of farmers with AI emails. In the end, the performance will be judged on how much of a human harness is given
People will pay extra for Opus over Sonnet and often describe the $200 Max plan as cheap because of the time it saves. Paying for a somewhat better harness follows the same logic
The game looks really good, although I think it'd be improved if the sphere was a bit smaller. It feels like it takes too long for the game to become difficult
Because I learned JS before ECMAScript 6 was widely supported by browsers and haven't written a ton of it targeting modern browsers. You're right that it's unnecessary.
Easy up to ~70, interesting between 80-110, very hard around 120-130. I think scores above 200 are pretty sus, there is very little room on the sphere at that point (using the cheat from sibling comment). Anything >400 is definitely made up.
A few months ago, there was a lot of news lambasting tech companies for extending the depreciation lifespan of GPUs from ~3 years to ~5 years. Do these price hikes suggest that a longer lifespan actually is the right way to think about how long these GPUs will stay valuable?
Not a finance guy, so fully prepared to be wrong here, but my interpretation is that an increase in price corresponds to a shorter lifespan: less time to make money, so you need to charge more to get the same return.
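A back-of-envelope sketch of that reasoning, with made-up numbers: the break-even hourly rate has an amortization term that scales inversely with lifespan, so cutting the assumed lifespan pushes the price up.

```python
# Break-even hourly rate to recoup a GPU over its lifespan.
# All numbers are invented for illustration.
capex = 30_000            # purchase price, $
power_cost_per_hour = 1.50
utilization = 0.7         # fraction of hours actually billed

for lifespan_years in (3, 5):
    billable_hours = lifespan_years * 365 * 24 * utilization
    rate = capex / billable_hours + power_cost_per_hour
    print(f"{lifespan_years}y lifespan -> break-even rate ${rate:.2f}/h")
```

With these numbers, the 3-year assumption needs roughly $3.13/h versus $2.48/h for 5 years.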
It could also be a supply/demand issue, generally price increases are caused by either 1. demand increasing, or 2. supply decreasing.
In this case we can interpret a shorter lifespan as decreased supply, but it can also be because the demand for GPU compute has gone up. I think in this case we're seeing a bit of both, but it's hard to tell without more data.
We could also consider the supply/demand elasticity changing; e.g., if demand has become more price-inelastic, that alone could result in a higher price.
The thing historically about GPUs has not been the actual lifespan of the hardware (at least half of the hardware will probably work fine for 10 or more years). The problem is that work/watt keeps improving with newer hardware, so there's a point where, even if you had an equivalent quantity of 10-year-old GPUs, powering them for some period costs $40k while a single brand-new GPU costs $40k to buy but only $20k to power for the same period; that crossover arrives in less than a few years.
I don't think we're seeing any decrease in supply though, ignoring 2020 I'm pretty sure the number of GPUs manufactured has been steadily increasing. It might be the case that projected manufacturing was higher than what actually happened, which is not the same thing as a decrease in supply, but companies like Amazon will talk about it like it is, and from the standpoint of their pricing it essentially is.
> The problem is that work/watt keeps improving with newer hardware, so there's a point where, even if you had an equivalent quantity of 10-year-old GPUs, powering them for some period costs $40k
Sell the old-gen GPUs to on-prem users (including home consumers) who are going to run them a small percentage of the time (so power use is more or less negligible to them compared to acquisition cost); problem solved.
The same math applies for on-prem/home users. If you actually have some workload where it makes sense to get a free GPU that costs $40/hour to power because you only need it for a few hours a month, it's probably cheaper to rent a more efficient GPU from someone who can power it at a lower cost.
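To make the crossover concrete with the rough numbers from upthread ($40k/period to power the old fleet, versus $40k to buy a new GPU that costs $20k/period to power):

```python
# Cumulative cost of keeping the old "free" GPUs vs buying a new one,
# using the illustrative numbers from the comment above.
old_power = 40_000   # $/period to power the old fleet
new_capex = 40_000   # $ to buy the new GPU
new_power = 20_000   # $/period to power it

for periods in range(1, 5):
    old = old_power * periods
    new = new_capex + new_power * periods
    marker = "  <- new GPU wins" if new < old else ""
    print(f"after {periods} period(s): old=${old:,} new=${new:,}{marker}")
```

The two lines cross after two periods; beyond that, the "free" hardware is the more expensive option.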
Oh my reasoning was coming at this from a different angle: H200s were released in November of 2023, so they're over 2 years old at this point while still being valuable
It mainly depends on how much NVIDIA is overselling the improvements.
With RL workloads being added, prefill and decode being separated onto different chips, NVFP4, and lots of other architectural changes, efficiency on the most valuable tasks goes up, as long as the algorithms don't change significantly.
The people who think that high-end GPUs don't last for ~5 years (really it's 6) do not know what they are talking about. 6 years is likely too low. If the cooling is good and they don't fail early, most of these GPUs could still keep going past 6 years.
I'm convinced that anything with more than 80 GB of VRAM will be worth it for closer to 10 years at this point.
> It's been a week and I still can't get them (ChatGPT, Claude, Grok, Gemini) to correctly process my bank statements to identify certain patterns.
Can you give any more details on what you mean? This feels like a task they should be great at, even if you're not paying the $20/mo for any lab's higher tier model
I have a couple banks that are peculiar in the way they handle transactions made in a different currency while traveling etc. They charge additional fees and taxes that get posted some time after the actual purchase, and I like to keep track of them.
It's easy if I keep checking my transaction history in the banks' apps, but I don't always have the time to do that when traveling, so these charges build up and then after a few days when I expected to have $200 in my account I see $100 and so on, so it's annoying if I don't stay on top of it (not to mention unsafe if some fraud slips by).
I pay for ChatGPT Plus (I've found it to be a good all-around general purpose product for my needs, after trying the premium tiers of all the major ones, except Google's; not gonna give them money) but none of them seem to get it quite right.
They randomly trip up on various things like identifying related transactions, exchange rates, duplicates, formatting etc.
> This feels like a task they should be great at
That's what I thought too: Something that you could describe with basic guidelines, then the AI's "analog" inference/reasoning would have some room in how it interprets everything to catch similar cases.
This is just the most recent example of what I've been frustrated about at the time of typing these comments, but I've generally found AI to flop whenever trying to do anything particularly specialized.
If you installed Claude Code, put all your statements into a local folder, and asked it to process them, it could do literally anything you could come up with, all the way up to setting up an AWS instance with a website that gives nifty visualizations of your spending. Or anything else you are thinking of.
I may try that, but at this point it's already more work wrestling with the AI than just doing it myself.
The most important factor is confidence: After seeing them get some things mixed up a few times, I would have to manually verify the output myself anyway.
----
Re: the multiple comments that suggest asking the AI for code instead of feeding data to the chatbot:
I get what you mean, but I WANT the AI's non-deterministic AIness in this case!
For example, in some countries there are these "omni apps" that can be used for ride hailing, ordering food, etc. The bank statement lists all such transactions with the same merchant name. I want the AI to do its AI thing and guess which transactions were rides and which were food deliveries, based on the prices and times etc.: if there are multiple small transactions, those are taxis, and the most expensive transactions during a day are my lunch and dinner.
And there are other cases like that; it would be too much "imperative" code, which would fail anyway.
Like I said, this is a task that any human could do easily after a short explanation, but takes a hell of a lot of wrangling with AI.
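One possible middle ground between feeding the statement to a chatbot and writing purely imperative rules is to let deterministic code do the grouping and reserve the LLM call for the fuzzy judgment. A rough sketch; the client setup, model name, transaction schema, and prompt wording are all placeholders:

```python
from openai import OpenAI

client = OpenAI()  # assumes an API key in the environment

def classify(merchant: str, txns: list[dict]) -> str:
    """Ask the model to apply the fuzzy ride-vs-food judgment to one
    deterministically grouped day of transactions (schema is hypothetical)."""
    lines = "\n".join(f"{t['time']}  {t['amount']:.2f}" for t in txns)
    prompt = (
        f"Merchant '{merchant}' is an omni app (rides + food delivery).\n"
        f"Transactions for one day:\n{lines}\n"
        "Label each line 'ride' or 'food'. Hints: small repeated charges are "
        "usually rides; the largest charges around midday/evening are meals."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model id
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

The code stays testable and deterministic about what goes into the prompt, while the "AIness" is confined to the one labeling step.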
I had the same vague impression as you did when using AI via browser/chat interaction. Like it’s very impressive but how useful is it really?
Using it via the CLI approach is an entirely different experience. It's literally shocking what you can do.
For context, among many other things I have done this exact thing I am recommending. I just hit export on a Quickbooks instance of a complex multimillion dollar business and had Claude Code generate reports on various things I wanted to optimize and it just handles it in seconds.
The real limit to these tools is knowing what to ask for and stating the requirements clearly and incrementally. Once you get the hang of it, it’s literally shocking how many use cases you can find.
I think a good mental model of what you can expect from a chatbot is imagining that somebody read the bank statement to you and then asked you a bunch of questions. Could you follow that, not make any mistakes, not forget anything? Can you perform the task off the top of your head, not writing anything down, not pulling up Excel or a calculator? If you can, there's a good chance AI will be able to do that too. The fact that it sometimes can do more is pure miracle. And if you want it to do those things consistently, you need to provide it with access to the tools you'd need to perform this task consistently.
It's simple, I can do it myself:
Go row by row. See a certain phrase in the transaction description? Look a few rows ahead. Spot associated fees with just a glance. Write that group of transactions down somewhere else.
That's it.
I tried different kinds of prompts, from imperative to declarative, including telling the AI to write a script for its own internal use, but they just don't seem to get it.
AI has a purely linear input channel: it gets tokens one by one, and context is a form of short-term memory. I know that because you give it written text, it seems like you've provided a document it should be able to process any way it likes, but the system is set up as if you read the document to the AI word by word and asked questions about it that it needs to answer "off the top of its head".
> It's simple, I can do it myself:
> Go row by row. See a certain phrase in the transaction description? Look a few rows ahead.
Can you do it without looking at the document? Just by ear? Every time correctly? Without missing something?
This is exactly why you have it write code instead of analyzing the data. You can have tests, you can inspect the code, you know that the process will be deterministic. The chatbot LLMs are a bad match for bulk data analysis on regular, structured data, but they're often quite decent at writing code.
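As a concrete illustration, the row-by-row fee-matching pass described upthread is only a few lines once the statement is exported. A minimal sketch, assuming a CSV with date/description/amount columns; the column names and matching phrases are hypothetical:

```python
import csv

LOOKAHEAD = 5                       # how many rows ahead fees may post
FEE_PHRASE = "FOREIGN TRANSACTION FEE"

with open("statement.csv", newline="") as f:
    rows = list(csv.DictReader(f))  # expects date, description, amount columns

# Walk the statement; when a foreign-currency purchase appears,
# collect the late-posted fees within the lookahead window.
groups = []
for i, row in enumerate(rows):
    if "FX" in row["description"]:
        fees = [r for r in rows[i + 1 : i + 1 + LOOKAHEAD]
                if FEE_PHRASE in r["description"]]
        groups.append({"purchase": row, "fees": fees})

for g in groups:
    total = float(g["purchase"]["amount"]) + sum(float(f["amount"]) for f in g["fees"])
    print(g["purchase"]["date"], g["purchase"]["description"], f"total {total:.2f}")
```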
> Like I said, this is a task that any human could do easily after a short explanation, but takes a hell of a lot of wrangling with AI.
Replying to your edit: it just doesn’t. It’s almost effortless and fast to do exactly what you’re describing, capturing the subjective judgement of the AI to do what you want.
It took me a couple of weeks to get very, very good at it, with good results in the first day or two. If you’re a competent programmer you’ll have the same experience, and quickly, if you get into the flow that’s being described to you.
I’m the ultimate skeptic, and I understand where you’re coming from, but these workflows are crazy powerful.
This is the right answer. Don't just feed the data to a chatbot; have it write code to do what you want, repeatably and testably. You can probably have working Python (and a Docker container for it) in under 30 min.
I don't think the commenter above is saying that an AI should necessarily apply the redaction. Rather, an AI can serve as an objective-ish way of determining what should be redacted. This seems somewhat analogous to how (non-AI) models can be used to evaluate how gerrymandered a map is.
Using voice transcription is nice for fully expressing what you want, so the model doesn't need to make guesses. I'm often voicing 500-word prompts. If you talk in a winding way that would look awkward as text, that's fine; the model will almost certainly be able to tell what you mean. Using voice-to-text is my biggest suggestion for people who want to use AI for programming.
(I'm not a particularly slow typer. I can go 70-90 WPM on a typing test. However, this speed drops quickly when I need to also think about what I'm saying. Typing that fast is also kinda tiring, whereas talking/thinking at 100-120 WPM feels comfortable. In general, I think just this lowered friction makes me much more willing to fully describe what I want)
You can also ask it, "do you have any questions?" I find that saying "if you have any questions, ask me, otherwise go ahead and build this" rarely produces questions for me. However, if I say "Make a plan and ask me any questions you may have" then it usually has a few questions
I've also found a lot of success when I tell Claude Code to emulate some specific piece of code I've previously written, either within the same project or something I've pasted in.
> I'm not a particularly slow typer. I can go 70-90 WPM on a typing test. However, this speed drops quickly when I need to also think about what I'm saying. Typing that fast is also kinda tiring, whereas talking/thinking at 100-120 WPM feels comfortable.
This doesn't feel relatable at all to me. If my writing speed is bottlenecked by thinking about what I'm writing, and my talking speed is significantly faster, that just means I've removed the bottleneck by not thinking about what I'm saying.
It's often better to segregate creative and inhibitive systems even if you need the inhibitive systems to produce a finished work. There's a (probably apocryphal) conversation between George RR Martin and Stephen King that goes something like:
GRRM: How do you write so many books?... Don't you ever spend hours staring at the page, agonizing over which of two words to use, and asking 'am I actually any good at this?'
SK: No.
That's fair. I sometimes find myself pausing or just talking in circles as I'm deciding what I want. I think when I'm speaking, I feel freer to use less precise/formal descriptions, but the model can still correctly interpret the technical meaning
In either case, different strokes for different folks, and what ultimately matters is whether you get good results. I think the upside is high, so I broadly suggest people try it out
Alternatively: some people are just better at / more comfortable thinking in auditory mode than visual mode & vice versa.
In principle I don't see why they should have different amounts of thought. That'd be bounded by how much time it takes to produce the message, I think. Typing permits backtracking via editing, but speaking permits 'semantic backtracking' which isn't equivalent but definitely can do similar things. Language is powerful.
And importantly, to backtrack in visual media I tend to need to re-saccade through the text with physical eye motions, whereas with audio my brain just has an internal buffer I can replay at the speed of thought.
Typed messages might have higher _density_ of thought per token, though how valuable is that really, in LLM contexts? There are diminishing returns on how perfect you can get a prompt.
Also, audio permits a higher bandwidth mode: one can scan and speak at the same time.
It's kind of the point. If you start writing it, you'll start correcting it and moving things around and adding context and fiddling and more and more.
And your 5-minute prompt just turned into half an hour of typing.
With voice you get on with it, and then start iterating, getting Claude to plan with you.
Not been impressed with agentic coding myself so far, but I did notice that using voice works a lot better imo, keeping me focused on getting on with letting the agent do the work.
I've also found it good for stopping me doing the same thing in Slack messages. I ramble my general essay to ChatGPT/Claude, then get them to summarize/rewrite it as a few lines in my own voice. Stops me spending an hour crafting a Slack message, and it tends to soften it.
I prefer writing myself, but I could see the appeal of producing a first draft of a prompt by dumping a verbal stream of consciousness into ChatGPT. That might actually be kind of fun to try while going on a walk or something.
That's definitely cool too. I was just suggesting an intermediary text prompt step as a compromise between 100% writing and 100% voice. So instead of getting home to actual code, you'd get home to a draft of relatively detailed requirements to review and revise before incurring the cost of throwing a coding agent at it.
I don’t feel restricted by my typing speed; speaking is just so much easier and more convenient. The vast majority of my ChatGPT usage is on my phone, and that makes s2t a no-brainer.
100% this, I built laboratory.love almost entirely with my voice and (now-outdated) Claude models
My go-to prompt finisher, which I have mapped to a hotkey due to frequent use, is "Before writing any code, first analyze the problem and requirements and identify any ambiguities, contradictions, or issues. Ask me to clarify any questions you have, and then we'll proceed to writing the code"
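For anyone who wants the same hotkey trick, here is a minimal sketch using the third-party `keyboard` package (`pip install keyboard`); the hotkey choice is arbitrary, and the package needs elevated privileges on some platforms:

```python
import keyboard  # third-party global-hotkey/typing library

FINISHER = (
    "Before writing any code, first analyze the problem and requirements "
    "and identify any ambiguities, contradictions, or issues. Ask me to "
    "clarify any questions you have, and then we'll proceed to writing the code."
)

# When the hotkey fires, type the canned prompt finisher into the focused app.
keyboard.add_hotkey("ctrl+alt+q", lambda: keyboard.write(FINISHER))
keyboard.wait()  # keep the script alive, listening for the hotkey
```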
It's an AI. You might do better by phrasing it, 'Make a plan, and have questions'. There's nobody there, but if it's specifically directed to 'have questions' you might find they are good questions! Why are you asking, if you figure it'd be better to get questions? Just say to have questions, and it will.
It's like a reasoning model. Don't ask, prompt 'and here is where you come up with apropos questions' and you shall have them, possibly even in a useful way.
> surprised AI companies are not making this workflow possible instead of leaving it up to users to figure out how to get voice text into the prompt.
Claude on macOS and iOS has native voice-to-text transcription. Haven't tried it, but since you can access Claude Code from the apps now, I wonder if you can use the Claude app's transcription as input to Claude Code.
> Claude on macOS and iOS has native voice-to-text transcription
Yeah, Claude/ChatGPT/Gemini all offer this, although Gemini's is basically unusable because it will immediately send the message if you stop talking for a few seconds
I imagine you totally could use the app transcript and paste it in, but keeping the friction to an absolute minimum (e.g., just needing to press one hotkey) feels nice
Love Handy. I use it too when dealing with LLMs. The other day I asked ChatGPT to generate interview questions based on a job description, and then I answered using Handy. So cool!
I use Spokenly with local Parakeet 0.6B v3 model + Cerebras gpt-oss-120b for post-processing (cleaning up transcription errors and fixing technical mondegreens, e.g., `no JS` → `Node.js`). Almost imperceptible transcription and processing delay. Trigger transcription with right ⌥ key.
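The post-processing step in a setup like that can be a single chat call to an OpenAI-compatible endpoint. A sketch, where the base URL, model id, and system prompt are assumptions to adapt to your own provider:

```python
from openai import OpenAI

# Hypothetical OpenAI-compatible endpoint; swap in your provider's URL/key.
client = OpenAI(base_url="https://api.cerebras.ai/v1", api_key="YOUR_KEY")

def clean(transcript: str) -> str:
    """Fix transcription errors and technical mondegreens in raw dictation."""
    resp = client.chat.completions.create(
        model="gpt-oss-120b",  # model id as named in the comment above
        messages=[
            {"role": "system",
             "content": "Correct transcription errors and technical "
                        "mondegreens (e.g. 'no JS' -> 'Node.js'). "
                        "Return only the corrected text."},
            {"role": "user", "content": transcript},
        ],
    )
    return resp.choices[0].message.content
```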
I use Raycast + Whisper Dictation. I don't think there is anything novel about it, but it integrates nicely into my workflow.
My main gripe is that when the recording window loses focus, I haven't found a way to bring it back and continue the recording session. So occasionally I have to start from scratch, which is particularly annoying if it happens during a long-winded brain dump.
I built my own open-source tool to do exactly this, so that I can run something like `claude $(hns)` in my terminal and then start speaking; after I'm done, claude receives the transcript and starts working. See this workflow here: https://hns-cli.dev/docs/drive-coding-agents/
There are a few apps nowadays for voice transcription. I've used Wispr Flow and Superwhisper, and both seem good. You can map some hotkey (e.g., ctrl + windows) to start recording, then when you press it again to stop, it'll get pasted into whatever text box you have open
Superwhisper offers some AI post-processing of the text (e.g., making nice bullets or fixing grammar), but this doesn't seem necessary and just makes things a bit slower.
I do the same. On Mac I use MacWhisper. The transcription does not have to be correct: lots of times it writes the wrong word when I'm talking about technical stuff, but Claude understands which word I mean from context.
My regular workflow is to talk (I use VoiceInk for transcription) and then say “tell me what you understood”. This puts your words into a well-structured format, and you can also make sure the CLI agent got it; expressing it explicitly likely also helps it stay on track.
I use a keyboard shortcut to start and stop recording and it will put the transcription into the clipboard so I can paste into any app.
It's a huge productivity boost. OP is correct about not overthinking it or trying to be that coherent; the models are very good at knowing what you mean (Opus 4.5 with Claude Code in my case).
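A minimal sketch of that record-transcribe-clipboard pipeline, assuming openai-whisper, sounddevice, scipy, and pyperclip are installed; a fixed-length recording stands in for a real start/stop hotkey:

```python
import sounddevice as sd
import pyperclip
import whisper
from scipy.io import wavfile

RATE = 16_000
SECONDS = 15  # fixed-length recording keeps the sketch simple

# Record from the default microphone, then write a WAV file for Whisper.
audio = sd.rec(int(SECONDS * RATE), samplerate=RATE, channels=1)
sd.wait()
wavfile.write("clip.wav", RATE, audio)

# Transcribe locally and drop the text on the clipboard, ready to paste.
model = whisper.load_model("base")
text = model.transcribe("clip.wav")["text"].strip()
pyperclip.copy(text)
print(text)
```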
I just installed this app and it is very nice. The UX is very clean and whatever I say it transcribes it correctly. In fact I'm transcribing this comment with this app just now.
I am using Whisper Medium. The only problem I see is that at the end of the message it sometimes adds a "bye" or a "thank you", which is kind of annoying.
I'm quite ready to believe that with LLMs it's not worth trying to be too coherent: I have successfully used LLMs to make sense of what incoherent-sounding people say (in text).
Aquavoice (a YC company) is really good. Got it after doing a bit of research on here; there's something for Mac that's supposed to be good too.
If you want local transcription, locally running models aren't quite good enough yet.
They use right-ctrl as their trigger. I've set mine to double tap and then I can talk with long pauses/thinking and it just keeps listening till I tap to finish.
I'm using Wispr flow, but I've also tried Superwhisper. Both are fine. I have a convenient hotkey to start/end recording with one hand. Having it just need one hand is nice. I'm using this with the Claude Code vscode extension in Cursor. If you go down this route, the Claude Code instance should be moved into a separate window outside your main editor or else it'll flicker a lot
Another option is MacWhisper, if someone is on macOS and doesn't want to pay for a subscription (just a one-time payment). Pretty much all of those apps these days use Parakeet from NVIDIA, which is the fastest and best open-source model that can run on edge devices.
Also, I haven't tried it, but on the latest macOS 26 Apple updated their STT models, so their built-in voice dictation may be good enough.
Voice transcription is silly when someone can hear you talking to something that isn't exactly human; imagine explaining that you were talking to an AI. Still, when it's more than one sentence, I use voice too.