Update, Haiku 4.5 is not just very targeted in terms of changes but also really ...

qingcharles · 2025-10-16T02:14:03 1760580843

It's insanely fast. I didn't know it had even been released, but I went to select the copilot SWE test model in VSCode and it was missing and Haiku 4.5 was there instead. I asked for a huge change to a web app and the output from Haiku scrolled the text faster than Windows could keep up. From a cold start. Wrote a huge chunk of code in about 40 seconds. Unreal.

p.s. it also got the code 100% correct on the one-shot

p.p.s. Microsoft are pricing it out at 30% the cost of frontier models (e.g. Sonnet 4.5, GPT5)

katchu11 · 2025-10-15T23:39:52 1760571592

Hey! I work on the Claude Code team. Both PAYG and Subscription usage look to be configured correctly in accordance with the price for Haiku 4.5 ($1/$5 per M I/O tok).

Feel free to DM me your account info on twitter (https://x.com/katchu11) and I can dig deeper!
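For a rough sense of what $1/$5 per M I/O tokens means per request, here is a minimal sketch. Only the two prices come from the comment above; the function name and the example token counts are made up for illustration and say nothing about how billing is actually implemented.

```python
# Prices quoted in the comment above: $1 per million input tokens,
# $5 per million output tokens.
INPUT_USD_PER_MTOK = 1.00
OUTPUT_USD_PER_MTOK = 5.00


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Illustrative token counts only: 20k tokens in, 4k tokens out.
print(f"${request_cost_usd(20_000, 4_000):.4f}")
```

Comparing an estimate like this against the usage dashboard is one way to sanity-check whether a session was billed at the expected rate.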

peddling-brink · 2025-10-16T00:04:31 1760573071

lol, I don’t know if you work there or not, but directing folks to send their account info to a random Twitter address is not considered best practice.

ethbr1 · 2025-10-16T03:31:05 1760585465

Being charitable, let's assume parent wasn't talking about secrets.

squigz · 2025-10-16T06:08:36 1760594916

What's wrong with sending a username to someone?

lukeck · 2025-10-16T22:04:01 1760652241

Generally, there's nothing inherently wrong with sending a username, but directing people to a third-party social media platform rather than an official Anthropic email or support system does nothing to build trust that they actually work there.

rat9988 · 2025-10-16T11:34:29 1760614469

What best practice? He can choose whether he sends it or not. The guy is just offering some extra help here.

Topfi · 2025-10-18T11:43:23 1760787803

Thanks, sorry, only saw the offer now. Have just checked and cannot reproduce the usage any more, might have been mistaken on that front.

rbitar · 2025-10-15T21:07:29 1760562449

Where do you get the 220 token/second from? Genuinely curious, as that would be very impressive for a model comparable to Sonnet 4. OpenRouter is currently publishing around 116 tps [1].

[1] https://openrouter.ai/anthropic/claude-haiku-4.5

Topfi · 2025-10-15T21:38:02 1760564282

Was just about to post that Haiku 4.5 does something I have never encountered before: there is a massive delta in token/sec depending on the query. Some variance, including task-specific variance, is of course nothing new, but it has never been as pronounced and reproducible as here.

A few examples, prompted at UTC 21:30-23:00 via T3 Chat [0]:

Prompt 1 — 120.65 token/sec — https://t3.chat/share/tgqp1dr0la

Prompt 2 — 118.58 token/sec — https://t3.chat/share/86d93w093a

Prompt 3 — 203.20 token/sec — https://t3.chat/share/h39nct9fp5

Prompt 4 — 91.43 token/sec — https://t3.chat/share/mqu1edzffq

Prompt 5 — 167.66 token/sec — https://t3.chat/share/gingktrf2m

Prompt 6 — 161.51 token/sec — https://t3.chat/share/qg6uxkdgy0

Prompt 7 — 168.11 token/sec — https://t3.chat/share/qiutu67ebc

Prompt 8 — 203.68 token/sec — https://t3.chat/share/zziplhpw0d

Prompt 9 — 102.86 token/sec — https://t3.chat/share/s3hldh5nxs

Prompt 10 — 174.66 token/sec — https://t3.chat/share/dyyfyc458m

Prompt 11 — 199.07 token/sec — https://t3.chat/share/7t29sx87cd

Prompt 12 — 82.13 token/sec — https://t3.chat/share/5ati3nvvdx

Prompt 13 — 94.96 token/sec — https://t3.chat/share/q3ig7k117z

Prompt 14 — 190.02 token/sec — https://t3.chat/share/hp5kjeujy7

Prompt 15 — 190.16 token/sec — https://t3.chat/share/77vs6yxcfa

Prompt 16 — 92.45 token/sec — https://t3.chat/share/i0qrsvp29i

Prompt 17 — 190.26 token/sec — https://t3.chat/share/berx0aq3qo

Prompt 18 — 187.31 token/sec — https://t3.chat/share/0wyuk0zzfc

Prompt 19 — 204.31 token/sec — https://t3.chat/share/6vuawveaqu

Prompt 20 — 135.55 token/sec — https://t3.chat/share/b0a11i4gfq

Prompt 21 — 208.97 token/sec — https://t3.chat/share/al54aha9zk

Prompt 22 — 188.07 token/sec — https://t3.chat/share/wu3k8q67qc

Prompt 23 — 198.17 token/sec — https://t3.chat/share/0bt1qrynve

Prompt 24 — 196.25 token/sec — https://t3.chat/share/nhnmp0hlc5

Prompt 25 — 185.09 token/sec — https://t3.chat/share/ifh6j4d8t5

I ran each prompt three times and got the same token/sec result for the respective prompt each time, within expected variance (less than ±5%). Each run used Claude Haiku 4.5 with "High reasoning". Will continue testing, but this is beyond odd. I will add that my very early evals leaned heavily into pure code output, where 200 token/sec is consistently possible at the moment, but it is certainly not the average as I claimed before; there I was mistaken. That being said, even across a wider range of challenges, we are above 160 token/sec, and if you solely focus on coding, whether Rust or React, Haiku 4.5 is very swift.

[0] I normally don't use T3 Chat for evals; it is just easier to share prompts this way, though I was disappointed to find that the model information (token/sec, TTF, etc.) can't be enabled without an account. Also, these aren't the prompts I usually use for evals. Those I try to keep somewhat out of training data by only using the paid-for API for benchmarks. As anything on Hacker News is most assuredly part of model training, I decided to write some quick and dirty prompts to highlight what I have been seeing.
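To put the "above 160 token/sec" figure next to the spread, the 25 readings listed above can be summarised with a few lines of Python. This is only a sketch: the values are copied from the list, while the 125 token/sec cut-off for a "slow" prompt is an arbitrary assumption chosen for illustration.

```python
from statistics import mean, median

# Token/sec readings for Prompts 1-25, copied from the list above.
TOKENS_PER_SEC = [
    120.65, 118.58, 203.20, 91.43, 167.66,
    161.51, 168.11, 203.68, 102.86, 174.66,
    199.07, 82.13, 94.96, 190.02, 190.16,
    92.45, 190.26, 187.31, 204.31, 135.55,
    208.97, 188.07, 198.17, 196.25, 185.09,
]

# Arbitrary cut-off (an assumption, not from the thread) for a "slow" prompt.
SLOW_THRESHOLD = 125.0

avg = mean(TOKENS_PER_SEC)
mid = median(TOKENS_PER_SEC)
slow = [v for v in TOKENS_PER_SEC if v < SLOW_THRESHOLD]

print(f"n={len(TOKENS_PER_SEC)} mean={avg:.2f} median={mid:.2f}")
print(f"min={min(TOKENS_PER_SEC):.2f} max={max(TOKENS_PER_SEC):.2f}")
print(f"{len(slow)} prompts under {SLOW_THRESHOLD:.0f} token/sec")
```

The mean comes out at roughly 162 token/sec while the median is about 185, which fits the picture of a fast majority plus a smaller cluster of much slower prompts pulling the average down.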

rbitar · 2025-10-16T00:17:42 1760573862

Interesting, and if they are using speculative decoding, that variance would make sense. Also, your numbers line up with what OpenRouter is now publishing, at 169.1 tps [1].

Anthropic mentioned this model is more than twice as fast as Claude Sonnet 4 [2], which OpenRouter averaged at 61.72 tps [3]. If these numbers hold, we're really looking at an almost 3x improvement in throughput and less than half the initial latency.

[1] https://openrouter.ai/anthropic/claude-haiku-4.5

[2] https://www.anthropic.com/news/claude-haiku-4-5

[3] https://openrouter.ai/anthropic/claude-sonnet-4
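The "almost 3x" figure is just the ratio of the two OpenRouter averages quoted in this thread. A quick sketch of the arithmetic, where the helper names and the 2,000-token response size are assumptions for illustration:

```python
# Average tokens/sec as quoted from OpenRouter in this thread.
HAIKU_45_TPS_EARLY = 116.0   # the earlier reading
HAIKU_45_TPS_LATER = 169.1   # the later reading
SONNET_4_TPS = 61.72


def speedup(new_tps: float, base_tps: float) -> float:
    """Throughput ratio of the new model over the baseline."""
    if base_tps <= 0:
        raise ValueError("base_tps must be positive")
    return new_tps / base_tps


def seconds_to_generate(tokens: int, tps: float) -> float:
    """Pure generation time, ignoring time to first token."""
    return tokens / tps


print(f"later reading:   {speedup(HAIKU_45_TPS_LATER, SONNET_4_TPS):.2f}x")
print(f"earlier reading: {speedup(HAIKU_45_TPS_EARLY, SONNET_4_TPS):.2f}x")

# Hypothetical 2,000-token response, for scale only.
for name, tps in [("Haiku 4.5", HAIKU_45_TPS_LATER), ("Sonnet 4", SONNET_4_TPS)]:
    print(f"{name}: {seconds_to_generate(2_000, tps):.1f} s")
```

With the later reading the ratio is about 2.7x; with the earlier 116 tps reading it would be just under 1.9x, so the conclusion depends heavily on when and how the average was sampled.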

cromulen · 2025-10-15T21:49:00 1760564940

That's what you get when you use speculative decoding and focus / overfit the draft model on coding. Then, when the answer is out of distribution for the draft model, you get increased token rejections by the main model and throughput suffers. This probably still makes sense for them if they expect a lot of their load to come from Claude Code and they need to make it economical.
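The effect described here can be illustrated with the standard simplified model of speculative decoding: if the draft model proposes `draft_len` tokens per step and each is accepted independently with probability `accept_rate`, the expected number of tokens produced per main-model pass is (1 - a^(k+1)) / (1 - a). The sketch below assumes exactly that model and ignores the cost of running the draft model; the acceptance rates and draft length are made-up values, and nothing in this thread confirms what Anthropic actually runs.

```python
def expected_tokens_per_pass(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per main-model pass.

    Simplified model: each drafted token is accepted independently with
    probability accept_rate, and the main model always contributes one
    token (the correction or the bonus token after a fully accepted draft).
    """
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be in [0, 1]")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    if accept_rate == 1.0:
        # Geometric series limit: every drafted token plus the bonus token.
        return float(draft_len + 1)
    return (1.0 - accept_rate ** (draft_len + 1)) / (1.0 - accept_rate)


# Hypothetical acceptance rates: high for in-distribution code,
# lower for prompts the draft model handles poorly.
DRAFT_LEN = 4
for label, rate in [("in-distribution", 0.9), ("out-of-distribution", 0.5)]:
    tokens = expected_tokens_per_pass(rate, DRAFT_LEN)
    print(f"{label}: {tokens:.2f} tokens per main-model pass")
```

Under these assumed numbers the in-distribution case yields roughly twice as many tokens per pass as the out-of-distribution one, which is the same order of gap as between the fastest and slowest prompts reported upthread.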

abhgh · 2025-10-16T04:20:48 1760588448

I'm curious to know if Anthropic mentions anywhere that they use speculative decoding. OpenAI does seem to use it, based on this tweet [1].

[1] https://x.com/stevendcoffey/status/1853582548225683814