t1amat's comments

Interesting idea, but I think it might have made more sense to use something like repomix to generate the source bundle and run that through tiktoken. Practically speaking, you don't send many source files in raw text form: either they have some sort of file wrapper with metadata, or they are pulled in from a tool call where the tool-call arguments act as the metadata.
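As a rough sketch of that counting step (the file path and encoding name are illustrative, and tiktoken is a third-party package that must be installed):

```python
# Count tokens in a repomix-style bundle file.
# Path and encoding name are placeholders, not fixed by the comment above.
def count_bundle_tokens(path: str, encoding_name: str = "cl100k_base") -> int:
    import tiktoken  # third-party; imported lazily so the sketch loads without it
    enc = tiktoken.get_encoding(encoding_name)
    with open(path, "r", encoding="utf-8") as f:
        return len(enc.encode(f.read()))
```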

When most people refer to "GLM" they refer to the mainline model. The difference in scale between GLM 5 and GLM 4.7 Flash is enormous: one runs acceptably on a phone, the other needs $100k+ hardware at minimum. While GLM 4.7 Flash is a gift to the local LLM crowd, it is nowhere near as capable as its bigger sibling in use cases beyond typical chat.


Perhaps the opposite: a language small enough that its entirety can easily be stuffed in context.


Not a direct answer, but it looks like v0.5 is a nanoGPT arch and v1 is a Phi 1.5 arch, which should be well supported by quantization utilities for any engine. They are small, too, and should be doable on a potato.


With M2, yes - I’ve used it in Claude Code (e.g. native tool calling), Roo/Cline (e.g. custom tool parsing), etc. It’s quite good and was for some time the best model to self-host. At 4-bit it can fit on 2x RTX 6000 Pro (i.e. ~200GB VRAM) with about 400k context at fp8 KV cache. It’s very fast due to low active params, stable at long context, and quite capable in any agent harness (its training specialty). M2.1 should be a good bump beyond M2, which was undertrained relative to even much smaller models.
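As a sanity check on fitting ~400k context, a back-of-envelope fp8 KV-cache calculation; the layer/head/dim numbers below are illustrative placeholders, not M2's actual architecture:

```python
# KV cache size: 2 tensors (K and V) per layer, per token.
# fp8 => 1 byte per element.
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 1) -> int:
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

# Hypothetical GQA config: 60 layers, 8 KV heads of dim 128.
cache = kv_cache_bytes(60, 8, 128, 400_000)
print(f"{cache / 2**30:.1f} GiB")  # -> 45.8 GiB
```

With numbers in that ballpark, a 400k-token fp8 cache costs tens of GB, which is why it can still fit next to 4-bit weights inside ~200GB of VRAM.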


You might have 1A rights as an American but it seems to me the manner in which this person protested would be grounds for termination in many jurisdictions.


1A doesn't apply to private entities anyway. 1A protects against government prosecution for your speech, and the government may make no laws "abridging the freedom of speech."

But your employer? They can put whatever rules and restrictions they want on your speech, and with at-will employment, can fire you for any reason anyway, at any time.

You can say whatever you want, but you aren't free from the consequences of that speech.


This comment sums up well how the spirit of the law is not being upheld, given that the biggest players in government, finance, and the corporate world are working together hand in glove.

>”Corporations cannot exist without government intervention”

>”Some privates companies and financiers are too big to fail/of strategic national importance”

>”1A does not apply to private entities (including the above)”

>”We have a free, competitive market”

I find it very difficult to resolve these seemingly contradictory statements.


Literally nothing to do with 1A


That's because 1A only has to do with a limited subset of the actual concept of freedom of speech.


This is the right take. You might be able to get decent token generation (2-3x slower than a GPU rig), which is adequate, but prompt processing is more like 50-100x slower. A hardware solution is needed to make long context actually usable on a Mac.


The problem with OpenAI models is the lack of a Max-like subscription for a good agentic harness. Maybe OpenAI or Microsoft could fix this.

I just went through the agony of provisioning my team with new Claude Code 5x subs 2 weeks ago after reviewing all of the options available at that time. Since then, the major changes include a Cerebras sub for Qwen3 Coder 480B, and now GPT-5. I’m still not sure I made the right choice, but hey, I’m not married to it either.

If you plan on using this much at all then the primary thing to avoid is API-based pay per use. It’s prohibitively costly to use regularly. And even for less important changes it never feels appropriate to use a lower quality model when the product counts.

Claude Code won primarily because of the sub and because they have a top-tier agentic harness and models that know how to use it. Opus and Sonnet are fantastic agents and very good at our use case, and were our preferred API-based models anyway. We can use Claude Code basically all day with at least Sonnet after using up our Opus limits. Worth noting that Cline built a Claude Code provider that the derivatives aped, which is great, but I’ve found Claude Code to be as good or better anyway. The CLI interface is actually a bonus for ease of sharing state via copy/paste.

I’ll probably change over to Gemini Code Assist next, as it’s half the price with more context length, but I’m waiting for a better Gemini 2.5 Pro and for gemini-cli/the Code Assist extensions to gain first-party planning support. You can get some form of it through third-party custom extensions with the CLI, but as an agent harness they are incomplete without it.

The Cerebras + Qwen3 Coder 480B with qwen3-cli is seriously tempting. Crazy generation speed. There’s some question about how big the rate limit really is, but it’s half the cost of Claude Code 5x. I haven’t checked, but I know qwen3-cli, which was introduced alongside the model, is a fork of gemini-cli with Qwen-focused updates; wonder if they landed a planning tool?

I don’t really consider Cursor, Windsurf, Cline, Roo, Kilo et al as they can’t provide a flat rate service with the kind of rate limits you can get with the aforementioned.

GitHub Copilot could be a great offering if they were willing to really compete with a good unlimited premium plan, but so far their best offering has fewer premium requests than I make in a week, possibly even in a few days.

Would love to hear if I missed anything, or somehow missed some dynamic here worth considering. But as far as I can tell, given heavy use, you only have 3 options today: Claude Max, Gemini Code Assist, Cerebras Code.


> If you plan on using this much at all then the primary thing to avoid is API-based pay per use.

I find there's a niche where API pay-per-use is cost effective. It's for problems that require (i) small context and (ii) not much reasoning.

Coding problems with 100k-200k context violate (i). Math problems violate (ii) because they generate long reasoning streams.

Coding problems with 10k-20k context are well suited, because they generate only ~5k output tokens. That's $0.03-$0.04 per prompt to GPT-5 under flex pricing. The convenience is worth it, unless you're relying on a particular agentic harness that you don't control (I am not).
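The arithmetic behind that figure, assuming flex-tier rates of about $0.625 per million input tokens and $5 per million output tokens (these prices are my assumption and may have changed):

```python
# Per-prompt cost at assumed flex-tier rates ($ per million tokens).
INPUT_PER_M = 0.625
OUTPUT_PER_M = 5.00

def prompt_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# A mid-sized coding prompt: ~15k context in, ~5k tokens out.
print(f"${prompt_cost(15_000, 5_000):.3f}")  # -> $0.034
```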

For large context questions, I send them to a chat subscription, which gives me a budget of N prompts instead of N tokens. So naturally, all the 100k-400k token questions go there.


OpenAI has answered your prayers.

16 hours ago the readme for Codex CLI was updated. Codex CLI now supports OpenAI login like Claude does - no API credits needed.

From the readme:

After you run codex, select Sign in with ChatGPT. You'll need a Plus, Pro, or Team ChatGPT account, and will get access to our latest models, including gpt-5, at no extra cost to your plan. (Enterprise is coming soon.)

Important: If you've used the Codex CLI before, you'll need to follow these steps to migrate from usage-based billing with your API key:

1. Update the CLI with codex update and ensure codex --version is greater than 0.13
2. Ensure that there is no OPENAI_API_KEY environment variable set (check that env | grep 'OPENAI_API_KEY' returns empty)
3. Run codex login again


Oh that’s fantastic news, thanks!


Is this actually true? Last I checked (a week ago?), Codex, the agent product, was free at some tiers in a preview capacity (with future rate limits based on tier), but the Codex CLI was not. With the Codex CLI you can log in, but the purpose of that was to link it to an API key where you pay per use. The sub tiers give one-time credits you would burn through quickly.


Found this in the GPT-5 Announcement:

> Availability and access: GPT‑5 is starting to roll out today to all Plus, Pro, Team, and Free users, with access for Enterprise and Edu coming in one week. Pro, Plus, and Team users can also start coding with GPT‑5 in the Codex CLI by signing in with ChatGPT.


I doubt this is true anymore, if it ever was. Both require string escaping, which is the real hurdle. And models are heavily trained on JSON for tool calling.
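For illustration, the kind of escaping JSON forces on embedded code (a minimal stdlib sketch; YAML has its own quoting and block-scalar rules, so it doesn't escape this particular problem either):

```python
import json

# A snippet containing quotes and a newline: JSON must escape both.
snippet = 'print("hi")\n'
encoded = json.dumps({"code": snippet})
print(encoded)  # -> {"code": "print(\"hi\")\n"}
```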


I believe it could be true because I think the training dataset contained a lot more YAML than JSON. I mean... you know how much YAML gets churned out every second?

