Hacker News

Qwen team, please please please, release something that surpasses the coding abilities of Opus 4.5.

Although I like the model, I don't like the leadership of that company, how closed it is, and how divisive they are in terms of politics.



They were just waiting for someone in the comments to ask!


Someone has to take the first step. Let's be grateful to the brave anon HN poster for stepping up.


It really is the best way to incentivize politeness!


I loled hard at this. Thank you kind stranger.


Same issue (I am Danish).

Have you tested alternatives? I grabbed OpenCode and a MiniMax M2.1 subscription, even just the $10/mo one, to test with.

Result? From scratch, we designed a spec for a slight variation of a tool I had previously specced with Claude - the same problem (a process supervisor tool).

Honestly, it worked great. I have since played a little further with generating code (this time Go), and again, I am happy.

Beyond that, GLM 4.7 should also be great.

See https://dev.to/kilocode/open-weight-models-are-getting-serio...

It is a recent case study of vibe-coding a smaller tool with Kilo Code, comparing output from MiniMax M2.1 and GLM 4.7.

Honestly, just give it a whirl - no need to send money to companies/nations you disagree with.


I've been using GLM 4.7 with Claude Code. Best of both worlds. I canceled my Anthropic subscription due to US politics as well. I had already started my "withdrawal" in Jan 2025; Anthropic was one of the few that were left.


I'm in the same boat. Sonnet was overkill for me, and GLM is cheap and smart enough to spit out boilerplate and FFMPEG commands whenever it's asked.

$20/month is a bit of an insane ask when the most valuable thing Anthropic makes is the free Claude Code CLI.


I've recently switched to OpenCode and found it to be far better. Plus GLM 4.7 is free at the moment, so for now it's a great no-cost setup.


I don't know, I max out my Opus limits regularly. I guess it depends on usage.


> I'm in the same boat. Sonnet was overkill for me, and GLM is cheap and smart enough to spit out boilerplate and FFMPEG commands whenever it's asked.

Do you even need a subscription to any service for that? Is a free tier not enough?


Are you using an API proxy to route GLM into the Claude Code CLI? Or do you mean side-by-side usage? Not sure if custom endpoints are supported natively yet.


This works:

  ZAI_ANTHROPIC_BASE_URL=xxx
  ZAI_ANTHROPIC_AUTH_TOKEN=xxx
  alias claude-zai='ANTHROPIC_BASE_URL=$ZAI_ANTHROPIC_BASE_URL ANTHROPIC_AUTH_TOKEN=$ZAI_ANTHROPIC_AUTH_TOKEN claude'

Then you can run `claude`, hit your limit, exit the session, and run `claude-zai -c` to continue (with a context reset, of course).

Someone gave me that command a while back.


That's pretty much what I do; I have a bash alias to launch either the normal Claude Code or the GLM one.
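A minimal sketch of such a launcher pair, assuming z.ai's Anthropic-compatible endpoint (the base URL follows https://docs.z.ai/devpack/tool/claude; the token value is a placeholder):

```shell
# Hypothetical two-alias setup: plain `claude` keeps using your Anthropic
# subscription, while `claude-glm` routes the same CLI to GLM via z.ai.
export ZAI_ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic"
export ZAI_ANTHROPIC_AUTH_TOKEN="your-zai-api-key"  # placeholder

# Overriding the env vars only for this one command leaves the
# default `claude` invocation untouched.
alias claude-glm='ANTHROPIC_BASE_URL=$ZAI_ANTHROPIC_BASE_URL ANTHROPIC_AUTH_TOKEN=$ZAI_ANTHROPIC_AUTH_TOKEN claude'
```

Since the override happens per-invocation, the two aliases can be used interchangeably within the same shell session.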


This is the official guide: https://docs.z.ai/devpack/tool/claude


I much prefer OpenCode these days, give it a try.


I did; I couldn't get used to it and didn't get as good results. I think Claude Code's tools are really top-notch, and maybe the system prompt too.


>how divisive they're in terms of politics

What do you mean by this?


Dario said not nice words about China and open models in general:

https://www.bloomberg.com/news/articles/2026-01-20/anthropic...


I think the least politically divisive issue within the US is concern about China’s growth as it directly threatens the US’s ability to set the world’s agenda. It may be politically divisive if you are aligned with Chinese interests but I don’t see anything politically divisive for a US audience. I expect Chinese CEOs speak in similar terms to a Chinese audience in terms of making sure they’re decoupled from the now unstable US political machine.


"... for a US audience"

And that's the rub.

Many of us are not.


Looking at the past year's US agenda, I'm okay with that.


I mean, there's no way it's about this right?

Being critical of favorable actions towards a rival country shouldn't be divisive, and if it is, well, I don't think the problem is in the criticism.

Also, the link doesn't mention open source? From a Google search, he doesn't seem to care much for it.


From the perspective of competing against China in terms of AI, the argument against open models makes sense to me. It's a terrible problem to have, really. Ideally we should all be able to work together in the sandbox towards a better tomorrow, but that's not reality.

I prefer to have more open models. On the other hand China closes up their open models once they start to show a competitive edge.


They support the Trump administration's military, a stance that is not universally lauded.


With a good harness I am getting similar results with GLM 4.7. I am paying for TWO! max accounts and my agents are running 24/7.

I still have a small Claude account to do some code reviews. Opus 4.5 does good reviews but at this point GLM 4.7 usually can do the same code reviews.

If cost is an issue (for me it is; I pay out of pocket), go with GLM 4.7.


Your GitHub profile is... disturbing. 1,354 commits and 464 pull requests in January so far.

Regardless of how productive those numbers may seem, that amount of code being published so quickly is concerning, to say the least. It couldn't have possibly been reviewed by a human or properly tested.

If this is the future of software development, society is cooked.


It's mostly trying out my orchestration system (https://github.com/mohsen1/claude-code-orchestrator and https://github.com/mohsen1/claude-orchestrator-action) in a repo using GH_PAT.

Stuff like this: https://github.com/mohsen1/claude-code-orchestrator-e2e-test...

Yes, the idea is to really, fully automate software engineering. I don't know if I am going to be successful but I'm on vacation and having fun!

If Opus 4.5/GLM 4.7 can do so much already, I can only imagine what can be done in two years. Might as well adapt to this reality and learn how to leverage this advancement.


On the contrary, that actually is pretty cool. The z.ai subscription is cheap enough that I'm thinking of running it 24/7 too. Curious if you've tried any other AI orchestration tools like Gas Town? What made you decide to build your own, and how is it working for you so far?


I didn't know about Gas Town! Super cool! I will try it once I have a chance. I started with a few dumb tmux-based scripts and eventually figured I'd make it into a proper package.

I think using GitHub with issues and PRs, and especially leveraging AI code reviewers like Greptile, is actually the way to go. I made an attempt here: https://github.com/mohsen1/claude-orchestrator-action but I think it needs a lot more attention to get it right. The ideas in Gas Town are great and I might steal some of them. Running Claude Code in GitHub Actions works great with GLM 4.7.

Microsoft's new Agent SDK is also interesting. It unlocks multi-provider workflows, so users can burn through all of their subscriptions or quickly switch providers.

Also super interested in collaborating with someone to build something together if you are interested!


You may not like it but this is what a 10x developer looks like. :-)


you may enjoy spaghetti, but will you enjoy 10x spaghetti?


Have you tried the new GLM 4.7?


I've been using GLM 4.7 alongside Opus 4.5 and I can't believe how bad it is. Seriously.

I spent 20 minutes yesterday trying to get GLM 4.7 to understand that a simple modal on a web page (vanilla JS and HTML!) wasn't displaying when a certain button was clicked. I hooked it up to Chrome MCP in OpenCode as well.

It constantly told me that it fixed the problem. In frustration, I opened Claude Code and just typed "Why won't the button with ID 'edit' work???!"

It fixed the problem in one shot. This isn't even a hard problem (and I could have just fixed it myself but I guess sunk cost fallacy).


I've used a bunch of the SOTA models (via my work's Windsurf subscription) for HTML/CSS/JS stuff over the past few months. Mind you, I am not a web developer, these are just internal and personal projects.

My experience is that all of the models seem to do a decent job of writing a whole application from scratch, up to a certain point of complexity. But as soon as you ask them for non-trivial modifications and bugfixes, they _usually_ go deep into rationalized rabbit holes into nowhere.

I burned through a lot of credits to try them all and Gemini tended to work the best for the things I was doing. But as always, YMMV.


Exactly the same feedback


Amazingly, just yesterday, I had Opus 4.5 crap itself extensively on a fairly simple problem -- it was trying to override a column with an aggregation function while also using it in a group-by without referring to the original column by its full qualified name prefixed with the table -- and in typical Claude fashion it assembled an entire abstraction layer to try and hide the problem under, before finally giving up, deleting the column, and smugly informing me I didn't need it anyway.
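For illustration, a hedged reconstruction of the pitfall described above (SQLite shown; the table and column names are invented): reusing a column's name as the alias of its own aggregate is legal, but bare references to that name elsewhere in the query become ambiguous, and qualifying the original column with the table name resolves it.

```shell
# Invented example: SUM(orders.amount) is aliased back to "amount".
# The GROUP BY uses the table-qualified orders.customer, and the
# aggregate reads the table-qualified orders.amount, so nothing
# accidentally binds to the output alias.
sqlite3 :memory: <<'SQL'
CREATE TABLE orders (customer TEXT, amount INT);
INSERT INTO orders VALUES ('a', 1), ('a', 2), ('b', 3);
SELECT customer, SUM(orders.amount) AS amount
FROM orders
GROUP BY orders.customer;
SQL
```

Exact name-resolution rules for aliases in GROUP BY vary by SQL dialect, which is part of why the fully qualified form is the safe one.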

That evening, for kicks, I brought the problem to GLM 4.7 Flash (Flash!) and it one-shot the right solution.

It's not apples to apples, because when it comes down to it LLMs are statistical token extruders, and it's a lot easier to extrude the likely tokens from an isolated query than from a whole workspace that's already been messed up somewhat by said LLM. That, and data is not the plural of anecdote. But still, I'm easily amused, and this amused me. (I haven't otherwise pushed GLM 4.7 much and I don't have a strong opinion about it.)

But seriously, given the consistent pattern of knitting ever larger carpets to sweep errors under that Claude seems to exhibit over and over instead of identifying and addressing root causes, I'm curious what the codebases of people who use it a lot look like.


> I can't believe how bad it is

This has been my consistent experience with every model prior to Opus 4.5, and every single open model I've given a go.

Hopefully we will get there in another 6 months when Opus is distilled into new open models, but I've always been shocked at some of the claims around open models, when I've been entirely unable to replicate them.

Hell, even Opus 4.5 shits the bed with semi-regularity on anything that's not completely greenfield for my usage, once I'm giving it tasks beyond some unseen complexity boundary.


Yes I did; it's not on par with Opus 4.5.

I use Opus 4.5 for planning; when I reach my usage limits, I fall back to GLM 4.7 only for implementing the plan. It still struggles, even though I configure GLM 4.7 as both the smaller model and the heavier model in Claude Code.
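A sketch of that dual-slot configuration, assuming Claude Code's `ANTHROPIC_MODEL` and `ANTHROPIC_SMALL_FAST_MODEL` environment variables and z.ai's Anthropic-compatible endpoint; the token and the exact model ID are placeholders (check z.ai's docs for current values):

```shell
# Hypothetical: point Claude Code at z.ai and use GLM 4.7 for both
# the main model and the smaller/background model.
export ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic"
export ANTHROPIC_AUTH_TOKEN="your-zai-api-key"   # placeholder
export ANTHROPIC_MODEL="glm-4.7"                 # main ("heavier") model slot
export ANTHROPIC_SMALL_FAST_MODEL="glm-4.7"      # smaller/fast model slot
```

With both slots set to the same model, even Claude Code's background tasks (which normally go to the small model) stay on GLM.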


The Chinese labs distill the SOTA models to boost the performance of theirs. They are a trailer hooked up (with a 3-6 month long chain) to the truck pushing the technology forward. I've yet to see a trailer overtake its truck.

China would need an architectural breakthrough to leapfrog American labs, given the huge compute disparity.


I have indeed seen a trailer overtake its truck. Not a beautiful sight.


Agreed. I do think the metaphor still holds though.

A financial jackknifing of the AI industry seems one very plausible outcome as the promises/expectations of the AI companies start meeting reality.


Care to explain how the volume of AI research papers authored by Chinese researchers[1] has exceeded US-published ones? Time-traveling plagiarism perhaps, since you believe the US is destined to lead always.

1. Chinese researchers in China, to be more specific.


Not a great metric, research in academia doesn't necessarily translate to value. In the US they've poached so many academics because of how much value they directly translate to.


I don't doubt China would be capable of making SOTA models; however, they are very heavily compute-constrained. So they are forced to shortcut compute by riding the coattails of compute-heavy models.

They need a training-multiplier breakthrough that would allow them to train SOTA models on a fraction of the compute the US uses. It would also have to be kept secret and well hidden to prevent the US from using it to multiply its own model strength with its greater compute - and since multiple researchers around the world often put the pieces together on a problem at around the same time, the breakthrough would have to be something pretty difficult for the greatest minds in the field to discover.


Volume is easy: they have far more people; it is quality that counts.


Yeah, and if anything it's the US defying a massive disadvantage in headcount[1] that is odd, not the other way around.

  1: https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population


Perhaps you should pay attention to where the puck is going to be, rather than where it is currently. Lots of original ideas are coming out of Chinese AI research[1], denying this betrays some level of cope.

1. e.g. select any DeepSeek release, and read the accompanying paper


I'll pay attention to where the puck is, because that is something I can observe; where it is going to be is anybody's guess. Lots of original ideas are coming out of Chinese AI research, but there is also lots of junk. I think in the longer term they will have the advantage, but right now that simply isn't the case.

Your 'cope' accusation has no place here, I have no dog in the race and do not need to cope with anything.


> Your 'cope' accusation has no place here

I will rephrase my statement and continue to stand by it: "Denying the volume of original AI research being done by China - a falsifiable metric - betrays some level of cope."

You seem to agree on the fact that China has surpassed the US in volume. As for quality, I'll say expertise is a result of execution. At some point during off-shoring, the US had qualitatively better machinists than China, despite the manufacturing volumes. That is no longer the case today - as they say, cream floats to the top, and that holds true for a pot or an industrial-sized vat.


It may not be cope, could just be ignorance.


No, all they need is time. I am awaiting the downfall of the AI hegemony and hype with popcorn at hand.


I would be happy with an open-weight 3-month-old Claude.


DeepSeek 3.2 is frankly fairly close to that. GLM 4.7 as well. They're basically around Sonnet 4 level.


Can you point me at another free voice cloning / TTS model with this fidelity and, I guess, prompt adherence?

Because I've been on YouTube and Insta, and believe me, no one else even compares yet.


Well DeepSeek V4 is rumored to be in that range and will be released in 3 weeks.


I could say the same about grok (although given there are better models for my use cases I don't use it). What part of divisive politics are you talking about here?


Every time Dario opens his mouth it's something weird.



