More

sidchilling · 2026-01-26T16:10:53 1769443853

What would a good coding model to run on an M3 Pro (18GB) to get Codex like workflow and quality? Essentially, I am running out quick when using Codex-High on VSCode on the $20 ChatGPT plan and looking for cheaper / free alternatives (even if a little slower, but same quality). Any pointers?

duffyjp · 2026-01-26T16:18:58 1769444338

Nothing. This summer I set up a dual 16GB GPU / 64GB RAM system and nothing I could run was even remotely close. Big models that didn't fit on 32gb VRAM had marginally better results but were at least of magnitude slower than what you'd pay for and still much worse in quality.

I gave one of the GPUs to my kid to play games on.

Tostino · 2026-01-26T17:19:39 1769447979

Yup, even with 2x 24gb GPUs, it's impossible to get anywhere close to the big models in terms of quality and speed, for a fraction of the cost.

mirekrusin · 2026-01-26T21:05:34 1769461534

I'm running unsloth/GLM-4.7-Flash-GGUF:UD-Q8_K_XL via llama.cpp on 2x 24G 4090s which fits perfectly with 198k context at 120 tokens/s – the model itself is really good.

fsiefken · 2026-01-26T22:51:05 1769467865

I can confirm, running glm-4.7-flash-7e-qx54g-hi-mlx here, a 22gb model @q5 on m4 max pro and 59 tokens/s.

medvezhenok · 2026-01-26T16:12:57 1769443977

Short answer: there is none. You can't get frontier-level performance from any open source model, much less one that would work on an M3 Pro.

If you had more like 200GB ram you might be able to run something like MiniMax M2.1 to get last-gen performance at something resembling usable speed - but it's still a far cry from codex on high.

mittermayr · 2026-01-26T16:13:07 1769443987

at the moment, I think the best you can do is qwen3-coder:30b -- it works, and it's nice to get some fully-local llm coding up and running, but you'll quickly realize that you've long tasted the sweet forbidden nectar that is hosted llms. unfortunately.

evilduck · 2026-01-26T17:58:12 1769450292

They are spending hundreds of billions of dollars on data centers filled with GPUs that cost more than an average car and then months on training models to serve your current $20/mo plan. Do you legitimately think there's a cheaper or free alternative that is of the same quality?

I guess you could technically run the huge leading open weight models using large disks as RAM and have close to the "same quality" but with "heat death of the universe" speeds.

tosh · 2026-01-26T20:48:13 1769460493

18gb RAM it is a bit tight

with 32gb RAM:

qwen3-coder and glm 4.7 flash are both impressive 30b parameter models

not on the level of gpt 5.2 codex but small enough to run locally (w/ 32gb RAM 4bit quantized) and quite capable

but it is just a matter of time I think until we get quite capable coding models that will be able to run with less RAM

adam_patarino · 2026-01-27T15:55:31 1769529331

ahem ... cortex.build

Current test version runs in 8GB @ 60tks. Lmk if you want to join our early tester group!

jgoodhcg · 2026-01-26T16:20:20 1769444420

Z.ai has glm-4.7. Its almost as good for about $8/mo.

margorczynski · 2026-01-26T17:11:09 1769447469

Not sure if it's me but at least for my use cases (software devl, small-medium projects) Claude Opus + Claude Code beats by quite a margin OpenCode + GLM 4.7. At least for me Claude "gets it" eventually while GLM will get stuck in a loop not understanding what the problem is or what I expect.

zamalek · 2026-01-26T17:50:31 1769449831

Right, GLM is close But not close enough. If I have to spend $200 for Opus fallback i may as well not use it always. Still an unbelievable option if $200 is a luxury, the price-per-quality is absurd.

Mashimo · 2026-01-26T16:13:51 1769444031

A local model with 18GB of ram that has the same quality has codex high? Yeah, nah mate.

The best could be GLN 4.7 Flash, and I doubt it's close to what you want.

atwrk · 2026-01-26T16:17:20 1769444240

"run" as in run locally? There's not much you can do with that little RAM.

If remote models are ok you could have a look at MiniMax M2.1 (minimax.io) or GLM from z.ai or Qwen3 Coder. You should be able to use all of these with your local openai app.

marcd35 · 2026-01-26T18:11:14 1769451074

antigravity is solid and has a generous free tier.

sidchilling · 2025-08-27T18:36:49 1756319809

I downloaded the app but it keeps crashing on open

jacobx · 2025-08-27T18:39:31 1756319971

Sorry to hear that! What version of iOS are you running?

sidchilling · 2025-08-28T05:40:57 1756359657

ios 18.1.1

sidchilling · 2025-07-09T09:00:18 1752051618

I agree. The articles misses some really good ones, - prepone, max-to-max (ज़्यादा से ज़्यादा), scanner (for payment QR-codes)

sidchilling · 2025-06-24T08:20:56 1750753256

Is this not available in India? Would love to try it.

shuhongwu · 2025-06-24T13:53:01 1750773181

Yes, we’ve just made our service available in India.

sidchilling · 2025-06-14T11:56:12 1749902172

Does it work? I get thrown to homepage without anything after uploading screenshots and clicking the CTA. Bug? Or intentional?

jayantrao94 · 2025-06-14T13:28:01 1749907681

Try now

sidchilling · 2025-05-26T13:40:18 1748266818

Some things I’m curious to hear from HN:

(1) Are you optimizing your content for LLMs yet? (2) What do you think AI traffic will look like 6–12 months from now? (3) What kinds of AI visibility issues have you run into?

Happy to answer anything about how we built the tool or what’s under the hood too.

sidchilling · 2025-03-28T11:00:44 1743159644

Experts will be in denial of LLMs for a long time, while the non-experts will swiftly use it to bridge their own knowledge gap. This is the use case for LLMs, maybe more so than 100% correctness.

sidchilling · 2025-02-12T18:38:03 1739385483

I don’t like watching videos (or rather can’t watch at night, but can read). To date I haven’t found an AI that can produce a good article from a video. No, not just transcribe the video but actually produce a quality article with images and stuff from the video. Like a human who is instructed as “watch this video and produce a very high quality article that talks about the things talked about in the video”.

Has anyone has any luck with this?

It would be awesome if I can give this link to AI that will produce a PDF, each video being a chapter in the PDF.

ignoramous · 2025-02-12T18:54:48 1739386488

> would be awesome if I can give this link to AI that will produce a PDF, each video being a chapter in the PDF

If you've got is slides+YouTube, then https://notebooklm.google [0] might work wonderfully well. It does for me. Though, it more Q&A than an article with illustration.

[0] https://illuminate.google.com is its limited/audio-only alternative.

sidchilling · on Oct 25, 2024

Is there any code assistant that can parse entire big codebases and help me implement a “story”?

gitnik · on Oct 26, 2024

Cursor can index your entire codebase and help you implement changes across multiple files. No idea what sort of limits this feature has, but for me it's worked well enough for things like rewriting list pages with a new table component.

sidchilling · on April 28, 2021

YES! If you’re actually in India, you will know how useful Twitter has been with this. People are amplifying tweets about other people’s needs and other folks are sharing verified leads. All of this without any incentive to do so. Where the government is daily failing, citizens are rising to the occasion.

Bang2Bay · on April 28, 2021

Well, government did not fail in this case. at least anecdotal evidence of where the government failed would be good.

A union cabinet minister of the country tried to personally reach out and failed. sent cops to help who now suspect foul play.