Well, the seemingly cheap option comes with significantly degraded performance, particularly for agentic use. Have you tried replacing Claude Code with a locally deployed model, say on a 4090 or 5090? I have. It is not usable.
> Deepseek on openrouter is still 25x cheaper than claude
Is it? Or only when you don't factor in Claude's cached context? I've consistently found it pointless to use open models, because the price of the good ones is so close to Claude's cached-context pricing that I don't need them.
DeepSeek's API also has context caching, although the tokens/s was much lower than Claude's when I tried it. But for background agents the price difference makes it absolutely worth it.
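For anyone who hasn't used it: Claude's caching is opt-in per content block rather than automatic. A minimal sketch with the Anthropic Python SDK (the model id and document are placeholders):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

reference_doc = open("reference.txt").read()  # the long, unchanging context

# Mark the big stable prefix as cacheable; later calls that reuse the exact
# same prefix are billed at the much cheaper cached-read rate.
response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder; use whatever model you run
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": reference_doc,
        "cache_control": {"type": "ephemeral"},
    }],
    messages=[{"role": "user", "content": "Summarize section 3."}],
)
print(response.usage)  # cache_creation_input_tokens vs cache_read_input_tokens
```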
Well, those cards also have very limited VRAM and wouldn't be able to run anything in the ~70B parameter range. (Can you even run 30B?)
Things get a lot easier at lower quantisation, which buys you a higher parameter count, and there are a lot of people whose AI jobs are "extract sentiment from text" or "bin into one of these 5 categories", where that's probably fine.
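The back-of-envelope math answers the 30B question; a rough sketch (the 20% overhead factor for KV cache and activations is a ballpark assumption, not a measured number):

```python
def est_vram_gb(params_b: float, bits: int, overhead: float = 1.2) -> float:
    """Weights-only estimate: parameters x bytes/param, plus ~20% headroom
    for KV cache and activations (ballpark assumption, not measured)."""
    return params_b * bits / 8 * overhead

for params_b in (7, 30, 70):
    for bits in (16, 8, 4):
        print(f"{params_b}B @ {bits}-bit: ~{est_vram_gb(params_b, bits):.0f} GB")

# 70B needs ~42 GB even at 4-bit, beyond a 24 GB 4090 or a 32 GB 5090;
# 30B at 4-bit (~18 GB) fits on either card.
```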
Strictly speaking, you have not deployed any model on a 5090 because a 5090 card has never been produced.
And without specifying your quantization level it's hard to know what you mean by "not usable"
Anyway, if you really wanted to try cheap distilled/quantized models locally, you would be using used V100 Teslas, not 4-year-old single-chip gaming GPUs.
they took the already ridiculous v3.1 terminus model, added this new deepseek sparse attention thing, and suddenly it’s doing 128k context at basically half the inference cost of the old version with no measurable drop in reasoning or multilingual quality. like, imo gold medal level math and code, 100+ languages, all while sipping tokens at 14 cents per million input. that’s stupid cheap.
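for anyone wondering what the sparse attention bit actually buys: instead of every query scoring all 128k keys, each query attends to only a small selected subset, so cost stops growing quadratically with context. a toy top-k sketch of the idea (illustration only; deepseek's dsa picks keys with a learned indexer, not raw score top-k like this):

```python
import torch

def topk_sparse_attention(q, k, v, keep=64):
    """toy sparse attention: each query keeps only its `keep` highest-scoring
    keys and softmaxes over those, masking everything else to -inf."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5   # full (T, T) scores
    top = scores.topk(keep, dim=-1)                         # best `keep` keys per query
    masked = torch.full_like(scores, float("-inf"))
    masked.scatter_(-1, top.indices, top.values)            # -inf everywhere else
    return torch.softmax(masked, dim=-1) @ v                # mix only selected values

T, d = 4096, 64
q, k, v = (torch.randn(T, d) for _ in range(3))
out = topk_sparse_attention(q, k, v)  # each output row mixes just 64 of 4096 values
```

note the toy still computes the full score matrix, so it saves nothing by itself; the real win is selecting keys cheaply before doing the expensive attention math, which is where the indexer comes in.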
the rl recipe they used this time also seems way more stable. no more endless repetition loops or random language switching you sometimes got with the earlier open models. it just works.
what really got me is how fast the community moved. vllm support landed the same day, huggingface space was up in hours, and people are already fine-tuning it for agent stuff and long document reasoning.
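if anyone wants to reproduce, the vllm route is just an openai-compatible server; a minimal sketch (the huggingface model id is my guess at the checkpoint name, double-check it):

```python
# first: vllm serve deepseek-ai/DeepSeek-V3.2-Exp   (model id is my assumption)
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")  # vllm default port
resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3.2-Exp",
    messages=[{"role": "user", "content": "summarize the sparse attention trick in two sentences"}],
)
print(resp.choices[0].message.content)
```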
i’ve been playing with it locally and the speed jump on long prompts is night and day. feels like the gap to the closed frontier models just shrank again.
anyone else tried it yet?
The one thing I wish it had is a 3.5mm audio jack. Both the Xbox controller and Sony's DualSense have one, but Sony doesn't support audio over Bluetooth, the Xbox controller needs a USB adapter (and its build isn't as good as Sony's), and Sony doesn't offer a USB adapter at all. Given that the Steam Controller already uses a USB puck, it should be able to support this.
Infuse is good, but it doesn't feel well polished on the desktop; for example, some dialogs that could have been real windows are instead pop-ups that block the main player.
I want to note that long prompts are only good if the model is optimized for them. I have tried swapping the underlying model for Claude Code. Most local models, even those that claim to handle long context and tool use, don't work well once the instructions get too long. This matters for tool use: it works fine in small chatbot-style demos, but once the prompt reaches Claude Code's length, the model just fails, either forgetting what tools exist, forgetting to use them, or returning results in the wrong format. Only OpenAI's models and Google's Gemini kind of work, though not as well as Anthropic's own models, and they feel much slower.
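If you want to see this degradation for yourself, the quickest harness I can think of is: define one tool, pad the system prompt to increasing lengths, and check whether the model still emits a well-formed tool call. A sketch (endpoint, model name, and the padding trick are all illustrative, not from any real harness):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")  # local server

tools = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file from the workspace",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

# Pad the system prompt and watch for the point where the model stops
# emitting a structured tool call and starts rambling instead.
for pad in (1_000, 10_000, 100_000):
    system = "You are a coding agent. Use the tools when needed.\n" + "filler " * pad
    resp = client.chat.completions.create(
        model="local-model",  # placeholder id
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": "Open README.md and tell me the title."},
        ],
        tools=tools,
    )
    calls = resp.choices[0].message.tool_calls
    print(f"~{pad} filler words -> {'tool call OK' if calls else 'no tool call'}")
```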
I do Japanese transcription + Gemini translation. It's worse than fansubs, but it's much, much better than nothing. The first thing that can struggle is actually the VAD; then it's special names and places, where prompting can help but not always. Finally it's uniformity (or style): I still feel I can't control the punctuation well.
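Roughly, the shape of such a pipeline, sketched with faster-whisper plus the google-generativeai SDK (these library choices, file names, and model ids are illustrative, not necessarily what I run; initial_prompt is where the special names go):

```python
from faster_whisper import WhisperModel
import google.generativeai as genai

genai.configure(api_key="...")  # your key
gemini = genai.GenerativeModel("gemini-1.5-pro")  # illustrative model id

whisper = WhisperModel("large-v3")
segments, _ = whisper.transcribe(
    "episode.mkv",
    language="ja",
    vad_filter=True,          # the VAD stage that sometimes drops quiet lines
    initial_prompt="ナルト、サスケ、木ノ葉隠れの里",  # seed special names/places
)

for seg in segments:
    prompt = ("Translate this Japanese subtitle line into natural English. "
              "Keep names as-is and do not end the line with a period:\n" + seg.text)
    print(f"{seg.start:6.1f}-{seg.end:6.1f}  {gemini.generate_content(prompt).text.strip()}")
```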
Recently, I visited the Pennsylvania Railroad Museum and was fascinated to learn that when steel railcars were first introduced (despite being far safer than their wooden predecessors, which could easily be crushed), many people feared they might attract lightning. It's such a good analogy for our move into the AI era.