
I have several local models I hit up first (Mixtral, Llama); if I don’t like the results, then I’ll give the same prompt to Claude and GPT.

Overall though it’s really just for reference and/or telling me about some standard library function I didn’t know of.

Somewhat counterintuitively, I spend way more time reading language documentation than I used to, as the LLM is mainly useful in pointing me to language features.

After a few very bad experiences I never let an LLM write more than a couple of lines of boilerplate for me, but as well-read assistants they are useful.

But none of them are sufficient alone, you do need a “team” of them - which is why I also don’t see the value in spending this much on one model. I’d spend that much on a system that polled 5 models concurrently and came up with a summary of sorts.
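
A minimal sketch of that "poll several models, then summarize" idea; query_model() and the model names here are placeholders for whatever local or hosted backends you actually have wired up, not a real client:

    from concurrent.futures import ThreadPoolExecutor

    MODELS = ["mixtral", "llama3", "claude", "gpt"]  # whatever your "team" is

    def query_model(model: str, prompt: str) -> str:
        # Stub: call your Ollama server or provider SDK here.
        raise NotImplementedError

    def poll_and_summarize(prompt: str, summarizer: str = "mixtral") -> str:
        # Ask every model the same prompt concurrently.
        with ThreadPoolExecutor(max_workers=len(MODELS)) as pool:
            answers = list(pool.map(lambda m: query_model(m, prompt), MODELS))
        combined = "\n\n".join(f"[{m}]\n{a}" for m, a in zip(MODELS, answers))
        # Let one model write the digest of where the answers agree/disagree.
        return query_model(summarizer,
                           "Summarize where these answers agree and disagree:\n\n" + combined)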



People keep talking about using LLMs for writing code, and they might be useful for that, but I've found them much more useful for explaining human-written code than anything else, especially in languages/frameworks outside my core competency.

E.g. "why does this (random code in a framework I haven't used much) code cause this error?"

About 50% of the time I get a helpful response straight away that saves me trawling through Stack Overflow and random blog posts. About 25% of the time the response is at least partially wrong, but it still helps me get on the right track.

The other 25% of the time the LLM has no idea and won't admit it, so I end up wasting a small amount of time going round in circles, but overall it's a significant productivity boost when I'm working on unfamiliar code.


Right on, I like to use local models - even though I also use OpenAI, Anthropic, and Google Gemini.

I often use one- or two-shot examples in prompts, but with small local models it is also fairly simple to do fine-tuning - if you have fine-tuning examples, and if you are a developer, because you have to get the training data into the correct format, and that format changes between the models you are fine-tuning.
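
A toy example of what the data-formatting step can look like: dumping prompt/completion pairs as chat-style JSONL. The exact field names and chat template vary by model and training tool, so treat this purely as an illustration:

    import json

    examples = [
        {"prompt": "What does Python's textwrap.dedent do?",
         "completion": "It removes common leading whitespace from every line of a string."},
    ]

    with open("train.jsonl", "w") as f:
        for ex in examples:
            record = {"messages": [
                {"role": "user", "content": ex["prompt"]},
                {"role": "assistant", "content": ex["completion"]},
            ]}
            f.write(json.dumps(record) + "\n")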


> But none of them are sufficient alone, you do need a “team” of them

Given how sensitive the models are to parameters and prompts, your "team" can just as easily be the same LLM queried multiple times with different system prompts.
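
Roughly like this - one model, several "personas" via different system prompts, then compare the answers. ask() is a placeholder for whatever client you actually use:

    SYSTEM_PROMPTS = [
        "You are a cautious code reviewer; list what could go wrong.",
        "You are an optimistic pair programmer; propose a concrete fix.",
        "You answer only by pointing at relevant official documentation.",
    ]

    def ask(system_prompt: str, question: str) -> str:
        raise NotImplementedError  # wire up your actual LLM client here

    def team_of_one(question: str) -> list[str]:
        # Same model each time, just a different system prompt per call.
        return [ask(p, question) for p in SYSTEM_PROMPTS]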


Another factor is that I use a local LLM first because I don’t trust any of the companies to protect my data or software IP.


What model sizes do you run locally? Anything that would work on a 16GB M1?


I have a 32GB M2, but most local models I use fit on my old 8GB M1 laptop.

I can run the QwQ 32B model with Q4 quantization on my 32GB M2.
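
Back-of-the-envelope check (my numbers, assuming roughly 4.5 bits per weight for a Q4-style quant) of why that fits:

    params = 32e9                 # ~32 billion parameters
    bits_per_weight = 4.5         # typical Q4_K_M-style quantization, roughly
    weights_gb = params * bits_per_weight / 8 / 1e9
    print(f"~{weights_gb:.0f} GB for weights alone")   # ~18 GB, leaving room for KV cache etc.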

I suggest using https://Ollama.com on Mac, Windows, and Linux. I experimented with all the options on Apple Silicon and liked Ollama best.
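
For anyone who hasn't tried it, a minimal example of hitting a local Ollama server from Python. This assumes Ollama is running on its default port and the model has already been pulled; swap in whatever model name you actually use:

    import json, urllib.request

    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": "qwq",
                         "prompt": "Explain mmap in one paragraph.",
                         "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])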


I have an A6000 with 48GB of VRAM in a local server, and I connect to it using Enchanted on my Mac.



