We run coding assistance models on MacBook Pros locally, so here is my experience:
On the hardware side I recommend Apple M1 / M2 / M3 machines with at least 400 GB/s memory bandwidth. For local coding assistance this is perfect for 7B or 33B models.
We also run a Mac Studio (M2 Ultra, 192 GB RAM) with a bigger model (70B) as a chat server. It's pretty fast. Here we use Open WebUI as the interface.
Software-wise, Ollama is OK, as most IDE plugins can work with it now. I personally don't like the Go code they have. Also, some key features I would need are missing, and those just never get done, even though multiple people have submitted PRs for some of them.
LM Studio is better overall, both as a server and as a chat interface.
I can also recommend the CodeGPT plugin for JetBrains products and the Continue plugin for VSCode.
As a chat server UI, as I mentioned, Open WebUI works great; I also use it with Together AI as a backend.
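Whichever plugin or UI you put in front, they all just talk to the same local HTTP endpoint, so it's easy to sanity-check the server directly. A minimal sketch, assuming Ollama is serving on its default port 11434 and a model such as deepseek-coder:33b has already been pulled (the model name is just an example):

    # ask the local Ollama server for a one-off, non-streamed completion
    curl http://localhost:11434/api/generate -d '{
      "model": "deepseek-coder:33b",
      "prompt": "Write a function that reverses a string.",
      "stream": false
    }'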
An M2 Ultra with 192 GB isn't cheap. Did you have it lying around for whatever reason, or do you have a very solid business case for running the model locally / on-prem like that?
Or maybe I'm just working in cash poor environments...
Edit: also, can you do training / fine-tuning on an M2 like that?
We already had some around as build agents.
We don't plan to do any fine-tuning or training, so we did not explore this at all. However, I don't think it is a viable option.
I think it is even easier right now for companies to self-host an inference server with basic RAG support:
- get a Mac Mini or Mac Studio
- just run ollama serve
- run the Ollama web UI in Docker
- add some coding assistant model from OllamaHub via the web UI
- upload your documents in the web UI
No code needed; you have your self-hosted LLM with basic RAG giving you answers with your documents in context.
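In practice the whole thing boils down to a handful of commands. A rough sketch, assuming Docker is installed; the web UI image name and port mapping below are from memory (the project has since been renamed to Open WebUI), so check its README for the current values:

    # start the Ollama API server (listens on localhost:11434 by default)
    ollama serve &

    # pull a coding assistant model, e.g. DeepSeek Coder
    ollama pull deepseek-coder:33b

    # run the web UI in Docker, pointed at the host's Ollama server
    # (image name and flags are assumptions; verify against the project's README)
    docker run -d -p 3000:8080 \
      --add-host=host.docker.internal:host-gateway \
      -v ollama-webui:/app/backend/data \
      --name ollama-webui \
      ghcr.io/ollama-webui/ollama-webui:main

After that, uploading your documents and picking the model happens entirely in the web UI.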
For us the DeepSeek Coder 33B model is fast enough on a Mac Studio with 64 GB RAM and can give pretty good suggestions based on our internal coding documentation.
We actually already run an in-house Ollama server prototype for coding assistance with DeepSeek Coder, and it is pretty good. Now if we could get a model for this that is on ChatGPT-4 level, I would be super happy.
For projects where the estimated rewrite duration exceeds three months, we have employed an iterative approach to refactoring for several years. This methodology has yielded pretty good results.
We also utilize a series of Bash scripts designed to monitor the refactoring process. These scripts collect data on how much the old and the new "state" are used within the codebase. The collected data is then dumped into Grafana, giving us a clear overview of our progress.
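Such a script can stay very small. A minimal sketch, assuming the old and new code paths can be told apart by a grep pattern (OldStateManager / NewStateStore and the *.ts glob are hypothetical) and that the numbers are pushed to a Graphite instance behind the Grafana dashboards (host, port, and metric names are made up):

    #!/usr/bin/env bash
    set -eu

    # count call sites of the deprecated and the new API (patterns are hypothetical)
    old_count=$(grep -r "OldStateManager" --include='*.ts' src/ | wc -l)
    new_count=$(grep -r "NewStateStore" --include='*.ts' src/ | wc -l)

    # ship both counters via Graphite's plaintext protocol (example host and port)
    ts=$(date +%s)
    printf 'refactor.old_state %s %s\nrefactor.new_state %s %s\n' \
      "$old_count" "$ts" "$new_count" "$ts" | nc -w 1 graphite.internal 2003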
An example: “Are We ESMified Yet?” is a Mozilla dashboard tracking an incremental Firefox code migration (1.5 years and counting) from a Mozilla-specific "JSM" JavaScript module system to the standard ECMAScript Module "ESM" system. Current ESMification: 96.69%.
“Are We X Yet?” is a Mozilla meme for dashboards like this.
I saw the phrase “are we X yet” used in the Rust community (is Rust ready for games or whatever) but never realised the phrase’s origin with Mozilla. Thank you for the little piece of history.
“Are We Meta Yet?” http://www.arewemetayet.com/ is an incomplete and outdated list of some of these dashboards. Some domains expired and are now squatted.
If you have scripts that can count uses of the deprecated code, you can use them to detect regressions and emit build warnings when someone adds new code that uses it. Periodically you can decrease the script's maximum-use counter, ratcheting it down until you hit zero uses.
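A sketch of that ratchet as a CI check, assuming the same hypothetical grep pattern and a threshold kept in a versioned file (refactor/max_old_state_uses is an invented path) that gets lowered over time:

    #!/usr/bin/env bash
    set -eu

    # current number of call sites of the deprecated API (pattern is hypothetical)
    actual=$(grep -r "OldStateManager" --include='*.ts' src/ | wc -l)

    # allowed maximum, committed to the repo and lowered periodically
    allowed=$(cat refactor/max_old_state_uses)

    if [ "$actual" -gt "$allowed" ]; then
      echo "error: $actual uses of OldStateManager, only $allowed allowed" >&2
      exit 1    # fail the build on a regression
    elif [ "$actual" -lt "$allowed" ]; then
      echo "note: usage dropped to $actual; consider lowering max_old_state_uses"
    fi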
Oh, the idea of tracking the state of the refactoring process with small scripts is very cool. Obvious in retrospect, too. These scripts would be useful even if they were only about 90% correct.
I find it awesome. Maybe it is targeted at my age group. Sadly I have an iPhone 13 and won't upgrade in the next 2-3 years; otherwise I would order it right now.