We run coding assistance models on MacBook Pros locally, so here is my experience:
On the hardware side I recommend Apple M1 / M2 / M3 machines with at least 400 GB/s memory bandwidth. For local coding assistance this is perfect for 7B or 33B models.
We also run a Mac Studio (M2 Ultra, 192 GB RAM) with a bigger model (70B) as a chat server. It's pretty fast. Here we use Open WebUI as the interface.
Software-wise, Ollama is OK, as most IDE plugins can work with it now. I personally don't like the Go code they have. Also, some key features I would need are missing, and those just never get done, even though multiple people have submitted PRs for some of them.
LM Studio is better overall, both as a server and as a chat interface.
I can also recommend the CodeGPT plugin for JetBrains products and the Continue plugin for VSCode.
As a chat server UI, as I mentioned, Open WebUI works great; I also use it with Together AI as a backend.
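Whichever plugin or UI you put in front, they all just talk to the same local HTTP endpoint, so it's easy to sanity-check the server directly. A minimal sketch, assuming Ollama is serving on its default port 11434 and a model such as deepseek-coder:33b has already been pulled (the model name is just an example):

    # ask the local Ollama server for a one-off, non-streamed completion
    curl http://localhost:11434/api/generate -d '{
      "model": "deepseek-coder:33b",
      "prompt": "Write a function that reverses a string.",
      "stream": false
    }'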
An M2 Ultra with 192 GB isn't cheap. Did you have it lying around for whatever reason, or do you have a very solid business case for running the model locally / on-prem like that?
Or maybe I'm just working in cash poor environments...
Edit: also, can you do training / fine-tuning on an M2 like that?
We already had some around as build agents.
We don't plan to do any fine-tuning or training, so we did not explore this at all. However, I don't think it is a viable option.
I think it is even easier right now for companies to self-host an inference server with basic RAG support:
- get a Mac Mini or Mac Studio
- just run ollama serve
- run the Ollama web UI in Docker
- add some coding assistant model from OllamaHub via the web UI
- upload your documents in the web UI
No code needed; you have your self-hosted LLM with basic RAG giving you answers with your documents in context.
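In practice the whole thing boils down to a handful of commands. A rough sketch, assuming Docker is installed; the web UI image name and port mapping below are from memory (the project has since been renamed to Open WebUI), so check its README for the current values:

    # start the Ollama API server (listens on localhost:11434 by default)
    ollama serve &

    # pull a coding assistant model, e.g. DeepSeek Coder
    ollama pull deepseek-coder:33b

    # run the web UI in Docker, pointed at the host's Ollama server
    # (image name and flags are assumptions; verify against the project's README)
    docker run -d -p 3000:8080 \
      --add-host=host.docker.internal:host-gateway \
      -v ollama-webui:/app/backend/data \
      --name ollama-webui \
      ghcr.io/ollama-webui/ollama-webui:main

After that, uploading your documents and picking the model happens entirely in the web UI.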
For us the DeepSeek Coder 33B model is fast enough on a Mac Studio with 64 GB RAM and can give pretty good suggestions based on our internal coding documentation.
We actually already run an in-house Ollama server prototype for coding assistance with DeepSeek Coder, and it is pretty good. Now if we could get a model for this that is on ChatGPT-4 level, I would be super happy.
For projects where the estimated rewrite duration exceeds three months, we have employed an iterative approach to refactoring for several years. This methodology has yielded pretty good results.
We also utilize a series of Bash scripts designed to monitor the refactoring process. These scripts collect data on how much the old and the new "state" are used within the codebase. The collected data is then dumped into Grafana, giving us a clear overview of our progress.
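Such a script can stay very small. A minimal sketch, assuming the old and new code paths can be told apart by a grep pattern (OldStateManager / NewStateStore and the *.ts glob are hypothetical) and that the numbers are pushed to a Graphite instance behind the Grafana dashboards (host, port, and metric names are made up):

    #!/usr/bin/env bash
    set -eu

    # count call sites of the deprecated and the new API (patterns are hypothetical)
    old_count=$(grep -r "OldStateManager" --include='*.ts' src/ | wc -l)
    new_count=$(grep -r "NewStateStore" --include='*.ts' src/ | wc -l)

    # ship both counters via Graphite's plaintext protocol (example host and port)
    ts=$(date +%s)
    printf 'refactor.old_state %s %s\nrefactor.new_state %s %s\n' \
      "$old_count" "$ts" "$new_count" "$ts" | nc -w 1 graphite.internal 2003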
An example: “Are We ESMified Yet?” is a Mozilla dashboard tracking an incremental Firefox code migration (1.5 years and counting) from a Mozilla-specific "JSM" JavaScript module system to the standard ECMAScript Module "ESM" system. Current ESMification: 96.69%.
“Are We X Yet?” is a Mozilla meme for dashboards like this.
I saw the phrase “are we X yet” used in the Rust community (is Rust ready for games or whatever) but never realised the phrase’s origin with Mozilla. Thank you for the little piece of history.
“Are We Meta Yet?” http://www.arewemetayet.com/ is an incomplete and outdated list of some of these dashboards. Some domains expired and are now squatted.
If you have scripts that can count uses of the deprecated code, you can use them to detect regressions and emit build warnings when someone adds new code that uses it. Periodically you can decrease the script's maximum-use counter, ratcheting it down until you hit zero uses.
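A sketch of that ratchet as a CI check, assuming the same hypothetical grep pattern and a threshold kept in a versioned file (refactor/max_old_state_uses is an invented path) that gets lowered over time:

    #!/usr/bin/env bash
    set -eu

    # current number of call sites of the deprecated API (pattern is hypothetical)
    actual=$(grep -r "OldStateManager" --include='*.ts' src/ | wc -l)

    # allowed maximum, committed to the repo and lowered periodically
    allowed=$(cat refactor/max_old_state_uses)

    if [ "$actual" -gt "$allowed" ]; then
      echo "error: $actual uses of OldStateManager, only $allowed allowed" >&2
      exit 1    # fail the build on a regression
    elif [ "$actual" -lt "$allowed" ]; then
      echo "note: usage dropped to $actual; consider lowering max_old_state_uses"
    fi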
Oh, the idea of tracking the state of the refactoring process with small scripts is very cool. Obvious in retrospect, too. These scripts would be useful even if they were only about 90% correct.
I find it awesome. Maybe it is targeted at my age group. Sadly I have an iPhone 13 and won't upgrade in the next 2-3 years; otherwise I would order it right now.