It could be built to use local models completely. Open source transcription mode...

It could be built to use local models completely.

Open source transcription models are already good enough to do this, and with good context engineering, the base models might be good enough, too.

It wouldn't be trivial to implement, but I think it's possible already.