
The current pipeline expects PCM audio blobs. If the data is coming from a microphone in the browser, you can do the initial processing and conversion there (see the JS in this single-file Phoenix app speech-to-text example [0]).
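For a rough idea of what that browser-side step involves (this is a sketch, not the code from the linked example): you'd capture microphone audio with getUserMedia and an AudioContext, then downmix/downsample the Float32 samples to the sample rate the model expects. The pure conversion step might look like this — the function name and the 16 kHz target are assumptions:

```javascript
// Naive downsample of mono Float32 PCM to a lower target rate.
// Real code would low-pass filter before decimating to avoid aliasing.
function toMono16k(samples, inputRate, targetRate = 16000) {
  const ratio = inputRate / targetRate;
  const outLength = Math.floor(samples.length / ratio);
  const out = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    // Nearest-sample decimation
    out[i] = samples[Math.floor(i * ratio)];
  }
  return out;
}
```

The resulting Float32Array can then be sent to the server (e.g. over a Phoenix channel) as the raw PCM blob the pipeline expects.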

On the other hand, if you expect a variety of formats (mp3, wav, etc.), then shelling out to or embedding ffmpeg is probably the quickest path. The Membrane Framework [1] is an option here too, and it also supports streaming. I believe Lars is going to do a cool demo with Membrane and ML at ElixirConf EU next week.
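If you do shell out to ffmpeg, the essential bit is asking it for raw PCM on stdout. A sketch (in Node for illustration — from Elixir you'd invoke the same arguments via System.cmd/3 or a Port); the mono/16 kHz/f32le parameter choices are assumptions matching typical Whisper-style input:

```javascript
// Build ffmpeg arguments that decode any input ffmpeg understands
// (mp3, wav, ...) into raw PCM on stdout.
function ffmpegPcmArgs(inputPath) {
  return [
    "-i", inputPath,   // input file, any supported container/codec
    "-ac", "1",        // downmix to mono
    "-ar", "16000",    // resample to 16 kHz
    "-f", "f32le",     // raw 32-bit float little-endian PCM
    "pipe:1",          // write to stdout
  ];
}

// Usage (requires ffmpeg on PATH):
// const { execFileSync } = require("child_process");
// const pcm = execFileSync("ffmpeg", ffmpegPcmArgs("clip.mp3"));
```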

[0]: https://github.com/elixir-nx/bumblebee/blob/main/examples/ph...

[1]: https://membrane.stream/



> I believe Lars is going to do a cool demo with Membrane and ML at ElixirConf EU next week.

Yes, the relevant part of his demo with the Membrane pipeline appears to be here: https://github.com/lawik/lively/blob/master/lib/lively/media...


Limited in usefulness, though: it seems that Lars kept the MembraneTranscript library dependency private.



Quick example video from Chris McCord using ffmpeg and whisper in Phoenix: https://www.phoenixframework.org/blog/whisper-speech-to-text...



