Hacker News | kraddypatties's comments

Running into "no healthy upstream" when navigating to the link -- hug of death maybe?

Indeed, we had a huge influx; it should be back up now. Thanks for pointing it out!

Been lurking for most of my adult life (and it shows :-))

Thanks HN! You make me smarter every (other) day.


Thanks for trying it out!

Yea, that latency makes sense: "listening" includes turn detection and STT, and "thinking" includes the LLM + TTS _and then_ our model, so the pipeline latency stacks up pretty quickly. The video model itself starts streaming out frames within 500ms of TTS output, but we're still working on cutting latency in the parts of the pipeline we use off the shelf.
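A back-of-the-envelope way to see how that stacks up: if the stages run one after another, time to first frame is roughly the sum of the stage latencies. This is a minimal sketch assuming a strictly sequential pipeline; every number except the 500ms video-model figure (the upper bound mentioned above) is a hypothetical placeholder, not a measurement from the playground.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Stage:
    name: str
    phase: str         # "listening" or "thinking", matching the UI labels
    latency_ms: float  # HYPOTHETICAL placeholder unless noted otherwise


# Hypothetical stage latencies; only the video model's 500ms comes from the comment.
PIPELINE = [
    Stage("turn detection", "listening", 300.0),  # hypothetical
    Stage("STT", "listening", 200.0),             # hypothetical
    Stage("LLM", "thinking", 400.0),              # hypothetical
    Stage("TTS", "thinking", 200.0),              # hypothetical
    Stage("video model (first frame)", "thinking", 500.0),  # stated upper bound
]


def phase_totals(stages: Iterable[Stage]) -> Dict[str, float]:
    """Sum latency per UI phase, assuming the stages run sequentially."""
    totals: Dict[str, float] = {}
    for s in stages:
        totals[s.phase] = totals.get(s.phase, 0.0) + s.latency_ms
    return totals


def time_to_first_frame(stages: Iterable[Stage]) -> float:
    """Worst-case delay from end of user speech to first video frame."""
    return sum(s.latency_ms for s in stages)


if __name__ == "__main__":
    for phase, ms in phase_totals(PIPELINE).items():
        print(f"{phase}: {ms:.0f} ms")
    print(f"time to first frame: {time_to_first_frame(PIPELINE):.0f} ms")
```

In practice streaming can let some stages overlap (e.g. TTS starting before the LLM finishes), so a plain sum like this is closer to a worst case than an exact figure.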

We have a high-level blog post about the architecture of the video model here: https://www.keyframelabs.com/blog/persona-1. The WebRTC "agent" stack is LiveKit + a few backend components hosted on Modal.


We've been tinkering with building realtime talking head models (avatar models, etc.) for a while now, and finally have something that works (well enough)! It runs at ~2x realtime on a 4090, and significantly faster than that on enterprise-grade GPUs.

You can try it yourself at https://playground.keyframelabs.com/playground/persona-1 and there's a (semi)technical blog post at https://www.keyframelabs.com/blog/persona-1

The main use case we designed for was language learning, particularly having a conversational partner. In general, we've found that adding a face to the voice really helps trigger the fight-or-flight response, which in our experience is the hardest part of speaking a new language with confidence.

But in building out the system around the model to enable that use case (tool use on a canvas for speaking prompts and images, memory to make conversations less stale, etc.), we think there's potential for other use cases too.

