The user you originally replied to specifically mentioned:

> without going to text first


Yeah, and that's my understanding. Nothing goes video -> text, audio -> text, or even text -> text without first passing through the model's internal state space (the hidden-state vectors). That's where the core of the transformer architecture operates.
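Here is a toy sketch of that point. It assumes a made-up model: text tokens and audio frames are each mapped into the same D-dimensional hidden-state vectors, the shared core (one self-attention step with no learned projections) only ever sees those vectors, and text reappears only at the output head. All names, sizes, and the random weights are hypothetical. Real models use learned weights, many layers, and modality-specific encoders.

```python
import math
import random

D = 4  # hidden (latent) dimension, arbitrary for this toy
VOCAB = ["<pad>", "the", "cat", "sat"]  # hypothetical tiny vocabulary
random.seed(0)

def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

# Hypothetical "learned" parameters (random here, for illustration only)
TOKEN_EMBED = rand_matrix(len(VOCAB), D)  # text token id -> hidden vector
AUDIO_PROJ = rand_matrix(3, D)            # 3-feature audio frame -> hidden vector
OUT_HEAD = rand_matrix(D, len(VOCAB))     # hidden vector -> vocabulary logits

def matvec(vec, mat):
    # Row vector (length n) times matrix (n x m) -> vector of length m
    return [sum(v * mat[i][j] for i, v in enumerate(vec)) for j in range(len(mat[0]))]

def encode_text(token_ids):
    # Text enters the model as embedding lookups, not as text
    return [list(TOKEN_EMBED[t]) for t in token_ids]

def encode_audio(frames):
    # Audio frames are projected straight into the same hidden space
    return [matvec(f, AUDIO_PROJ) for f in frames]

def self_attention(states):
    # Single-head attention with identity Q/K/V: each position becomes a
    # softmax-weighted average of all positions' hidden vectors.
    scale = math.sqrt(D)
    out = []
    for q in states:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in states]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([sum(w * v[d] for w, v in zip(weights, states)) for d in range(D)])
    return out

def decode(states):
    # Only here, at the output head, does the model map back to a token
    logits = matvec(states[-1], OUT_HEAD)
    return VOCAB[max(range(len(logits)), key=lambda i: logits[i])]

text_states = self_attention(encode_text([1, 2]))
audio_states = self_attention(encode_audio([[0.1, 0.2, 0.3], [0.0, -0.5, 0.4]]))
print(len(text_states[0]), len(audio_states[0]))  # prints "4 4"
print(decode(text_states), decode(audio_states))
```

The core function `self_attention` has no idea whether its input came from text or audio. It just sees lists of D-dimensional vectors, which is the sense in which everything passes through the same state space.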
