I'm glad someone else had the same thought. I've been wondering for a while what their "secret sauce" is, given that their model degrades far less at long context than other LLMs that are otherwise competitive. It could also just be that they trained on longer-context data than anyone else, though.
