
Technically this still isn't dynamic allocation: it always takes the top-k tokens from the sequence with a constant k, so the compute budget per sequence is fixed. It's more like dynamic routing, which has been explored in Mixture-of-Experts models, but only in the feed-forward blocks and with a different routing scheme.
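
To make the distinction concrete, here is a rough sketch of constant-k top-k selection in plain Python. The function name `route_top_k` and the router scores are made up for illustration and are not taken from any paper or library. The point is only that the router changes which tokens get picked, never how many.

```python
def route_top_k(scores, k):
    """Return the positions of the k highest-scoring tokens, in sequence order."""
    # Rank token positions by score, highest first (ties keep original order).
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

# Hypothetical router scores for two sequences of the same length.
easy = [0.1, 0.9, 0.2, 0.8, 0.3, 0.1]  # only a couple of tokens score high
hard = [0.9, 0.8, 0.7, 0.9, 0.8, 0.7]  # nearly every token scores high

k = 2  # constant, chosen ahead of time
easy_selected = route_top_k(easy, k)
hard_selected = route_top_k(hard, k)

print(easy_selected, hard_selected)
```

Both sequences get exactly k tokens routed, even though the second one arguably "deserves" more. A truly dynamic scheme would let the number of selected tokens vary with the input.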

