Hacker News

So basically multiple CLS tokens.

Fwiw, I tried multiple global tokens in my chess neural net and didn't see any uplift compared to my baseline of just having one.



Note that it's not done for performance reasons, but rather to generate clear feature maps.
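For anyone skimming, the mechanism being discussed boils down to token bookkeeping. A few extra learnable global tokens ("registers") sit next to the CLS token, the encoder sees all of them, and their outputs are simply thrown away, so that only the patch outputs form the feature map. Below is a minimal plain-Python sketch of that bookkeeping. It assumes a hypothetical `[CLS, registers, patches]` layout, and `toy_encoder` is a hypothetical stand-in that mixes in the sequence mean. It is not attention and not any particular model's implementation.

```python
from typing import List, Tuple

Vector = List[float]


def build_sequence(cls_token: Vector, registers: List[Vector],
                   patches: List[Vector]) -> List[Vector]:
    """Hypothetical layout: [CLS, reg_1..reg_k, patch_1..patch_n]."""
    return [list(cls_token)] + [list(r) for r in registers] + [list(p) for p in patches]


def toy_encoder(seq: List[Vector]) -> List[Vector]:
    """Stand-in for a transformer (NOT attention): each token gets the
    per-dimension sequence mean added, so every output depends on all
    inputs, including the global tokens."""
    dim = len(seq[0])
    mean = [sum(tok[d] for tok in seq) / len(seq) for d in range(dim)]
    return [[tok[d] + mean[d] for d in range(dim)] for tok in seq]


def split_outputs(out: List[Vector], num_registers: int
                  ) -> Tuple[Vector, List[Vector], List[Vector]]:
    """Split encoder outputs into (CLS output, register outputs, patch outputs)."""
    cls_out = out[0]
    reg_out = out[1:1 + num_registers]
    patch_out = out[1 + num_registers:]
    return cls_out, reg_out, patch_out


# Usage: 3 patch tokens of width 2, one CLS token, 2 register tokens.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
cls_token = [0.0, 0.0]
registers = [[0.5, 0.5], [0.0, 0.0]]

seq = build_sequence(cls_token, registers, patches)
out = toy_encoder(seq)
cls_out, reg_out, feature_map = split_outputs(out, num_registers=len(registers))
# Register outputs are discarded; feature_map holds one vector per patch.
```

Passing `registers = []` recovers the single-global-token baseline. In this sketch the register outputs never reach a head or a loss. They only give the model extra global slots during the forward pass.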




