
The bottleneck in those is not the arithmetic but the memory bandwidth, once the matrix no longer fits in SRAM and has to spill out of it.

As it stands right now, you are actually better off with a slower algorithm that uses local memory more efficiently.
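To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes a plain n x n float32 matrix multiply and a simple roofline-style cost model (run time is whichever is larger, compute time or memory-transfer time). The function names, the tile size, and the peak-FLOP and bandwidth figures are all made up for illustration and do not describe any particular chip.

```python
def matmul_flops(n):
    """Multiply-adds for an n x n matmul, counted as 2 flops each."""
    return 2 * n**3


def tiled_traffic_bytes(n, tile, bytes_per_elem=4):
    """Bytes moved to/from slow memory when tile x tile blocks are kept local.

    Assumes n is divisible by tile. Each of the (n/tile)^2 output tiles
    streams n/tile pairs of input blocks, then writes its result once.
    """
    blocks = n // tile
    loads = blocks**3 * 2 * tile * tile  # equals 2 * n^3 / tile elements
    stores = n * n
    return (loads + stores) * bytes_per_elem


def naive_traffic_bytes(n, bytes_per_elem=4):
    """No reuse at all: every multiply-add reloads both of its operands."""
    return (2 * n**3 + n * n) * bytes_per_elem


def roofline_time(flops, traffic_bytes, peak_flops, bandwidth):
    """Crude estimate: the slower of compute and memory transfer wins."""
    return max(flops / peak_flops, traffic_bytes / bandwidth)


# Hypothetical machine, numbers chosen only for illustration.
PEAK_FLOPS = 1e13   # flop/s
BANDWIDTH = 1e11    # bytes/s to off-chip memory

n, tile = 1024, 32
t_naive = roofline_time(matmul_flops(n), naive_traffic_bytes(n),
                        PEAK_FLOPS, BANDWIDTH)
# Pretend the memory-friendly variant needs 50% more arithmetic.
t_tiled = roofline_time(1.5 * matmul_flops(n), tiled_traffic_bytes(n, tile),
                        PEAK_FLOPS, BANDWIDTH)

print(f"naive: {t_naive:.2e} s, tiled with 1.5x flops: {t_tiled:.2e} s")
```

Under these assumptions both variants are limited by bandwidth rather than arithmetic, so the version that does more flops but moves far fewer bytes still comes out ahead.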


