
In ordinary gradient descent the order does matter, since the position changes between updates. I think stochastic gradient descent sometimes sums several gradients together before taking a step (mini-batching), but I'm not sure what the trade-offs are or whether LLM training does so as well.
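To make the order point concrete, here is a minimal sketch in plain Python. The two quadratic per-sample losses, the learning rate, and the function names are all made up for illustration and are not taken from any particular library or training setup. It compares updating after every sample with summing the gradients at a single position and stepping once.

```python
import math

# Hypothetical per-sample losses f_i(w) = 0.5 * a * (w - b)**2, stored as (a, b).
SAMPLES = [(1.0, 0.0), (4.0, 2.0)]
LR = 0.1  # assumed learning rate, chosen only for the example


def grad(sample, w):
    """Gradient of 0.5 * a * (w - b)**2 with respect to w."""
    a, b = sample
    return a * (w - b)


def sequential_step(samples, w, lr=LR):
    """Update after every sample, so each gradient is taken at a new position."""
    for s in samples:
        w -= lr * grad(s, w)
    return w


def summed_step(samples, w, lr=LR):
    """Evaluate every gradient at the same position, then take one step."""
    return w - lr * sum(grad(s, w) for s in samples)


w0 = 1.0
reversed_samples = SAMPLES[::-1]

seq_fwd = sequential_step(SAMPLES, w0)
seq_bwd = sequential_step(reversed_samples, w0)
sum_fwd = summed_step(SAMPLES, w0)
sum_bwd = summed_step(reversed_samples, w0)

print("sequential, original order:", seq_fwd)
print("sequential, reversed order:", seq_bwd)
print("summed, original order:    ", sum_fwd)
print("summed, reversed order:    ", sum_bwd)
```

The sequential results differ between the two orders because the second gradient is evaluated at a position the first update already moved. The summed results agree because both gradients are evaluated at the same starting point and addition does not depend on order.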

