Mamba-based LLMs aren't even close to novel, though. IBM has been doing this since forever [1].

Also, you're off on DeepSeek V3.2's param count: the full model is 685B parameters once you include the MTP layer.
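
For reference, a rough back-of-the-envelope check in Python (the 671B/14B split below is my assumption based on DeepSeek's published V3 figures, not something stated above):

    main_model_b = 671   # billions of params in the main transformer stack (assumed)
    mtp_module_b = 14    # billions of params in the multi-token-prediction (MTP) module (assumed)
    print(main_model_b + mtp_module_b)  # 685 -> the "685B" full checkpoint size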

I don't think there's anything interesting here other than "I guess AMD put out a research paper", and it's not cutting edge when DeepSeek, or even IBM, is running laps around them.

[1] Here's an article from this April, although IBM had been working on this long before that: https://research.ibm.com/blog/bamba-ssm-transformer-model



It's not cutting edge; so what? Is your point that nobody should publish anything unless it's cutting edge?


Yeah, that's the point of publishing: if you get scooped, you lose.


This wasn't published; it was just posted to arXiv.



