Mamba-based LLMs aren't even close to novel though. IBM's been doing this since forever [1].
Also, you're off on Deepseek V3.2's param count: the full model is 685B with the MTP layer.
I don't think there's anything interesting here other than "I guess AMD put out a research paper", and it's hardly cutting-edge when Deepseek or even IBM is running laps around them.
[1] Here's a news article from April, although IBM has been doing it for a long time before that: https://research.ibm.com/blog/bamba-ssm-transformer-model