Hacker News
Meta Open-Sources Megalodon LLM for Efficient Long Sequence Modeling (infoq.com)
129 points by rbanffy on June 11, 2024 | hide | past | favorite | 10 comments


> For what it's worth, RWKV's website on that matter mentions that yes it's bad on recall, but for the vast majority of tasks you can just ask the question before the content, and it'll handle the task just fine.


If you ask me a question before giving me the material being queried, I do better, too! It’s the difference between an open-book and closed-book test.

The Transformer read-then-query model is, IMO, a bit odd. It preprocesses the input, and then people expect it to answer any question about that input, preferably in time independent of the input length; they're disappointed that the best models take time linear in the input length. No kidding: that's the algorithmic complexity of even a straightforward, non-ML approach!
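To make the complexity point concrete, here's a toy sketch (my own illustration, not from the thread): even a trivial non-ML "query the document" task, such as counting occurrences of a keyword, must touch every character of the input, so its running time is linear in the document length.

```python
def count_occurrences(document: str, query: str) -> int:
    """Naive substring count: scans the whole document, so it is
    O(n) in the document length -- the same linear lower bound the
    comment above points out for answering arbitrary queries."""
    count = 0
    for i in range(len(document) - len(query) + 1):
        if document[i:i + len(query)] == query:
            count += 1
    return count

doc = "the shark ate the fish near the reef"
print(count_occurrences(doc, "the"))  # 3
```

No preprocessing of `document` can make *every* possible later query sublinear; a model that answers in time independent of input length must have compressed the input into a fixed-size state, which is exactly where recall suffers.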


Didn't this happen in April?

Edit: Yes it did, https://github.com/XuezheMax/megalodon


Related:

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length - https://news.ycombinator.com/item?id=40054901 - April 2024 (28 comments)



yay dinosaurs!


Megalodon was not a dinosaur. It was a fish (shark).


So old... it feels like another epoch, still talking about Llama 2.


MEGALODON builds on the research team's previous model, MEGA (exponential moving average with gated attention)

I like how the team gingerly avoided calling the previous model "MAGA" ... :-D


I could give two sh*ts about Mr D.T. but I’m more curious to see the reaction of one Kim Dotcom, aka the Mega dude himself.



