> For what it's worth, RWKV's website on that matter mentions that yes it's bad on recall, but for the vast majority of tasks you can just ask the question before the content, and it'll handle the task just fine.
If you ask me a question before giving me the material being queried, I do better, too! It’s the difference between an open-book and closed-book test.
The Transformer read-then-query model is, IMO, a bit odd. It preprocesses the input, then people expect it to answer any question about that input, ideally in time independent of the input length, and they're sad when the best models take time linear in the input length. No kidding: that's the algorithmic complexity of even a straightforward, non-ML approach!
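To make that concrete, here's a toy sketch (not RWKV's actual mechanism, just a plain substring scan I'm using as the "straightforward, non-ML approach"): if you know the query up front, you can answer it in a single O(n) streaming pass over the content, with extra memory bounded by the query size rather than the input size. That's the query-before-content trick in miniature.

```python
def seen_in_stream(needle, chunks):
    """Query-first, single-pass search over streamed content.

    Because the query (needle) is known before the content arrives,
    we only keep a needle-sized tail of what we've seen so far:
    O(n) time in the content length, O(|needle|) extra memory.
    """
    window = ""
    for chunk in chunks:
        window += chunk
        if needle in window:
            return True
        # Keep just enough tail to catch a match spanning chunk boundaries.
        window = window[-len(needle):]
    return False

# A match split across two chunks is still found in one pass.
print(seen_in_stream("cat", ["the c", "at sat"]))  # True
print(seen_in_stream("dog", ["the cat sat"]))      # False
```

Try answering an *arbitrary* question about the stream after the fact, though, and you either re-scan (linear time again) or you must have stored the whole thing, which is exactly the tradeoff being complained about.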