> For what it's worth, RWKV's website on that matter mentions that yes it's bad on recall, but for the vast majority of tasks you can just ask the question before the content, and it'll handle the task just fine.
If you ask me a question before giving me the material being queried, I do better, too! It’s the difference between an open-book and closed-book test.
The Transformer read-then-query model is, IMO, a bit odd. It preprocesses the input, then people expect it to answer any question about that input, ideally in time independent of the input length, and they're sad when the best models take time linear in the input length. No kidding: that's the algorithmic complexity of even a straightforward, non-ML approach!
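To make that concrete, here's a toy sketch (not RWKV's actual mechanism, just a plain substring scan I'm using as the "straightforward, non-ML approach"): if you know the query up front, you can answer it in a single O(n) streaming pass over the content, with extra memory bounded by the query size rather than the input size. That's the query-before-content trick in miniature.

```python
def seen_in_stream(needle, chunks):
    """Query-first, single-pass search over streamed content.

    Because the query (needle) is known before the content arrives,
    we only keep a needle-sized tail of what we've seen so far:
    O(n) time in the content length, O(|needle|) extra memory.
    """
    window = ""
    for chunk in chunks:
        window += chunk
        if needle in window:
            return True
        # Keep just enough tail to catch a match spanning chunk boundaries.
        window = window[-len(needle):]
    return False

# A match split across two chunks is still found in one pass.
print(seen_in_stream("cat", ["the c", "at sat"]))  # True
print(seen_in_stream("dog", ["the cat sat"]))      # False
```

Try answering an *arbitrary* question about the stream after the fact, though, and you either re-scan (linear time again) or you must have stored the whole thing, which is exactly the tradeoff being complained about.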