It goes way beyond that. Transformers are the first practical, scalable, general-purpose differentiable (i.e. trainable with gradient descent) algorithm. We haven't come close to seeing the limits of what they can do, because everything so far suggests their only limit is our current hardware. And hardware is improving at a much faster and steadier rate than algorithms in computer science these days.
everything? they've solved reinforcement learning? they can handle continuous domains, like robot motion? that's funny, I thought they could only handle sequences of tokens.*
yes, they're exciting, and they are the most general architecture we've found so far, but there are important problems in AI (like anything continuous) that they're really not suited for.
I think there are better architectures out there for many tasks, and I'm a little dismayed that everyone seems to be cargo-culting the GPT architecture rather than taking the lessons from transformers and experimenting with more specialized algorithms.
*btw they don't need quantized tokens; there's no reason they can't just work on continuous vectors directly, and they don't have to be causal or limited to one sequence. But "transformer" seems to mean GPT in everyone's mind, and even though the original transformer was an encoder-decoder model, we rarely seem to see those these days for some reason.
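to make the footnote concrete, here's a minimal numpy sketch (names and shapes are mine, purely illustrative) of single-head self-attention applied directly to continuous vectors: no vocabulary, no quantization, and no causal mask, so every position attends to every other.

```python
import numpy as np

# Illustrative sketch: self-attention over continuous inputs, no tokens involved.
rng = np.random.default_rng(0)
seq_len, d_model = 6, 8  # e.g. 6 continuous observations (robot states, audio frames, ...)

x = rng.normal(size=(seq_len, d_model))   # continuous input vectors, no vocabulary lookup
Wq = rng.normal(size=(d_model, d_model))  # learned projections (random here for the sketch)
Wk = rng.normal(size=(d_model, d_model))
Wv = rng.normal(size=(d_model, d_model))

q, k, v = x @ Wq, x @ Wk, x @ Wv
scores = q @ k.T / np.sqrt(d_model)       # full bidirectional attention, no causal mask
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the whole sequence
out = weights @ v                         # continuous outputs, same shape as the inputs

print(out.shape)  # (6, 8)
```

nothing here assumes discrete tokens; a causal GPT-style model is just this plus an extra masking step on `scores`.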
There are two ways you can solve currently intractable problems: find better algorithms or improve the hardware. It's actually insanely hard to come up with new algorithms; that's why machine learning and AI lagged behind most of computer science for decades.