I really hope this works out. Death to tokenizers!
Interesting that it's a hierarchical structure but with only two levels. Stacking more levels seems like an obvious direction for further research.
Note: I posted this comment on another related story[1] and the author replied:
"Author here :), I do think it’s a good direction to look into! That said, aside from it being a bit too much to do at once, you’d also have to be careful about how you distributed your FLOP budget across the hierarchy. With two levels, you can make one level (bytes/local encoder) FLOP efficient and the other (patches/global encoder) FLOP intensive. You’d also need to find a way to group patches into larger units. But ya, there are many directions to go from here!"
[1] https://news.ycombinator.com/item?id=42413430