f_devd on March 21, 2023 | on: CoLT5: Faster Long-Range Transformers With Conditi...

This still isn't technically dynamic allocation, since it always takes the top-k tokens (constant k) from the sequence. It's more like dynamic routing, which was explored in Mixture-of-Experts models, but only in the feed-forward blocks and with a different routing scheme.
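To make the distinction concrete, here is a rough sketch in plain Python. It is not the paper's code: the scores are made up, and `route_threshold` is a hypothetical contrast, included only to show what a truly input-dependent budget might look like.

```python
def route_top_k(scores, k):
    """Return indices of the k highest-scoring tokens, in sequence order."""
    # Constant k: the number of selected tokens never depends on the scores.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def route_threshold(scores, threshold):
    """Hypothetical alternative: keep every token whose score clears a threshold."""
    # Here the number of selected tokens varies with the input.
    return [i for i, s in enumerate(scores) if s > threshold]


# Made-up router scores for two six-token sequences (assumption, for illustration).
easy = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1]    # no token stands out
hard = [0.9, 0.8, 0.1, 0.95, 0.7, 0.85]  # most tokens look important

top_easy = route_top_k(easy, k=2)
top_hard = route_top_k(hard, k=2)
thr_easy = route_threshold(easy, 0.5)
thr_hard = route_threshold(hard, 0.5)

print(len(top_easy), len(top_hard))  # same budget for both sequences
print(len(thr_easy), len(thr_hard))  # budget changes with the input
```

With top-k, both sequences get exactly two tokens sent down the heavy path no matter how the scores look, so the compute per sequence is fixed and only which tokens receive it changes. That is routing, not allocation.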