TPUs do include dedicated hardware, SparseCores, for sparse operations. https://...

thesz · 2025-11-27T22:10:12 1764281412

SparseCores appear to be block-sparse as opposed to element-sparse. They use 8- and 16-wide vectors to compute.

Here's another inference-efficient architecture where TPUs are useless: https://arxiv.org/pdf/2210.08277

There is no matrix-vector multiplication. Parameters are estimated using Gumbel-Softmax. TPUs are of no use here.
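For anyone unfamiliar with the Gumbel-Softmax trick mentioned above, here is a minimal sketch in plain Python of how a discrete gate choice can be relaxed into something differentiable. The three-gate menu, the `soft_gate` helper, and the temperature values are illustrative assumptions of mine, not taken from the linked paper.

```python
import math
import random

def gumbel_softmax(logits, tau=1.0, rng=random):
    """Draw one relaxed one-hot sample over len(logits) categories."""
    noisy = []
    for logit in logits:
        # Clamp the uniform draw so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / tau)
    # Numerically stable softmax over the perturbed logits.
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical gate menu: real-valued relaxations for inputs in [0, 1].
GATES = {
    "and": lambda a, b: a * b,
    "or": lambda a, b: a + b - a * b,
    "xor": lambda a, b: a + b - 2 * a * b,
}

def soft_gate(a, b, logits, tau=1.0, rng=random):
    """Mix the gate outputs by a sampled, nearly one-hot weight vector."""
    weights = gumbel_softmax(logits, tau, rng)
    return sum(w * gate(a, b) for w, gate in zip(weights, GATES.values()))

# As tau shrinks, the sample approaches a hard choice of a single gate.
sample = soft_gate(1.0, 0.0, logits=[0.2, 0.1, 1.5], tau=0.5)
```

The point is that training only learns which gate sits at each node, so there is no weight matrix to multiply by.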

Inference is done bit-wise, and the most efficient inference is done after applying boolean logic simplification algorithms (ABC or mockturtle).
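To make "bit-wise inference" concrete, here is a rough sketch in Python: pack one bit per sample into an integer for each wire, and every gate becomes a single bitwise operation over the whole batch. The netlist format and the names `pack_bits`, `unpack_bits`, and `eval_netlist` are my own assumptions for illustration. Simplification with ABC or mockturtle would happen before this step and is not shown.

```python
def pack_bits(column):
    """Pack a list of 0/1 values (one per sample) into a single integer."""
    word = 0
    for i, bit in enumerate(column):
        word |= (bit & 1) << i
    return word

def unpack_bits(word, n):
    """Recover the n per-sample bits from a packed integer."""
    return [(word >> i) & 1 for i in range(n)]

def eval_netlist(inputs, netlist, mask):
    """Evaluate gates in order; each gate reads two earlier wires."""
    wires = list(inputs)
    for op, i, j in netlist:
        a, b = wires[i], wires[j]
        if op == "and":
            out = a & b
        elif op == "or":
            out = a | b
        elif op == "xor":
            out = a ^ b
        elif op == "nand":
            out = ~(a & b) & mask  # mask keeps only the n sample bits
        else:
            raise ValueError(f"unknown gate: {op}")
        wires.append(out)
    return wires[-1]

n = 4  # four samples evaluated at once
x0 = pack_bits([0, 1, 0, 1])
x1 = pack_bits([0, 0, 1, 1])
x2 = pack_bits([1, 1, 0, 1])
# wire 3 = x0 xor x1, wire 4 = wire 3 and x2
netlist = [("xor", 0, 1), ("and", 3, 2)]
result = eval_netlist([x0, x1, x2], netlist, mask=(1 << n) - 1)
print(unpack_bits(result, n))  # [0, 1, 0, 0]
```

On real hardware the integers would be 64-bit words or SIMD registers, so one instruction evaluates a gate for 64 or more samples.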

In my (not so) humble opinion, TPUs are an example of premature optimization.

HarHarVeryFunny · 2025-11-27T22:46:12 1764283572

They are on their 7th generation now, so presumably the architecture is being updated as needs require.