https://docs.cloud.google.com/tpu/docs/system-architecture-t...
https://openxla.org/xla/sparsecore
Here's another inference-efficient architecture where TPUs are useless: https://arxiv.org/pdf/2210.08277
There is no matrix-vector multiplication. Parameters are estimated using Gumbel-Softmax. TPUs are of no use here.
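For anyone who hasn't met Gumbel-Softmax, here is a rough Python sketch of the trick: add Gumbel noise to a node's logits and take a temperature-scaled softmax, which gives a differentiable stand-in for a discrete choice. The function name, the temperature, and the number of options are my own illustration, not taken from the paper.

```python
import math
import random

def gumbel_softmax(logits, tau=1.0, rng=random):
    """Return a relaxed one-hot sample over len(logits) options (illustrative helper)."""
    noisy = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logs stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]
```

Lower `tau` pushes the output toward a hard one-hot choice; at inference you would just take the argmax and drop the noise entirely.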
Inference is done bit-wise, and the most efficient inference comes after applying Boolean logic simplification tools (ABC or mockturtle).
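To make "bit-wise" concrete, here is a toy Python sketch (my own, not the paper's code): once every node is a fixed two-input gate, you can pack many samples into one machine word and evaluate the whole network with plain integer AND/OR/XOR, one instruction per gate for the whole batch. The gate table, the `run_network` helper, and the full-adder wiring are hypothetical, and no logic simplification is applied here.

```python
# Hypothetical gate table; each gate acts on whole words, one bit per sample.
GATES = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
}

def run_network(layers, inputs, width):
    """Evaluate layers of (gate, i, j) nodes on bit-packed inputs."""
    mask = (1 << width) - 1  # keep only the `width` sample bits
    values = [x & mask for x in inputs]
    for layer in layers:
        values = [GATES[op](values[i], values[j]) & mask for op, i, j in layer]
    return values

# A full adder wired as a 3-layer gate network; or(x, x) is a pass-through.
FULL_ADDER = [
    [("xor", 0, 1), ("and", 0, 1), ("or", 2, 2)],  # a^b, a&b, c
    [("xor", 0, 2), ("and", 0, 2), ("or", 1, 1)],  # sum, (a^b)&c, a&b
    [("or", 0, 0), ("or", 1, 2)],                  # sum, carry
]

# All 8 input combinations packed into 8-bit words: bit k holds sample k.
a, b, c = 0xAA, 0xCC, 0xF0
sum_bits, carry_bits = run_network(FULL_ADDER, [a, b, c], width=8)
```

With 64-bit words the same loop handles 64 samples per gate evaluation, and none of it looks like a matrix multiply.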
In my (not so) humble opinion, TPUs are an example of premature optimization.