No, GPUs are not a particularly ideal architecture for AI inference; it's just that inference needs far more memory bandwidth than a general-purpose CPU's memory hierarchy can supply.
> What does an ASIC for matmul look like?
A systolic array, and ultimately quite different from a GPU. This is why TPUs et al. are a thing.
In general a systolic array buys you throughput that scales with the square of its edge length. For example, with a 256x256 array it takes 256 cycles to shift operands in and another 256 to drain results out, but in those 512 cycles you accomplish 65,536 MACs, a 128x speedup over doing one MAC per cycle serially.
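The cycle arithmetic above is easy to check with a toy model. This is just an illustrative sketch of the counting argument (real arrays pipeline many tiles back to back, so the fill/drain cost amortizes away); the function names are mine, not any real API:

```python
def serial_cycles(n: int) -> int:
    """One MAC per cycle: an n x n tile needs n*n cycles serially."""
    return n * n

def systolic_cycles(n: int) -> int:
    """~n cycles to shift operands in plus ~n cycles to drain results out."""
    return 2 * n

n = 256
macs = n * n                                    # 65,536 MACs in the tile
cycles = systolic_cycles(n)                     # 512 cycles through the array
speedup = serial_cycles(n) / cycles             # 128x over one-MAC-per-cycle
print(f"{macs} MACs in {cycles} cycles -> {speedup:.0f}x speedup")
```

Note the speedup per tile is n/2 (linear in the edge length); it's the sustained throughput, n*n MACs per cycle once the pipeline is full, that grows quadratically.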