I'm surprised the SPUs were used for post-processing, because whenever I try software rendering I quickly get bottlenecked on fill rate. I believe you, since I've seen it attested in many places, but it still surprises me.
The 1:1, straight-line access pattern of fullscreen post-processing is much easier to prefetch than triangle rasterization. And in this case the SPUs and the GPU used the same memory, so the GPU had no bandwidth advantage. The best the GPU could do was hide latency better.
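To make the prefetch point concrete, here's a rough sketch in Python. It is purely illustrative and is not how any real SPU code was written. The function names, the chunk size, and the slice copy standing in for a DMA transfer are all my own assumptions. The point is only that every chunk starts exactly where the previous one ended, so the next fetch address is known before the current chunk is finished.

```python
def invert(p):
    """Example per-pixel operation (hypothetical): invert an 8-bit value."""
    return 255 - p


def postprocess_streamed(src, width, height, chunk_rows, op=invert):
    """Run a 1:1 fullscreen pass over an 8-bit grayscale image.

    The image is walked in fixed-size row chunks. Each chunk is copied into
    a small local buffer first (a stand-in for a DMA transfer into local
    store), processed, and written back to the same offsets.
    """
    dst = bytearray(len(src))
    fetches = []  # (start, end) byte offsets of every chunk, in fetch order
    for y0 in range(0, height, chunk_rows):
        start = y0 * width
        end = min(y0 + chunk_rows, height) * width
        fetches.append((start, end))
        local = src[start:end]                        # "DMA in"
        dst[start:end] = bytes(op(p) for p in local)  # process, "DMA out"
    return bytes(dst), fetches


def is_sequential(fetches):
    """True if every chunk begins exactly where the previous one ended."""
    return all(a[1] == b[0] for a, b in zip(fetches, fetches[1:]))


# A 32 x 32 test image, processed 8 rows at a time.
image = bytes(range(256)) * 4
out, fetches = postprocess_streamed(image, 32, 32, chunk_rows=8)
print(fetches, is_sequential(fetches))
```

With a rasterizer, the equivalent list of fetches would depend on where each triangle lands on screen, so there is no simple rule like this for deciding what to fetch next.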