
> The conclusion was that fetch, decode, and interconnect were the most important power consumers and CISC vs RISC didn’t quite matter as much

Wouldn't that mean that x86 and other CISCs would cost more, since their variable-length instructions make fetch and decode the most complex and the most power-hungry parts?

Now, if you told me that scheduling and reordering uops consumed the most power, then I'd understand why RISC vs CISC wouldn't matter, since at the uop level high-performance chips are all similar VLIW-like machines (AFAICT, IANACD).
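
To make the uop point concrete, here's a rough C sketch (the function name is just for illustration) of the kind of thing I mean:

    #include <stdint.h>

    /* On x86-64 a compiler will typically emit a single read-modify-write
       instruction for this (something like: add [rdi], esi). My rough
       understanding is that the front end then cracks it into separate
       load / add / store uops, roughly the same micro-operations a RISC
       ISA would expose as individual instructions. */
    void bump(uint32_t *counter, uint32_t n) {
        *counter += n;
    }

So by the time the work reaches the execution units, the CISC vs RISC distinction has mostly been decoded away.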

Though, for my money, I'm betting on a future dominated by gigantic core counts of really simple cores that may even be in-order. The extra transistors spent speeding up one core are probably better spent on another core that can just handle the next request instead. Clock speeds brought down to some integer multiple of RAM speed, and power costs brought down with them. At cloud scale this just seems economically inevitable. After that, the economies of scale mean it'll trickle down to everyone else except for specialised use cases.

I believe we're already seeing this happen with Zen c-cores and Intel E-cores, and that a simplified instruction set like RISC-V will eventually win out on savings alone.



> on a future dominated by gigantic core counts of really simple cores that may even be in-order. The extra transistors to speed up the processing are probably better spent on another core that can just handle the next request instead.

I like how you think, but I'm not sure we can get there.

Part of the RISC dream is that you shouldn't need reordering, because the ops should all take similar time, so you could partially get there on that front (although…look at divide, which IIRC the 801 didn't even have). So ops aren't really uniform in latency even before you consider memory.

Then there is memory: not all references can be resolved ahead of time, because a computed indirect reference needs the ALU output and register state before the address even exists. You can't just hand the following instruction to another unit, since its computation may depend on the instruction you're waiting on.
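
A minimal C sketch (hypothetical names) of the dependency I mean:

    #include <stddef.h>

    /* The address of the second load depends on the value produced by the
       first load plus an add, so the hardware can't know that address at
       decode time; it has to wait for the ALU result and register state. */
    int chase(const int *base, const int *table, size_t i, size_t offset) {
        int index = base[i];            /* load #1 */
        return table[index + offset];   /* load #2: address needs load #1's result */
    }

No amount of handing instructions to spare cores removes that serial wait; you just move it around.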

That's why the decode-and-execute path has been smashed into lots of duplicated pieces: to do what you're describing, but at the level of the individual steps within each instruction.

At a different level of granularity, big.LITTLE is an approach along your line of thinking. These days we tend to have a lot more compute than we need, even at the embedded level, and we may end up with tiny power-sipping coprocessors that account for most of a device's runtime while doing almost nothing.


> a future dominated by gigantic core counts of really simple cores that may even be in-order

Chuck Moore (of Forth fame) has been advocating something like this for a while, although he's more "multi-computer" than "multi-core".

https://www.greenarraychips.com/index.html


> I'm betting on a future dominated by gigantic core counts of really simple cores that may even be in-order.

Why not both? CPU and GPU!



