
> The conclusion was that fetch, decode, and interconnect were the most important power consumers and CISC vs RISC didn’t quite matter as much

Wouldn't that mean that x86 and other CISCs would cost more, since their variable-length instructions make fetch and decode the most complex and the most power-hungry parts?

Now, if you told me that scheduling and reordering uops consumed the most power, then I'd understand why RISC vs CISC wouldn't matter, since at the uop level high-performance chips are all similar VLIW-like machines (AFAICT, IANACD).
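
To make the uop point concrete, here's a rough C sketch (the function name is just for illustration) of the kind of thing I mean:

    #include <stdint.h>

    /* On x86-64 a compiler will typically emit a single read-modify-write
       instruction for this (something like: add [rdi], esi). My rough
       understanding is that the front end then cracks it into separate
       load / add / store uops, roughly the same micro-operations a RISC
       ISA would expose as individual instructions. */
    void bump(uint32_t *counter, uint32_t n) {
        *counter += n;
    }

So by the time the work reaches the execution units, the CISC vs RISC distinction has mostly been decoded away.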

Though, for my money, I'm betting on a future dominated by gigantic core counts of really simple cores that may even be in-order. The extra transistors spent speeding up one core are probably better spent on another core that can just handle the next request instead. Clock speeds brought down to some integer multiple of RAM speed, and power costs brought down with them. At cloud scale this just seems economically inevitable. After that, the economies of scale mean it'll trickle down to everyone else except for specialised use cases.

I believe we're already seeing this happen with Zen c-cores and Intel E-cores, and that a simplified instruction set like RISC-V will eventually win out on savings alone.



> on a future dominated by gigantic core counts of really simple cores that may even be in-order. The extra transistors to speed up the processing are probably better spent on another core that can just handle the next request instead.

I like how you think, but I'm not sure we can get there.

Part of the RISC dream is that you shouldn't need reordering, because the ops should all take similar time, so you could partially get there on that front (although…look at divide, which IIRC the 801 didn't even have). So ops aren't really uniform in latency even before you consider memory.

Then there is memory: not all references can be resolved ahead of time, because a computed indirect reference needs the ALU output and register state before the address even exists. You can't just hand the following instruction to another unit, since its computation may depend on the instruction you're waiting on.
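
A minimal C sketch (hypothetical names) of the dependency I mean:

    #include <stddef.h>

    /* The address of the second load depends on the value produced by the
       first load plus an add, so the hardware can't know that address at
       decode time; it has to wait for the ALU result and register state. */
    int chase(const int *base, const int *table, size_t i, size_t offset) {
        int index = base[i];            /* load #1 */
        return table[index + offset];   /* load #2: address needs load #1's result */
    }

No amount of handing instructions to spare cores removes that serial wait; you just move it around.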

That's why the decode-and-execute path has been smashed into lots of duplicated pieces: to do what you're describing, but at the level of the individual steps within each instruction.

At a different level of granularity, big.LITTLE is an approach along your line of thinking. These days we tend to have a lot more compute than we need, even at the embedded level, and we may end up with tiny power-sipping coprocessors that account for most of a device's runtime while doing almost nothing.


> a future dominated by gigantic core counts of really simple cores that may even be in-order

Chuck Moore (of Forth fame) has been advocating something like this for a while, although he's more "multi-computer" than "multi-core".

https://www.greenarraychips.com/index.html


> I'm betting on a future dominated by gigantic core counts of really simple cores that may even be in-order.

Why not both? CPU and GPU!



