Shifted indexed addressing is needed more seldom, but indexed addressing, i.e. r...

brucehoult · on Dec 2, 2021

A big difference here is that the RISC-V instructions are usually all 16 bits in size while the Aarch64 and POWER instructions are all 32 bits in size. So the code size is the same.

Also, high performance Aarch64 and POWER implementations are likely to be splitting those instructions into two decoupled uops in the back end.

Performance-critical loops are unrolled on all ISAs to minimise loop control overhead and also to allow scheduling instructions to allow for the several cycle latency of loads from even L1 cache. When you do that, indexed addressing and auto-update addressing are still doing both operations for every load or store which, as well as being a lot of operations, introduces sequential dependency between the instructions. The RISC-V way allows the use of simple load/store with offset -- all of which are independent of each other -- with one merged update of each pointer at the end of the loop. POWER and Aarch64 compilers for high performance microarchitectures use the RISC-V structure for unrolled loops anyway.

So indexed addressing and auto-update addressing give no advantage for code size, and don't help performance at the high end.