Mandatory mention of notable actor languages:

  - Erlang and Elixir
  - E
  - AmbientTalk

And they could be 0- or 1-indexed? :P


TSO? Did you mean tso.architecture.cpu?


Either that or https://www.ibm.com/docs/en/zos-basic-skills?topic=interface...

I would prefer we didn’t have so many collisions in the TLA space.


The last company I worked at created TLAs for everything, or at least engineering always did. New company doesn't seem to have caught that bug yet though, thankfully.


> > Linear virtual addresses were made to be backwards-compatible with tiny computers with linear physical addresses but without virtual memory.

> That is false. In the Intel World, we first had the iAPX 432, which was an object-capability design. To say it failed miserably is overselling its success by a good margin.

That's not refuting the point he's making. The mainframe-on-a-chip iAPX family (and Itanium after it) died and had no heirs. The current popular CPU families are all descendants of the stopgap 8086, which evolved from tiny-computer CPUs, or of ARM's straight-up embedded designs.

But I do agree with your point that a flat (global) virtual memory space is a lot nicer to program. In practice, though, we've been moving away from that again fast, and the kernel has to struggle to keep up the illusion: NUCA, NUMA, CXL.mem, various mapped accelerator memories, etc.

Regarding the iAPX 432, I do want to set the record straight, as I think you are insinuating that it failed because of its object memory design. The iAPX 432 failed mostly because of its abysmal performance characteristics, but in retrospect [1] that was not inherent to the object directory design. It lacked even simple lookahead mechanisms, had no instruction or data caches, no registers, and not even immediates. Performance did not seem to be a top priority in the design, to paraphrase one of the architects. Additionally, the compiler team was not aligned and failed to deliver on time, which only compounded the performance problem.

  - [1] https://dl.acm.org/doi/10.1145/45059.214411


The way you selectively quoted: yes, you removed the refutation.

And regarding the iAPX 432: it was slow in large part due to the failed object-capability model. For one, the model required multiple expensive lookups per instruction. And it required tremendous numbers of transistors, so many that despite forcing a (slow) multi-chip design there still wasn't enough transistor budget left over for performance-enhancing features.

Performance-enhancing features that contemporary designs with smaller transistor budgets but no object-capability model did have.

Opportunity costs matter.


I agree about the opportunity cost, and that given the transistor budgets of the time a simpler design would have served better.

I fundamentally disagree with putting the majority of the blame on the object memory model. The problem was that they compounded the added complexity of the object model with a slew of other unnecessary complexities. They somehow did find the budget to put the first full IEEE floating-point unit on the execution unit, and implemented a massive [1] decoder and microcode for the bit-aligned 200+ instruction set and for interprocess communication. The expensive lookups per instruction had everything to do with cutting caches and programmable registers, not with any overwhelming complexity in the address translation.

I strongly recommend checking the "Performance effects of architectural complexity in the Intel 432" paper by Colwell that I linked in the parent.

[1] die shots: https://oldbytes.space/@kenshirriff/110231910098167742


A huge factor in the iAPX 432's utter lack of success was the set of technological restrictions, like pin-count limits, laid down by Intel top brass, which forced stupid and silly limitations on the implementation.

That's not to say the iAPX 432 would have succeeded under better management, only that you cannot point to some random part of the design and say "that obviously does not work".


Your critique applies to measuring one or a handful of instructions. In practice you count the number of cycles over millions or billions of instructions. CPI is very meaningful, and it is the main throughput performance metric for CPU core architects.
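
As a back-of-the-envelope sketch (the counter values below are made up; on Linux you would get real ones from perf stat or perf_event_open):

  // Toy CPI computation over a long run; the counts are hypothetical.
  #include <cstdint>
  #include <cstdio>

  int main() {
      std::uint64_t cycles       = 4'200'000'000; // core cycles elapsed
      std::uint64_t instructions = 3'000'000'000; // instructions retired
      double cpi = double(cycles) / double(instructions);
      std::printf("CPI = %.2f, IPC = %.2f\n", cpi, 1.0 / cpi);
      return 0;
  }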


Register moves do not really factor into performance, unless it's a move to/from vector registers.


H+P says register allocation is one of the most important—if not the most important—optimizations.


In CPU uarch design, sure, but that's outside the context of this discussion. There's nothing you can do to that C++ library you are optimizing that will impact performance through register allocation/renaming.


This is not always true. Compilers are quite good at register allocation, but sometimes they get it wrong, and sometimes you can make small changes to code that improve register allocation and thus performance.

Usually the problem is an unfortunately placed spill, so the operation is actually l1d$ traffic, but still.
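
A contrived C++ sketch of the kind of small change meant here (hypothetical code; whether it helps at all depends entirely on the compiler and target):

  // Four accumulators hide FP-add latency and, on most targets, still
  // fit in registers. Doubling them again on a register-starved target
  // can force spills, turning "register" ops into L1d loads/stores.
  // (Tail elements when n % 4 != 0 are ignored for brevity.)
  float dot4(const float* a, const float* b, int n) {
      float s0 = 0, s1 = 0, s2 = 0, s3 = 0;
      for (int i = 0; i + 4 <= n; i += 4) {
          s0 += a[i]     * b[i];
          s1 += a[i + 1] * b[i + 1];
          s2 += a[i + 2] * b[i + 2];
          s3 += a[i + 3] * b[i + 3];
      }
      return (s0 + s1) + (s2 + s3);
  }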


> l1d$

I don't know how to interpret this.


Level 1 data cache (the "$" is shorthand for "cache").


Yes, and just like our bodies, that closed loop is cooled by a rack of evaporators on the roof.


He was an animator, and it's the kind of interesting factoid we read these comment sections for. We'll allow it.


You missed the pun


It's the overhead cost caused by trust breakdown. (tbf sometimes the timesheets are there for legal/tax reasons)


It doesn't need to prove that. It needs to produce plausible data that appeases either your direct or +1 manager.

