
I'm sorry, I won't share many details; I don't think much is public on the Vsora architecture and I don't want to breach any NDA...

From their web page, Euclyd is a "many small cores" accelerator. Building a good compilation toolchain for these, one that gets efficient results, is a hard problem; see the many comments on compilers for AI in this thread.

Vsora's approach is much more macroscopic, and differentiated. By this I mean I don't know of anything quite like it. No sea of small cores, but several beefier units. They're programmable, but don't look like a CPU: the HW/SW interface is at a higher level. A very hand-wavy analogy with storage would be block devices vs object storage, maybe. I'm sure more details will surface when real HW arrives.


Very simplified: AI workloads need both compute and communications. Compute dominates inference, while communications dominate training.

Most start-ups innovate on the compute side, whereas the technology needed for state-of-the-art communications is not common, and very low-level: plenty of analog concerns. The domain is dominated by NVidia and Broadcom today.

This is why digital start-ups tend to focus on inference. They innovate on the purely digital part, which is compute, and tend to use off-the-shelf IP for communications, so it is not a differentiator and likely lags the leaders.

But in most cases, coupling a computation engine marketed for inference with state-of-the-art communications would (in theory) open the way to training too. It's just that doing both together is a very high barrier. It's more practical to start with compute, and if successful there, use this to improve the comms part in a second stage. All the more because everyone expects inference to be the biggest market. So AI start-ups focus on inference first.


They also have the 'tyr 4' [1].

It doesn't have to compete on price 1:1. Ever since Trump took office, the Europeans have woken up to their dependence on the USA, which they no longer regard as a reliable partner. This applies to the defense industry, but also to critical infrastructure, including IT. European alternatives are expected to cost something.

[1] https://vsora.com/products/tyr/


From Le Monde's live feed, RTE (the French electricity network operator) declared the issue unrelated to this fire.

"Le gestionnaire français souligne par ailleurs que cette panne n’est pas due à un incendie dans le sud de la France, entre Narbonne et Perpignan, contrairement à des informations qui circulent."


castxml (https://github.com/CastXML/CastXML) may be what you want. It uses the Clang front-end to output an XML representation of a C or C++ parse tree. It is then possible to turn this into what you want. I've used it and seen it used to generate code for endianness conversion of structures from headers, or for RPC code generation, for example.

It can be used from Python through pygccxml (https://github.com/CastXML/pygccxml). The name comes from a previous instance, gccxml, based on the GCC front-end.

Both castxml and pygccxml are packaged in Debian and Ubuntu.
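
To give an idea of what driving it from Python looks like, here's a minimal pygccxml sketch; the header name point.h and the struct point_t are just placeholders for the example, and find_xml_generator picks up castxml from the PATH:

    from pygccxml import utils, declarations, parser

    # Locate castxml and configure the parser to use it.
    generator_path, generator_name = utils.find_xml_generator()
    config = parser.xml_generator_configuration_t(
        xml_generator_path=generator_path,
        xml_generator=generator_name)

    # Parse a header and get the global namespace of the parse tree.
    decls = parser.parse(["point.h"], config)
    global_ns = declarations.get_global_namespace(decls)

    # Walk the members of a struct, e.g. to emit byte-swapping or RPC code.
    point = global_ns.class_("point_t")
    for member in point.variables():
        print(member.name, member.decl_type.decl_string)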


On the L/S unit impact: data movement is expensive, computation is cheap (relatively).

In "Computer Architecture, A Quantitative Approach" there are numbers for the now old TSMC 45nm process: A 32 bits FP multiplication takes 3.7 pJ, and a 32 bits SRAM read from an 8 kB SRAM takes 5 pJ. This is a basic SRAM, not a cache with its tag comparison and LRU logic (more expansive).

Then I have some 2015 numbers for the Intel 22nm process, old too: a 64-bit FP multiplication takes 6.4 pJ, a 64-bit read/write from a small 8 kB SRAM 4.2 pJ, and from a larger 256 kB SRAM 16.7 pJ. Basic SRAM here too, not a more expensive cache.

The cost of a multiplication grows roughly quadratically with operand width (compare the mantissa sizes, since that's what is multiplied), while access cost grows more or less linearly, so the computation cost in the second example is relatively much heavier.

The trend gets even worse with more advanced processes. Data movement is usually what matters the most now, except for workloads with very high arithmetic intensity, where computation will dominate (in practice: large enough matrix multiplications).
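
To make the comparison concrete, here's a small back-of-the-envelope sketch using only the numbers above; the quadratic-in-mantissa scaling is the rough model from the previous paragraph, not a measurement:

    # Energy per operation, in pJ, from the figures quoted above.
    tsmc45 = {"fp32_mul": 3.7, "sram_8kB": 5.0}
    intel22 = {"fp64_mul": 6.4, "sram_8kB": 4.2, "sram_256kB": 16.7}

    # Compute vs access ratio in each example.
    print(tsmc45["fp32_mul"] / tsmc45["sram_8kB"])    # ~0.74: access costs more than the mul
    print(intel22["fp64_mul"] / intel22["sram_8kB"])  # ~1.5: the mul now costs more

    # Rough model: mul cost ~ quadratic in significand width (24 vs 53 bits,
    # implicit bit included), access cost ~ linear in word width (32 vs 64 bits).
    mul_scale = (53 / 24) ** 2    # ~4.9x going from FP32 to FP64
    access_scale = 64 / 32        # 2x going from 32-bit to 64-bit words
    print(mul_scale, access_scale)

So going from 32 to 64 bits roughly doubles the access energy but multiplies the mul energy by ~5, which is why compute ends up above the small-SRAM access in the 22nm example (on top of the process difference).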


Appreciate the detail! That explains a lot of what is going on. It also dovetails with some interesting facts I remember reading about the relative power consumption of the Zen cores versus the Infinity Fabric connecting them: the percentage of package power used simply to run the fabric interconnect was shocking.


Right, but a SIMD single-precision mul is linear (or even sub-linear) relative to its scalar cousin. So a 16x32-bit, 512-bit MUL won't even be 16x the cost of a scalar mul; the decoder has to do only the same amount of work, for example.


The calculations within each unit may be, true, but routing and data transfer are probably the biggest limiting factor on a modern chip. It should be clear that placing 16x units of non-trivial size means the average unit will likely be further from the data source than a single unit would be, and transmitting data over distance can have greater-than-linear costs (not just resistance/capacitance losses; to hit timing targets you need faster switching, which means higher voltages, etc.).


Both Intel and AMD to some extent split the vector ALUs and the register file into 128-bit (or 256-bit?) lanes, which arithmetic ops don't need to cross at all. Of course loads/stores/shuffles still do, making this point somewhat moot.


AFAIK you have to think about how many different 512b paths are being driven when this happens. Each cycle in the steady-state case (where you can do two vfmadd132ps per cycle) is simultaneously:

- Capturing 2x512b from the L1D cache

- Sending 2x512b to the vector register file

- Capturing 4x512b values from the vector register file

- Actually multiplying 4x512b values

- Sending 2x512b results to the vector register file

.. and probably more?? That's already something like 14*512 wires [switching constantly at 5 GHz!!], and there are probably even more intermediate stages?
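
A quick tally of the 512-bit paths listed above, just making the arithmetic explicit (the per-cycle counts are the steady-state assumption from the list):

    # 512-bit paths driven per cycle in the steady state described above.
    paths = {
        "L1D -> load units":             2,
        "load units -> register file":   2,
        "register file -> FMA inputs":   4,
        "FMA multiply operands":         4,
        "FMA results -> register file":  2,
    }
    total = sum(paths.values())
    print(total, total * 512)   # 14 paths -> 7168 wires toggling every cycle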


… per core. There are eight per compute tile!

I like to ask IT people a trick question: how many numbers can a modern CPU multiply in the time it takes light to cross a room?
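
For fun, a back-of-the-envelope answer under stated assumptions: a 5 m room, one 8-core chip at 5 GHz, each core issuing 2x 16-lane single-precision FMAs per cycle. These are illustrative numbers, not a specific part:

    # How many multiplications fit in the time light crosses a room?
    c = 3.0e8                 # speed of light, m/s
    room = 5.0                # assumed room width, m
    t = room / c              # ~16.7 ns for light to cross

    freq = 5.0e9              # assumed clock, 5 GHz
    cores = 8                 # assumed core count
    fma_per_cycle = 2         # assumed FMA issue per core per cycle
    lanes = 16                # 512-bit vectors of 32-bit floats

    cycles = t * freq                                 # ~83 cycles
    mults = cycles * cores * fma_per_cycle * lanes    # each FMA lane does one multiply
    print(round(t * 1e9, 1), "ns,", int(mults), "multiplications")   # ~21,000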


Piggybacking on this: memory scaling has been slower than compute scaling, at least since the 45nm node in the example. At 4nm the difference is larger.


Random logic has also had much better area scaling than SRAM since EUV, which implies that gap continues to widen at a faster rate.



LTE total latency is 20-50 ms, and you compare this to the marketing "air link only" 5G latency of 1 ms. It's apples and oranges ;)

FYI, the air-link latency for LTE was specified as 4-5 ms (FDD, as it's the best case here). The 5G improvement to 1 ms would require features (URLLC) that nobody has implemented and nobody will: too expensive for markets that are too niche.

The latency in a cellular network mostly comes from the core network, not the radio link, anymore. Even in 4G.

(telecom engineer, having worked on both 4G and 5G and recently out of the field)
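
To make the apples-vs-oranges point explicit, a tiny sketch with the numbers from this thread (the 30 ms LTE total is just an assumed midpoint of the 20-50 ms range quoted above):

    # Latency numbers quoted in this thread, in ms.
    lte_total = 30.0          # assumed midpoint of the 20-50 ms total above
    lte_air_link = 4.5        # LTE FDD air-link latency, 4-5 ms
    nr_urllc_air_link = 1.0   # 5G marketing figure, air link only (URLLC)

    core_and_transport = lte_total - lte_air_link      # ~25.5 ms outside the air link
    best_case_gain = lte_air_link - nr_urllc_air_link  # ~3.5 ms even with URLLC

    # Even a perfect 1 ms air link barely moves the end-to-end number:
    print(core_and_transport + nr_urllc_air_link)      # ~26.5 ms total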


Always been interested in this stuff. Where would you recommend a software/math guy learn all this stuff? My end goal is to understand the tech well enough to at least have opinions on it. How wifi works would be great as well if you're aware of any resources for that.


It's a good but hard question... Because cellular is huge.

In a professional context, nobody knows it all in detail. There are specializations: core network vs RAN; inside the RAN, protocol stack vs PHY; inside the PHY, algorithms vs implementation; etc.

You can see all the cellular specs (they're public) from there: https://www.3gpp.org/specifications-technologies/specificati...

5G (or NR) is the series 38 at the bottom. Direct access: https://www.3gpp.org/ftp/Specs/archive/38_series

It's a lot ;) But a readable introduction is the 38.300 spec, and the latest edition for the first 5G release (R15, or "f") is this one: https://www.3gpp.org/ftp/Specs/archive/38_series/38.300/3830...

It's about as readable as it can get. The PHY part is pretty awful by comparison. If you have a PHY interest, you'll need to look for technical books as the specs are quite hermetic (but it's not my field either).


Forgot to get back to you on these.. thanks for the links!


Emacs is in the process of moving from legacy language modes, which use regexps and elisp for syntax analysis, to new modes using tree-sitter.

In this context, what should a name like "c-mode" mean? Options: 1) it should stick to the old mode, cc-mode here; to use the new mode, explicitly use c-ts-mode. 2) It should move to the new tree-sitter mode, c-ts-mode; to use the old mode, explicitly use cc-mode. 3) It should mean the currently preferred Emacs mode, with a way for the user to take back control if they have a different preference; this preferred mode will change at some point from legacy to tree-sitter.

The change is (3), with a move to tree-sitter in Emacs 30 (to be released soon), IIUC. It makes sense to me. Saying that anyone owns a name as generic as "c-mode" in an open source project just because they were first and have a long history as a contributor (thanks, by the way!) seems excessive. A change of default is normal in an evolving project, and as long as it's clearly documented with a way to override it (which is the case, IIUC), it's fine by me. One can dislike the change, but it's impossible to please everyone anyway. Emacs users are used to adjusting their configuration based on their preferences.

I understand it can be an emotional situation for the maintainer of the legacy mode, but I don't see the need to cry foul.


I agree with giving users control, but unfortunately I cannot agree with the move to c-ts-mode. And I cannot disagree more with associating CC mode with "legacy" when it's objectively better than the other alternative, at least currently. I don't think Emacs developers are doing users a favor in this specific case.

CC Mode is extremely capable. Over the years it has matured to the point that almost all needs can be satisfied, and performance has never been a problem for me. It contains very few, if any, bugs that affect my use.

On the other hand, the tree-sitter major modes are not at all production-ready enough to be considered as the default. For one thing, highlighting can break entirely for complex macros and ifdefs. (I'd be glad to be enlightened as to whether this is theoretically fixable at all -- can you correctly highlight ifdefs without doing semantic analysis with the help of a compiler?) For another, CC Mode has a feature called c-guess that can quickly analyze an existing source buffer and generate a format definition, which proves extremely valuable. Alas, c-ts-mode has zero support for it.

I had high hopes for tree-sitter. I turned on tree-sitter modes for all my coding when they came out, and now I have zero enabled. They still have a long way to go, and I don't want to spend time debugging Emacs code at work. :-)

Tree-sitter is not a panacea. Fast parsing alone is not what makes a good major mode.


As someone whose pronouns are C-programmer/vim, I feel unsafe.

My living nightmare would be to develop highly verbose Java programs in an editor with 999 gorillion different "modes" with seemingly random names.

"oh, you're making an singletonfactoryfacade in Treesat-19 mode, you'll need to be using CCC-mode, treesat-19 mode is for factoryencapsulationfactory patterns"


And indeed, if anything, a project like Emacs being unable to make a decision like this would result in a project that slowly dies under the weight of its own history.

Tree-sitter is fairly universally understood now to be "the future." While cc-mode will likely have its place for a long time (hard to beat regexes on speed, even if they break down when the input is too noisy), moving the default to the tree-sitter implementation aligns with the other language modes going to tree-sitter. For good or ill, consistency is almost certainly better than new users having to learn "Your code is parsed by tree-sitter. Oh, except your C and C++ code, unless you set this flag, because Mackenzie threw a fit in 2024. That's a fun bit of history you get to care about forever now as a user!"


Hi, in case you're not already aware of the name clash, there's already a `rr` in the programming world. It's "record and replay": https://rr-project.org/.

Very different, but a very fine tool too.


It doesn’t seem like the rr that GP linked to is their own project, just something they’ve found useful.

In any case, in the non-software world, “RR” stands for railroad, as it does in the name of that tool. You can’t own a common two-letter abbreviation.


See just above the map: "This has been age-standardized, assuming a constant age structure of the population for comparisons between countries and over time." This is what you suggest, IIUC?
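
For reference, a tiny sketch of what direct age-standardization does; the age bands, rates, and standard weights below are made-up illustrative numbers:

    # Direct age-standardization: weight each age group's rate by a fixed
    # standard population, so differing age structures don't skew comparisons.
    age_specific_rates = {"0-39": 0.2, "40-69": 1.5, "70+": 6.0}    # per 100k (made up)
    standard_weights   = {"0-39": 0.55, "40-69": 0.35, "70+": 0.10} # fixed structure (made up)

    standardized = sum(age_specific_rates[a] * standard_weights[a]
                       for a in age_specific_rates)
    print(standardized)   # same weights for every country/year -> comparable rates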


Yes, and I apologise for not seeing this.

