
It isn't just detecting the extra logical processors; you have to do work to utilise them. From the linked text:

"If there are more than one processor group in Windows (on systems with more than 64 cpu threads), 7-Zip distributes running CPU threads across different processor groups."

The OS does not do that for you under Windows. Other OSs handle that many cores differently.
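(For illustration only, here is a minimal sketch of the shape of that work on Windows, using GetActiveProcessorGroupCount / GetActiveProcessorCount / SetThreadGroupAffinity. The one-worker-per-logical-processor layout is an assumption for the example, not 7-Zip's actual code.)

    // Sketch: spread one worker thread per logical processor across all
    // processor groups. Without the per-thread group affinity call, all
    // threads stay in the process's primary group on older Windows versions.
    #include <windows.h>
    #include <thread>
    #include <vector>

    int main() {
        std::vector<std::thread> workers;
        const WORD groupCount = GetActiveProcessorGroupCount();
        for (WORD group = 0; group < groupCount; ++group) {
            const DWORD procsInGroup = GetActiveProcessorCount(group);
            // Build a mask covering only the active processors in this group.
            const KAFFINITY mask = procsInGroup >= 64
                ? ~KAFFINITY(0)
                : ((KAFFINITY(1) << procsInGroup) - 1);
            for (DWORD i = 0; i < procsInGroup; ++i) {
                workers.emplace_back([group, mask] {
                    GROUP_AFFINITY ga = {};
                    ga.Group = group;
                    ga.Mask  = mask;
                    SetThreadGroupAffinity(GetCurrentThread(), &ga, nullptr);
                    // ... compression work would go here ...
                });
            }
        }
        for (auto& t : workers) t.join();
    }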

> more than 15 years ago, they probably could've found a moment to add and test a conditional call to it

I suspect it hasn't been much of an issue until recently. Any single block of data worth spinning up that many threads to compress is going to be very large, and you don't want to split it into chunks that are too small, or you lose some of the benefit of the dynamic compression dictionary (sharing that between threads would add a lot of inter-thread coordination work, killing any performance gain even if the threads are running local enough on the CPU to share cache). Compression is not an inherently parallelizable task, at least not “embarrassingly” so like some processes.
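(To make the trade-off concrete: the usual workaround is to compress independent chunks, each starting from an empty dictionary, which is exactly why small chunks cost ratio. A rough sketch using zlib's one-shot compress(); the chunking and one-thread-per-chunk layout are illustrative assumptions, not what 7-Zip actually does.)

    // Sketch: chunked parallel compression. Each chunk is compressed with a
    // fresh dictionary, so no match can reach back into a previous chunk,
    // and smaller chunks generally mean a worse compression ratio.
    #include <zlib.h>
    #include <algorithm>
    #include <cstddef>
    #include <thread>
    #include <vector>

    std::vector<std::vector<Bytef>> compress_chunks(const std::vector<Bytef>& input,
                                                    std::size_t chunkSize) {
        const std::size_t nChunks = (input.size() + chunkSize - 1) / chunkSize;
        std::vector<std::vector<Bytef>> out(nChunks);
        std::vector<std::thread> workers;
        for (std::size_t c = 0; c < nChunks; ++c) {
            workers.emplace_back([&, c] {
                const std::size_t offset = c * chunkSize;
                const uLong srcLen =
                    static_cast<uLong>(std::min(chunkSize, input.size() - offset));
                uLongf dstLen = compressBound(srcLen);
                out[c].resize(dstLen);
                compress(out[c].data(), &dstLen, input.data() + offset, srcLen);
                out[c].resize(dstLen);
            });
        }
        for (auto& t : workers) t.join();
        return out;
    }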

Even when you do have something to compress that would benefit from more than 64 separate tasks in theory, unless it is all in RAM (or on an incredibly quick, low-latency drive/array) the process is likely to be IO-starved long before it is compute-starved, when you have that much compute resource to hand.

Recent improvements in storage options & CPUs (and the bandwidth between them) have presumably pushed the occurrences of this being worthwhile (outside of artificial tests) from “practically zero” to “near zero, but it happens”, hence the change has been made.

Note that two or more 7-zip instances working on different data could always use more than 64 threads between them, if enough cores to make that useful were available.



Are you sure that if you don't attempt to set any affinities, Windows won't schedule 64+ threads over other processor groups? I don't have any system handy that'll produce more than 64 logical processors to test this, but I'd be surprised if Windows' scheduler won't distribute a process's threads over other processor groups if you exceed the number of cores in the group it launches into.

The referenced text suggests applications will "work", but that isn't really explicit.


They're either wrong or thinking about Windows 7/8/10. That page is quite clear.

> starting with Windows 11 and Windows Server 2022 the OS has changed to make processes and their threads span all processors in the system, across all processor groups, by default.

> Each process is assigned a primary group at creation, and by default all of its threads' primary group is the same. Each thread's ideal processor is in the thread's primary group, so threads will preferentially be scheduled to processors on their primary group, but they are able to be scheduled to processors on any other group.


I mean, it seems it's quite clear that a single process and all of its threads will just be assigned to a single processor group, and it'll take manual work for that process to use more than 64 cores.

The difference is just that processes will be assigned a processor group more or less randomly by default, so they'll be balanced at the process level, but not the thread level. Not super helpful for a lot of software on Windows, which has historically preferred threads to processes for concurrency.


https://learn.microsoft.com/en-us/windows/win32/procthread/p...

This explicitly says the feature is automatic and programs will not need to manually adjust their affinity.


> it'll take manual work for that process to use more than 64 cores.

No it won't.


It absolutely will. Your process is only assigned a single processor group at process creation time. The only difference now is that it's by default assigned a random processor group rather than inheriting the parent's. For processes that don't require >64 cores, this means better utilization at the system level. However, each process is still assigned <=64 cores by default.

That's literally why 7-zip is announcing completion of that manual work.
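(If you do have a >64-thread box to hand, you can see what a process actually got with GetProcessGroupAffinity. A quick sketch; per the docs quoted above, the result should differ between Windows 10 and Windows 11.)

    // Sketch: print which processor groups the current process's threads occupy.
    #include <windows.h>
    #include <cstdio>
    #include <vector>

    int main() {
        USHORT groupCount = 0;
        // First call fails with ERROR_INSUFFICIENT_BUFFER but tells us how
        // many group entries to allocate.
        GetProcessGroupAffinity(GetCurrentProcess(), &groupCount, nullptr);
        std::vector<USHORT> groups(groupCount);
        if (GetProcessGroupAffinity(GetCurrentProcess(), &groupCount, groups.data())) {
            for (USHORT i = 0; i < groupCount; ++i)
                std::printf("process has threads in group %u\n", groups[i]);
        }
        return 0;
    }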


The 7zip code needed to change because it was counting cores by looking at affinity masks, and that limits it to 64.
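(The distinction in a sketch: an affinity-mask count saturates at the 64 bits of the mask, while the group-aware GetActiveProcessorCount(ALL_PROCESSOR_GROUPS) query does not.)

    // Sketch: mask-based core counting vs. group-aware counting.
    #include <windows.h>
    #include <cstdio>

    int main() {
        DWORD_PTR processMask = 0, systemMask = 0;
        // On a >64-processor system this mask describes at most one group.
        GetProcessAffinityMask(GetCurrentProcess(), &processMask, &systemMask);
        int maskCount = 0;
        for (DWORD_PTR m = processMask; m != 0; m >>= 1)
            maskCount += static_cast<int>(m & 1);

        const DWORD total = GetActiveProcessorCount(ALL_PROCESSOR_GROUPS);
        std::printf("mask-based count: %d, group-aware count: %lu\n",
                    maskCount, static_cast<unsigned long>(total));
        return 0;
    }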

It also needed to change if you want optimal scheduling, and it needed to change if you want it to be able to use all those cores on something that isn't Windows 11.

But for just the basic functionality of using all the cores:

> Starting with Windows 11 and Windows Server 2022, on a system with more than 64 processors, process and thread affinities span all processors in the system, across all processor groups, by default

That's documentation for a single process messing with its affinity. They're not writing that because they wrote a function to put different processes on different groups. A single process will span groups by default.


That depends on what format you're using. Zip compresses every file separately. Bzip2 and zstd have pretty small maximum block sizes, and gzip doesn't gain much from large blocks anyway. And even when you are making large blocks, you can dump a lot of parallelism into searching for repeated data.



