No shared memory. To communicate between processes you usually use sockets; to communicate between threads you just mutate shared variables. This is a huge performance difference.
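To make the contrast concrete, here's a minimal Python sketch (the function names and pipe payload are just illustrative): threads communicate by mutating a variable that lives in shared memory, while processes have to serialize a value and push it through an OS-level channel like a pipe or socket.

```python
import threading
from multiprocessing import Process, Pipe

def count_with_threads(n):
    # Threads share the process's memory: they communicate by
    # mutating a shared variable, guarded by a lock.
    counter = [0]
    lock = threading.Lock()

    def bump():
        with lock:
            counter[0] += 1

    threads = [threading.Thread(target=bump) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]

def _child(conn):
    conn.send("hello from another process")
    conn.close()

def send_over_pipe():
    # Processes share nothing by default: values must be serialized
    # and sent over an OS-level channel such as a pipe.
    parent, child = Pipe()
    p = Process(target=_child, args=(child,))
    p.start()
    msg = parent.recv()
    p.join()
    return msg

if __name__ == "__main__":
    print(count_with_threads(4))   # 4
    print(send_over_pipe())        # hello from another process
```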
A tangent, but I find it amusing to contrast the perpetual Python GIL debate with all the new computation platforms that claim to be focused on scalability. Those are mostly single-threaded or max out at a few virtual CPUs (e.g. "serverless" platforms), and there people applaud it. There, people view the isolation as supporting scalability.
Yeah, I know about that argument but it just doesn't make sense to me. Removing the GIL means that 1) you make your language runtime more complex and 2) you make your app more complex.
Is it truly worth it just to avoid some memory overhead? Or is there some other Windows-specific thing that I'm missing here?
> Yeah, I know about that argument but it just doesn't make sense to me. Removing the GIL means that 1) you make your language runtime more complex and 2) you make your app more complex.
#2 need not be true; e.g., the approach proposed here is transparent to most Python code and even minimizes impact on C extensions, still exposing the same GIL hook functions that C code would use in the same circumstances, though they have a slightly different effect.
Well, actually, on the type of CPU that OP refers to (128 threads, i.e. AMD Threadripper), the L3 cache is only shared within each pair of CCXs that forms a CCD. If you launch a program with 32 threads, they may have 1, 2, 3, or 4 distinct L3 caches to work with.
Moreover, unless thread pinning is enforced, a given thread will bounce around between different cores during execution, so the number of distinct L3 caches in action will not be constant.
Of course, you have the same story with memory: accessing another thread's memory is slower if that thread is on another CCD.
TL;DR: NUMA makes life hard if you want to get consistent performance from parallelism.
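If you do want pinning, here's a minimal sketch using `os.sched_setaffinity`, which is Linux-only (hence the guard); the CPU set is illustrative. Pinning keeps the scheduler from migrating a thread between cores, so the set of L3 caches in play stays constant.

```python
import os

def pin_to_cpus(cpus):
    # Pin the calling process (pid 0 = self) to a fixed set of CPUs so the
    # scheduler stops bouncing its threads between cores. Linux-only API,
    # hence the hasattr guard; returns the resulting affinity, or None if
    # pinning isn't supported on this platform.
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, cpus)
        return sorted(os.sched_getaffinity(0))
    return None

print(pin_to_cpus({0}))  # e.g. [0] on Linux, None elsewhere
```

Real deployments usually reach for `taskset`/`numactl` or a thread-pool library's pinning support rather than hand-rolling this, but the mechanism is the same.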
I mean, is there anything here preventing one from writing their code single-threaded, though? This is an addition to the capability, not a detraction.
Say your webapp talks to a database or a cache. It'd be really nice if you could use a single connection to that database instead of 64 connections. Or, if you wanted to cache some things on the web server, it would be nice to have one easily accessible copy instead of 64 copies that you have to fill 64 times over.
Unfortunately using a single db/RPC connection for many active threads is not done in any multithreaded system I’m aware of for good reasons. Sharing this type of resource across threads is not safe without expensive and performance-destroying mutexes. In practice each thread needs exclusive access to its own database connection while it is active. This is normally achieved using connection pooling which can save a few connections when some threads are idle, but 1 connection for 64 active web worker threads is not a recipe for a performant web app. If you can point to a multithreaded web app server that works this way I’d be very interested to hear about it.
The idea of a process-local cache (or other data) shared among all worker threads is a different story. Along with reduced memory consumption, I see this as one of the bigger advantages of threaded app servers. However, preforking multiprocess servers can always use shmget(2) to share memory directly with a bit more work.
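For illustration, the Python-level analogue of shmget(2) is `multiprocessing.shared_memory`. A minimal sketch, where a second attach by name stands in for a sibling worker process mapping the same segment:

```python
from multiprocessing import shared_memory

def roundtrip(payload: bytes) -> bytes:
    # One process creates the segment and writes into it; a second process
    # (simulated here by a second attach in the same process) maps the same
    # name and reads the bytes back with no copying or serialization.
    owner = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        owner.buf[:len(payload)] = payload
        peer = shared_memory.SharedMemory(name=owner.name)  # attach by name
        seen = bytes(peer.buf[:len(payload)])
        peer.close()
        return seen
    finally:
        owner.close()
        owner.unlink()  # creator is responsible for freeing the segment

print(roundtrip(b"cached value"))  # b'cached value'
```

The "bit more work" is real, though: you get raw bytes, so anything structured needs its own serialization and locking on top.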
> Unfortunately using a single db/RPC connection for many active threads is not done in any multithreaded system I’m aware of for good reasons. Sharing this type of resource across threads is not safe without expensive and performance-destroying mutexes
lol, you're so deep into Python Stockholm syndrome ("don't share anything between threads because we don't support that at all, even a little bit") that you don't even realize that connection pools exist. Instead of holding a connection open per process, you can have one connection pool with 30 connections that services 200 threads (the exact ratio depends on how many are actually using connections, of course). literally everybody "shares a single DB/RPC connection across multiple threads" (or at least shares a number of connections across a number of threads), except python.
And yeah, you can turn that into yet another standalone service that you've got to deliver in your docker-compose setup, but everybody else just builds it into the application itself.
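For illustration, a minimal in-process pool sketch built on a thread-safe queue; the `object()` instances stand in for real DB handles, and the class name is made up:

```python
import queue
import threading

class ConnectionPool:
    # Sketch of an in-process pool: N connections serve many more threads.
    # A thread blocks only while it holds a checked-out connection, not for
    # the life of the request, so 4 connections can serve 32 workers.
    def __init__(self, make_conn, size):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(make_conn())

    def acquire(self):
        return self._pool.get()   # blocks until a connection is free

    def release(self, conn):
        self._pool.put(conn)

pool = ConnectionPool(object, size=4)  # object() stands in for a DB handle

results = []
def handle_request(i):
    conn = pool.acquire()
    try:
        results.append(i)  # pretend to run a query on conn
    finally:
        pool.release(conn)

threads = [threading.Thread(target=handle_request, args=(i,)) for i in range(32)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results))  # 32 requests served through 4 connections
```

Real pools (SQLAlchemy's, HikariCP, etc.) add health checks, timeouts, and recycling, but the check-out/check-in shape is the same.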
> that you don't even realize that connection pools exist
The GP mentions connection pooling literally three sentences later.
> literally everybody "shares a single DB/RPC connection across multiple threads" (or at least shares a number of connections across a number of threads), except python.
Right, but multiple ≠ many. You're discussing the former. GP is discussing the latter.
Depending on the structure, it can indeed be many: both in the case of protocols that support multiplexing of requests, and in situations where you have multiple databases (so a given thread might not need to be talking to a particular database all the time).
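A minimal sketch of the multiplexing idea, with a queue standing in for the wire and a fake echo server; all names are illustrative. Each request carries an id, so responses coming back over the one shared "connection" can be routed to whichever thread is waiting for them:

```python
import itertools
import queue
import threading

wire = queue.Queue()          # stands in for one shared connection
waiters = {}                  # request id -> slot awaiting the response
waiters_lock = threading.Lock()
ids = itertools.count()

def server():
    # Fake server: answers each tagged request (here, by upper-casing it).
    while True:
        req_id, payload = wire.get()
        if payload is None:   # shutdown sentinel
            break
        with waiters_lock:
            slot = waiters.pop(req_id)
        slot["response"] = payload.upper()
        slot["done"].set()

def call(payload):
    # Any number of threads can call this concurrently; they all share
    # the same wire, distinguished only by their request ids.
    req_id = next(ids)
    slot = {"done": threading.Event()}
    with waiters_lock:
        waiters[req_id] = slot
    wire.put((req_id, payload))
    slot["done"].wait()
    return slot["response"]

srv = threading.Thread(target=server)
srv.start()

results = []
callers = [threading.Thread(target=lambda s=s: results.append(call(s)))
           for s in ("a", "b", "c")]
for t in callers:
    t.start()
for t in callers:
    t.join()
wire.put((None, None))  # stop the fake server
srv.join()
print(sorted(results))  # ['A', 'B', 'C']
```

This is roughly what HTTP/2, gRPC, and PostgreSQL pipelining do at the protocol level: in-flight requests from many callers interleave on one connection.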