
multiprocessing works fine only for problems that don't need 10+ GB of memory per process. Once you have significant memory usage, you really need to find a way to share that memory across multiple CPU cores. For non-trivial data structures partly implemented in C++ (as an optimization, because pure Python would be too slow), that means messing with allocators and shared memory. Such GIL workarounds have easily cost our company several engineer-years, and we still have a bunch of embarrassingly parallel work that we cannot parallelize, because those data structures don't yet support shared-memory allocation.
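
For flat arrays the stdlib can at least do this without copies; a minimal sketch (the hard part for us is the non-trivial C++ structures, where nothing like this exists off the shelf):

    from multiprocessing import Process, shared_memory
    import numpy as np

    def worker(name, shape, dtype):
        # Attach to the existing block; no copy is made.
        shm = shared_memory.SharedMemory(name=name)
        arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
        arr *= 2  # mutate in place, visible to the parent
        del arr   # release the numpy view before closing the mapping
        shm.close()

    if __name__ == "__main__":
        shm = shared_memory.SharedMemory(create=True, size=8 * 1_000_000)
        arr = np.ndarray((1_000_000,), dtype=np.float64, buffer=shm.buf)
        arr[:] = 1.0
        p = Process(target=worker, args=(shm.name, arr.shape, arr.dtype))
        p.start()
        p.join()
        print(arr[:5])  # [2. 2. 2. 2. 2.]
        del arr
        shm.close()
        shm.unlink()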

Once the Python ecosystem supports either subinterpreters or nogil, we'll happily migrate to those and get rid of our hacky interprocess code.

Subinterpreters with independent GILs, released with 3.12, theoretically solve our problems but practically are not yet usable, as none of Cython/pybind11/nanobind support them yet. In comparison, nogil feels like it'll be easier to support.



And I guess what I don't understand is why people choose Python for these use cases. I am not in the "Rustify" everything camp, but Go + C, Java + JNI, Rust, and C++ all seem like more suitable solutions.


> but Go + C, Java + JNI, Rust, and C++ all seem like more suitable solutions.

Apart from Go (and maybe Java), those are all "scary" languages that require a bunch of engineering before you can even start prototyping.

Even then, you can normally pybind the bits that are compute-bound.

If Microsoft had played it better back in the day, C# would have been the go-to language of choice. It has the best tradeoff of speed/handholding/rapid prototyping. It's also statically typed, unless you tell it not to be.


    #pragma omp parallel for

gets you 90% of the potential performance of a full multithreaded producer/consumer setup in C++. C++ isn't as scary as it used to be.


Notably, all of those are statically typed languages, and none of them has array types as nice as those of PyTorch or NumPy, among many other packages in the Python ecosystem. Those two facts are likely closely related.


Python is just the more popular language. Julia's array manipulation is mostly better than Python's (better syntax, better integration, larger standard library) or at least as good. Julia is also dynamically typed, and it is faster than Python, except for the JIT issues.


> It is also faster than Python, except for the JIT issues.

I was intrigued by Julia a while ago, but didn't have time to properly learn it.

So just out of curiosity: what are the issues with the JIT in Julia?


The "issue" is Julia is not Just-in-Time, but a "Just-Ahead-of-Time" language. This means code is compiled before getting executed, and this can get expensive for interactive use.

The famous "Time To First Plot" problem was about taking several minutes to do something like `using Plots; Plots.plot(sin)`.

But to be fair, recent Julia releases have improved this a lot; the code above takes 1.5s in Julia 1.10 on my 3-year-old laptop.


Julia's JIT compiles code when it's first executed, so there is a noticeable delay from when you start the program until it actually begins running. This is anywhere from a few hundred milliseconds for small scripts to tens of seconds or even minutes for large packages.


I wonder why they don't just offer optional pre-compilation, so that once you have a version you're happy with and want to run in production, you can simply run a fully compiled version of the code.


Effectively, it does: one of the things recent Julia releases have done is add more precompilation caching on package install. Julia 1.10 feels considerably snappier than 1.0 as a result; the "time to first plot" is now only a couple of seconds thanks to this (and subsequent plots are, of course, much faster than that).


Preaching to the choir here.

Julia’s threading API is really nice. One deficiency is that it can be tricky to maintain type stability across tasks / fetches.


If only there were a dynamic language which performs comparably to C and Fortran, and was specifically designed to have excellent array processing facilities.

Unfortunately, the closest thing we have to that is Julia, which fails to meet none of the requirements. Alas.


If only there was a car that could fly, but was still as easy and cheap to buy and maintain :D


Why do people use Python for anything beyond glue code? Because it took off, and machine learning and data science now rely on it.

I think Python is a terrible language that exemplifies the maxim "worse is better".

https://en.wikipedia.org/wiki/Worse_is_better


To quote from Eric Raymond's article about Python, ages ago:

"My second [surprise] came a couple of hours into the project, when I noticed (allowing for pauses needed to look up new features in Programming Python) I was generating working code nearly as fast as I could type.

When you're writing working code nearly as fast as you can type and your misstep rate is near zero, it generally means you've achieved mastery of the language. But that didn't make sense, because it was still day one and I was regularly pausing to look up new language and library features!"

Source: https://www.linuxjournal.com/article/3882

That doesn't hold for large code bases, but if you need quick results using existing, well-tested libraries, as in machine learning and data science, I think those statements are still valid.

Obviously not when you're multiprocessing; that is going to bite you in any language.


Some speculate that universities adopted it as an introductory language for its expressiveness and gentle learning curve. Scientific and research projects at those universities started picking Python, since all students already knew it. And now we're here.


I have no idea if this is verifiably true in a broad sense, but I work at a university and this is definitely the case. PhD students predominantly use Python to develop models across domains: transportation, finance, social sciences, etc. They then transition to industry, continuing to use Python for prototyping.


People choose Python for the use case, regardless of what that is, because it's quick and easy to work with. When Python can't realistically be extended to a use case, that is lamented; when it can, it's celebrated. Even Go, while probably the friendliest of that bunch when it comes to parallel work, is on a different level.


"Ray" can share python objects memory between processes. It's also much easier to use than multi processing.


How does that work? I'm not familiar with Ray, but I'm assuming you might be referring to actors [1]? Isn't that basically the same idea as multiprocessing's Managers [2], which also allow client processes to manipulate a remote object through message-passing? (See also DCOM.)

[1] https://docs.ray.io/en/latest/ray-core/walkthrough.html#call...

[2] https://docs.python.org/3/library/multiprocessing.html#manag...



According to the docs, those shared objects have significant limitations: they are immutable, and only numpy arrays are read zero-copy (everything else must be deserialized).

Sharing arrays of numbers is supported in multiprocessing as well: https://docs.python.org/3/library/multiprocessing.html#shari...
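
e.g. the Value/Array API from those docs; a minimal sketch:

    import multiprocessing as mp

    def square(shared, i):
        # The ctypes array is backed by shared memory; writes are
        # visible to the parent without any pickling.
        shared[i] = shared[i] ** 2

    if __name__ == "__main__":
        arr = mp.Array("d", [1.0, 2.0, 3.0, 4.0])
        procs = [mp.Process(target=square, args=(arr, i)) for i in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(arr[:])  # [1.0, 4.0, 9.0, 16.0]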


I think that 90 or maybe even 99% of cases use under 1 GB of memory per process? At least that has been the case for me for the last 15 years.

Of course, making threads actually useful for concurrency (with the GIL removed) adds another very useful tool to the performance toolkit, so that is great.
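
The canonical case this unlocks is pure-Python CPU-bound work across threads; a minimal sketch (gains nothing today, but should scale across cores on a free-threaded build):

    import threading

    def count(n):
        # Pure-Python busy loop: serialized by the GIL today,
        # parallel on a nogil/free-threaded interpreter.
        while n:
            n -= 1

    threads = [threading.Thread(target=count, args=(10_000_000,))
               for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()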



