Linear Execution, Multiprocessing, and Multithreading IO-Bound Tasks in Python

ebg13 · on Jan 25, 2020

Whenever I see a blog post on Python concurrency that recommends something other than concurrent.futures' ThreadPoolExecutor or ProcessPoolExecutor, I shake my head at how far we've strayed from "one obviously right way". Why in 2020 am I seeing someone manually close and join a worker pool (blocking!) to return all the results at once, because they rewrote executor.map using multiprocessing? Why do they del pool right before returning instead of just letting it go out of scope? Why are they using map, which returns results in submission order, if all they want is to run every task? If performance matters, you want to read results in the order they complete, not the order they were submitted, and you don't want to wait for everything to finish if any task fails. Where did Python's "right way" go sideways?
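A minimal sketch of the pattern described above, assuming a hypothetical fetch function that uses time.sleep as a stand-in for network IO: results are consumed with as_completed in completion order, and the first exception cancels any tasks that have not started yet. Note that Future.cancel() cannot stop a task that is already running, so leaving the with block still waits for those.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch(n):
    # Hypothetical IO-bound task; sleep stands in for a network call.
    if n < 0:
        raise ValueError(f"bad input: {n}")
    time.sleep(0.05 * n)
    return n * n

def run_all(inputs, max_workers=4):
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the input that produced it.
        futures = {pool.submit(fetch, n): n for n in inputs}
        try:
            # as_completed yields futures as they finish, not in submission order.
            for fut in as_completed(futures):
                # result() re-raises any exception the task raised.
                results.append((futures[fut], fut.result()))
        except Exception:
            # Fail fast: cancel tasks that haven't started, then propagate.
            for fut in futures:
                fut.cancel()
            raise
    return results

print(run_all([3, 1, 2]))
```

With all three tasks running at once, the shortest sleep finishes first, so results typically arrive as 1, 2, 3 regardless of the order they were submitted.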

chooseaname · on Jan 25, 2020

If I google for python concurrency the second link is to the docs: https://docs.python.org/3/library/concurrency.html

I don’t see where in the docs it says, it’s 2020 so do this from now on. Maybe that’s the problem?

ebg13 · on Jan 25, 2020

> I don’t see where in the docs it says, it’s 2020 so do this from now on. Maybe that’s the problem?

I guess lack of clear direction is exactly the problem. For all of PEP 20's supposed importance, it's notoriously difficult to discover which parts of the standard library to use and which to ignore, unless you like reinventing wheels.

Brotkrumen · on Jan 25, 2020

>First, terms. Most programs work from top to bottom. The next line runs after the last one finishes. We call these linear

Sequential is the term you're looking for.
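A minimal sketch of what sequential execution means for IO-bound work, assuming time.sleep as a stand-in for a blocking network or disk call (io_task and run_sequential are hypothetical names): each call starts only after the previous one returns, so total wall time is roughly the sum of the individual waits.

```python
import time

def io_task(seconds):
    # Hypothetical IO-bound task; sleeping stands in for waiting on a socket or disk.
    time.sleep(seconds)
    return seconds

def run_sequential(delays):
    start = time.perf_counter()
    # Each call blocks until it finishes before the next one begins.
    results = [io_task(d) for d in delays]
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = run_sequential([0.1, 0.1, 0.1])
print(results, f"{elapsed:.2f}s")  # elapsed is roughly 0.3s or slightly more
```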

zacjszewczyk · on Jan 25, 2020

Linear felt right when I was writing it, for some reason, but you're right. Fixed.

jwandborg · on Jan 25, 2020

I'm 99% sure threads will run on all cores. I think the author has confused multithreading with hyperthreading.

jdnier · on Jan 25, 2020

I was never sure about that, thinking that a single interpreter process runs on one core and the threads must share that process space. But you're right, and I found good confirmation in this visualization (see the "code" link also): http://www.dabeaz.com/GIL/gilvis/fourthread.html

See also http://www.dabeaz.com/python/UnderstandingGIL.pdf, which explains that

    • Python threads are real system threads
        • POSIX threads (pthreads)
        • Windows threads
    • Fully managed by the host operating system
    • Represent threaded execution of the Python interpreter process (written in C)
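To see the points in that list in practice, here is a minimal sketch (io_task is a hypothetical function, and time.sleep stands in for blocking IO): each worker reports the native thread ID the operating system assigned it via threading.get_native_id() (Python 3.8+), and because a thread blocked in sleep releases the GIL, four 0.2-second waits overlap instead of adding up. This does not carry over to CPU-bound Python code, where the GIL still lets only one thread execute bytecode at a time.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def io_task(seconds):
    # Hypothetical IO-bound task; time.sleep releases the GIL while it waits.
    time.sleep(seconds)
    # Kernel-assigned thread ID (Python 3.8+).
    return threading.get_native_id()

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    native_ids = list(pool.map(io_task, [0.2] * 4))
elapsed = time.perf_counter() - start

print(len(set(native_ids)), "distinct OS threads")
print(f"{elapsed:.2f}s")  # roughly 0.2s, not 0.8s, because the waits overlap
```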

_joe · on Jan 25, 2020

The article is full of inaccuracies and errors. Confusing threads with event loops is just one of them.

I can't imagine why it's being upvoted.
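For readers unsure of the distinction drawn above, a minimal sketch (the io_task coroutine is hypothetical; asyncio.sleep stands in for non-blocking IO): an event loop interleaves many waiting tasks on a single thread by switching at each await, whereas a thread pool gives each blocking call its own OS thread.

```python
import asyncio
import threading

async def io_task(name, seconds):
    # Hypothetical IO-bound coroutine; awaiting asyncio.sleep yields control to the loop.
    await asyncio.sleep(seconds)
    # Report which thread ran this coroutine.
    return name, threading.get_ident()

async def main():
    # gather() runs all three coroutines concurrently on the same event loop.
    return await asyncio.gather(
        io_task("a", 0.1), io_task("b", 0.1), io_task("c", 0.1)
    )

results = asyncio.run(main())
thread_ids = {tid for _, tid in results}
print(results)
print(len(thread_ids), "thread(s) used")
```

All three coroutines report the same thread ID, and gather() returns results in the order the coroutines were passed in.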