
As an outsider to Python, I never got how a language that got popular for being simple, elegant and readable could end up with perhaps the most complex tooling situation (dependencies, envs, etc.). Any time I glance at the community there seems to be a new way of doing things.

What caused Python to go through these issues? Is there any fundamental design flaw?



It's mostly about age. Python has been around for 35 years now. The first version of a Python package directory was the cheeseshop (Monty Python reference) in 2003. The earliest version of a pip-like tool was "easy_install" which - I kid you not - worked by scraping the HTML listing page of the cheeseshop and downloading zip files linked from that!

More recent languages like Node.js and Rust and Go all got to create their packaging ecosystems learning from the experiences of Perl and Python before them.

There is one part of Python that I consider a design flaw when it comes to packaging: the sys.modules global dictionary means it's not at all easy in Python to install two versions of the same package at the same time. This makes it really tricky if you have dependency A and dependency B both of which themselves require different versions of dependency C.
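A minimal illustration of that cache (using the stdlib json module as a stand-in for any dependency):

    import sys

    import json             # first import executes the module and caches it
    assert "json" in sys.modules

    import json as json2    # every later import just returns the cached object
    assert json2 is sys.modules["json"]

The cache is keyed only by the module name, with no notion of version, so A and B necessarily share whichever single copy of C happens to be installed.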


I think it's also from trying to stick with the old paradigm of "libraries are installed and managed globally, potentially as linkable object files."

All the languages of today gain all their improvements from:

1. Nothing should be global, but if it is it's only a cache (and caches are safe to delete since they're only used as a performance optimization)

2. You have to have extremely explicit artifact versioning, which means everything needs checksums, which means mostly reproducible builds

3. The "blessed way" is to distribute the source (or a mostly-source dist) and compile things in; the happy path is not distributing pre-computed binaries

Now, everything I just said above is also wrong in many aspects or there's support for breaking any and all of the rules I just outlined, but in general, everything's built to adhere to those 3 rules nowadays. And what's crazy is that for many decades, those three rules above were considered absolutely impossible, or anti-patterns, or annoying, or a waste, etc (not without reason, but still we couldn't do it). That's what made package managers and package management so awful. That's why it was even possible to break things with `sudo pip install` vs `apt install`.

Now that we've abandoned the old ways in e.g. JS/Rust/Go and adopted the three rules, all kinds of delightful side effects fall out. Tools now which re-build a full dependency tree on-disk in the project directory are the norm (it's done automatically! No annoying bits! No special flags! No manual venv!). Getting serious about checksums for artifacts means we can do proper versioning, which means we can do aggressive caching of dependencies across different projects safely, which means we don't have to _actually_ have 20 copies of every dependency, one for each repo. It all comes from the slow distributed Gentoo/FreeBSD-ification of everything and it's great!
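To make the checksum point concrete, here's a toy sketch of a content-addressed cache (not how any particular tool lays things out; the paths and names are invented):

    import hashlib
    from pathlib import Path

    CACHE = Path.home() / ".cache" / "toy-artifacts"   # invented location

    def cache_put(data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        CACHE.mkdir(parents=True, exist_ok=True)
        (CACHE / digest).write_bytes(data)              # stored under its own hash
        return digest

    def cache_get(expected: str) -> bytes:
        data = (CACHE / expected).read_bytes()
        if hashlib.sha256(data).hexdigest() != expected:
            raise ValueError("corrupted cache entry; safe to delete and re-fetch")
        return data

Because a project asks for "the artifact with hash X" rather than "whatever is currently called foo-1.2", the same cache entry can be shared by every project on the machine, and blowing the cache away is always safe.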


If, and only if, you have actual reproducible builds, you can distribute pre-compiled binaries as a cache optimization. That can allow for speedups without necessarily compromising security. It's also a prerequisite for a lot of "supply chain" security processes which are becoming increasingly desirable.


On a tangent, the somewhat related issue of Python 3 not being able to import Python 2 packages famously led Zed Shaw of "Learn Python the Hard Way" to write a rant about how Python is not Turing Complete. I checked again and apparently he removed that rant and only has a disclaimer in its place mentioning that he was obviously being hyperbolic [0].

[0] https://learnpythonthehardway.org/book/nopython3.html#the-py...


I don’t think anyone takes Zed Shaw seriously.

He’s like that uncle you see at family gatherings whom you nod along politely to.


Zed Shaw seems to have some very interesting beliefs about the 2->3 migration in general. I think it's fair to call some of it conspiratorial.


Indeed that was a weird time, but he did eventually relent and release a version for Python 3 - https://learncodethehardway.com/client/#/product/learn-pytho...


> There is one part of Python that I consider a design flaw when it comes to packaging: the sys.modules global dictionary means it's not at all easy in Python to install two versions of the same package at the same time. This makes it really tricky if you have dependency A and dependency B both of which themselves require different versions of dependency C.

But it solves the problem that if A and B both depend on C the user can pass an object from A to B that was created by C without worrying about it breaking.

In less abstract terms, let's say numpy one day changed its internal representation of an array, so if one version of numpy read an array from a different version of numpy it would crash or, worse, read it but misinterpret it. Now if I have one data science library that produces numpy arrays and another visualization library that takes numpy arrays, I can be confident that only one version of numpy is installed and the visualization library isn't going to misinterpret the data from the data science library because it is using a different version of numpy.

This stability of installed versions has allowed entire ecosystems to build around core dependencies in a way that would be tricky without it. I would therefore not consider it a design flaw.


I wouldn't mind a codebase where numpy objects created by dependency B can't be shared directly with dependency A without me first running some kind of conversion function on them - I'd take that over "sorry you want to use dependency A and dependency B in this project, you're just out of luck".


> I wouldn't mind a codebase where numpy objects created by dependency B can't be shared directly with dependency A without me first running some kind of conversion function on them

Given there's no compiler to enforce this check, and Python is a dynamic language, I don't see how you implement that without some complicated object provenance feature, making every single object larger and making every use of that object (calling with it, calling it, assigning it to an attribute, assigning an attribute to it) impose an expensive runtime check.
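Roughly, the kind of thing I mean would have to look like this purely hypothetical sketch (nothing like it exists in CPython, and the names are made up), and something would have to run it at every boundary crossing:

    def assert_provenance(obj, expected_top_level):
        # hypothetical: infer which vendored copy an object came from by its
        # class's module name, e.g. "numpy" vs "vendored_numpy_a"
        actual = type(obj).__module__.split(".")[0]
        if actual != expected_top_level:
            raise TypeError(
                f"object from {actual!r} passed to code expecting {expected_top_level!r}"
            )

    # every call across a library boundary would need something like:
    # assert_provenance(arr, "numpy")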

But maybe I'm missing something obvious.


You let people make the mistake and have the library throw an exception if they do that, not through type checking but just through something eventually calling a method that doesn't exist.


> You let people make the mistake and have the library throw an exception if they do that, not through type checking but just through something eventually calling a method that doesn't exist.

Exceptions or crashes would be annoying but, yes, manageable. Although try telling a new user of the language that their code doesn't work because they didn't understand the transitive dependency tree of their install, that it automatically vendored different versions of a library for different dependencies, and that they were supposed to work all that out from some random exception occurring in a dependency.

But as I explain in my example, the real problem is that one version of the library reads the data in a different layout from the other, so instead you end up with subtle data errors. Now your code is working but you're getting the wrong output. Good luck debugging that.


Man, there was a window there where I still fell back to easy_install on Windows because it would handle C based stuff more reliably until wheels got invented. It’s been a journey.


35 years is misleading. Python existed, yes, but was very different. e.g. Pandas was released in 2008. Most use packages much more recent than that. 35 years ago Perl was faster than Python and had deep adoption (through 2007? or so)


How is it misleading?

The question is why Python packaging has such a complicated history. The age of the language is entirely relevant to that - the reason Go and Rust have it so good here is that they are much younger, coming out after many of the initial packaging lessons had been learned elsewhere.


it is misleading because I was around 35 years ago, and very few people were using Python. Python did not become very popular until web frameworks and pandas became a thing in python.


if you doubt this, ask any llm "what was the first year where Python users surpassed the number of Perl users?"


I still don't understand why that makes what I wrote "misleading". I never said Python was popular 35 years ago, I just said that the age of the language was relevant to understanding why the packaging history is complex.


Python 2.0 was released in October 2000. The Python ecosystem has witnessed several significant shifts in expectations about how software is built and delivered, from Slackware-style source builds to vendor packages to containers to uv just downloading a standalone binary archive. And the deadsnakes ppa and venvs, plus the ongoing awkwardness about whether pip should be writing stuff into /usr/local or ~/.local or something else.

All of this alongside the rise of GitHub and free CI builders, it being trivial to depend on lots of other packages of unknown provenance, stdlib packages being completely sidelined by stuff like requests.

It’s really only in the last ten years or so that there’s been the clarity of what is a build backend vs frontend, what a lock file is and how workspace management fits into the whole picture. Distutils and setuptools are in there too.

Basically, Python’s packaging has been a mess for a long time, but uv getting almost everything right all of a sudden isn’t an accident; it’s an abrupt gelling of ideas that have been in progress for two decades.


> the deadsnakes ppa

Please don't use this. You need to be careful about how you place any secondary installation of Python on Ubuntu. Meanwhile, it's easy to build from source on Ubuntu and you can easily control its destination this way (by setting a prefix when you ./configure, and using make altinstall) and keep it out of Apt's way.

> and venvs, plus the ongoing awkwardness about whether pip should be writing stuff into usr/local or ~/.local or something else.

There is not really anything like this. You just use venvs now, which should have already been the rule since 3.3. If you need to put the package in the system environment, use an Apt package for that. If there isn't an Apt package for what you want, it shouldn't live in the system environment and also shouldn't live in your "user" site-packages — because that can still cause problems for system tools written in Python, including Apt.

You only need to think about venvs as the destination, and venvs are easy to understand (and are also fundamental to how uv works). Start with https://chriswarrick.com/blog/2018/09/04/python-virtual-envi... .

> It’s really only in the last ten years or so that there’s been the clarity of what is a build backend vs frontend

Well no; it's in that time that the idea of separating a backend and frontend emerged. Before that, it was assumed that Setuptools could just do everything. But it really couldn't, and it also led to people distributing source packages for pure-Python projects, resulting in installation doing a ton of ultimately useless work. And now that Setuptools is supposed to be focused on providing a build backend, it's mostly dead code in that workflow, but they still can't get rid of it for backwards compatibility reasons.

(Incidentally, uv's provided backend only supports pure Python — they're currently recommending heavyweight tools like maturin and scikit-build-core if you need to compile something. Although in principle you can use Setuptools if you want.)


> Meanwhile, it's easy to build from source on Ubuntu and you can easily control its destination this way

word of warning: I spent a lot of years working off of "built from source" Python on Ubuntu, and every once in a while I'd have really awkward issues downstream of not realizing I was missing some lib when I built Python, so some random standard library module would just be missing for me.

I think it's all generally good, but real easy to miss optional package stuff.


> You just … now

Yes, the point of my post wasn’t to give current best practice counsel but rather to illustrate how much that counsel has changed over the years as the needs and desires of the maintainers, distro people, developers, and broader community have evolved.


If you read the initial Usenet post by Guido introducing Python, he describes it mostly as an alternative to bash: basically a really nice scripting language with a decent standard library. I don't think it was designed from the start to end up where it has. He created a genius syntax that people love.


1. Age; there are absurd amounts of legacy cruft. Every time you have a better idea about how to do things, you have to agonize over whether you'll be allowed to remove the old way. And then using the old ways ends up indirectly causing problems for people using the new ways.

2. There is tons of code in the Python ecosystem not written in Python. One of the most popular packages, NumPy, depends on dozens of megabytes of statically compiled C and Fortran code.

3. Age again; things were designed in an era before the modern conception of a "software ecosystem", so there was nobody imagining that one day you'd be automatically fetching all the transitive dependencies and trying to build them locally, perhaps using build systems that you'd also fetch automatically.

4. GvR didn't seem to appreciate the problem fully in the early 2010s, which is where Conda came from.

5. Age again. Old designs overlooked some security issues and bootstrapping issues (this ties into all the previous points); in particular, it was (and still is) accepted that because packages can include code in any language and use all sorts of weird build processes, the "build the package locally" machinery needs to run arbitrary code. But that same system was then considered acceptable for pure-Python packages for many years, and the arbitrary code was even used to define metadata. And in that code, you were expected to be able to use some functionality provided by a build system written in Python, e.g. in order to locate and operate a compiler. Which then caused bootstrapping problems, because you couldn't assume that your users had a compatible version of the main build system (Setuptools) installed, and it had to be installed in the same environment as the target for package installation. So you also didn't get build isolation, etc. It was a giant mess.

5a. So they invented a system (using pyproject.toml) that would address all those problems, and also allow for competition from other build back-ends. But the other build back-end authors mostly wanted to make all-in-one tools (like Poetry, and now, er, uv); and meanwhile it was important to keep compatibility, so a bunch of defaults were chosen that enabled legacy behaviour — and ended up giving old packages little to no reason to fix anything. Oh, and also they released the specification for the "choose the build back-end system" and "here's how installers and build back-ends communicate" years before the specification for "human-friendly input for the package metadata system".
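For the curious, the back-end side of that installer/back-end contract (PEP 517) is tiny: a back-end is just an importable module exposing a couple of hooks that the front-end (pip, build, uv, ...) calls in an isolated environment. A skeletal sketch, with the actual packing logic omitted and "mypkg" as a made-up example name:

    # pyproject.toml points at this module via its [build-system] build-backend key.
    # These are the two mandatory PEP 517 hooks; a real back-end fills in the bodies.

    def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
        # write e.g. mypkg-1.0-py3-none-any.whl into wheel_directory, return its filename
        raise NotImplementedError

    def build_sdist(sdist_directory, config_settings=None):
        # write e.g. mypkg-1.0.tar.gz into sdist_directory, return its filename
        raise NotImplementedError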


Dependency management has always felt complicated. However, environment management I think is actually way simpler than people realize. Python basically just walks up directories trying to find its packages dir. A python "env" is just a copy of the python binary in its own directory. That's pretty much it. Basically all difficulties I've ever had with Python environments have been straightened out by going back to that basic understanding. I feel like the narrative about virtualenvs has always seemed scary but the reality really isn't.
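You can see the whole mechanism from inside the interpreter; a quick sketch (run it once outside a venv and once inside to compare):

    import sys, sysconfig

    print(sys.executable)                  # the binary you actually ran, e.g. .venv/bin/python
    print(sys.prefix)                      # the environment root resolved from that location
    print(sys.base_prefix)                 # the interpreter the venv was created from
    print(sysconfig.get_path("purelib"))   # where imports are looked up: <prefix>/.../site-packages
    print(sys.prefix != sys.base_prefix)   # True only when running inside a venv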


It was an intentional design decision to separate package installation and management. I think that created the mess we have now.

Funny thing is that decision was for modularity, but uv didn't even reuse pip.


> Funny thing is that decision was for modularity, but uv didn't even reuse pip.

To be fair, that's justified by pip's overall lack of good design. Which in turn is justified by its long, organic development (I'm not trying to slight the maintainers here).

But I'm making modular pieces that I hope will showcase the original idea properly. Starting with an installer, PAPER, and build backend, bbbb. These work together with `build` and `twine` (already provided by PyPA) to do the important core tasks of packaging and distribution. I'm not trying to make a "project manager", but I do plan to support PEP 751 lockfiles.


Python is not simple, it's a very complex language hiding behind friendly syntax.

Given that, plus the breadth and complexity of its ecosystem, it makes sense that its tooling is also complex.


BDFL left a long time ago. It’s not opinionated anymore. The language went from being small enough to fit in that guy’s head to a language controlled by committee that’s trying to please everyone.


poor answer. guido had very little impact on the packaging mess.


Many would say that's the problem; i.e. that he should have had more impact. Check out the history of Conda.


right but "BDFL left" is clearly the wrong thing to blame when "BDFL never cared enough" so it doesnt matter if he left


That’s right. And we switched from eggs to wheels on Guido’s watch, but that was from him being a good leader and letting other smart people do clever things on their own.


Seems like the flaw is that it was never a first class citizen of the language.

easy_install never even made it to 1.0

Still, not bad for a bunch of mostly unpaid volunteers.


I call it the JS-syndrome.


As many flaws as the npm/yarn/pnpm ecosystem has, its interoperability is waaaay better than the whole juggling act between pip, venv, Poetry, Anaconda, Miniforge, and uv across projects.

uv is a step in the right direction, but legacy projects without a Dockerfile can be tricky to start.


JS did this right, in fact uv is kinda replicating the npm way. And there are other JS things I'd like Py to follow suit on.


People. People happened. Ideologies and strong opinions.



