Python packaging is good now (twistedmatrix.com)
160 points by r0muald on Aug 14, 2016 | 142 comments


While this is all nice, I have yet to find a package system that is as straightforward to use as NPM.

Don't get me wrong, I love Python and I'm not so much a fan of NodeJS, as Python is so much better designed than JavaScript, even in the age of ES6. But their NPM is really well designed. NPM alone gives NodeJS a competitive advantage over many other popular languages, including Ruby (Gem) and Python (PIP).

For example, why does Python packaging need such a complicated bootstrapping process? Install ensurepip, then the real pip, and on top of that virtualenv. NodeJS just ships with NPM. Done.

And why do we need to create and manage a "virtualenv"? Why do you have to go through extra hassle just to avoid global installs? NPM just creates a local subdirectory, node_modules, and automatically installs to and removes from it. Global installs happen only if you really want them (npm -g). But you usually don't.

The only central place with NPM is the package cache, so it doesn't download things twice if you have two separate projects with similar dependencies. In the Python/PIP world, everything is central, and even today you still have to accept extra hassle to get a sane, local setup.
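For concreteness, this is roughly what the two workflows looked like at the time (a sketch; the package names are just examples):

    # Node: dependencies land in ./node_modules automatically
    npm install lodash

    # Python: a local setup needs an explicit virtualenv first
    virtualenv venv
    . venv/bin/activate
    pip install requests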


I have yet to come across a scenario where installing a Python library globally ended up being an issue in the Python ecosystem. You either need it or you don't, short of some contrived scenario where a piece of code is not able to use a certain package's latest code and you have to juggle two different versions of that package. If I had something like that, I'd fix the code, not juggle old package versions or install all dependencies in my project's sub-folder.

> "For example, why does Python packaging need such a complicated bootstrapping process? Install ensure_pip, then the real pip, and on top of that virtualenv."

It's not that complicated:

    sudo apt-get install python-pip python-dev build-essential

    sudo pip install --upgrade pip
Then you can install any python package to your heart's content with:

    pip install PACKAGE
You don't need virtualenv, nor do you need ensurepip.

>"In the Python/PIP world, everything is central and even today you still have to accept extra hassle to get a sane, local setup."

I think you may be coming at it from a javascript perspective where you expect it to work according to the way you're used to it working with Node/NPM. I have absolutely no hassle installing anything in the python/pip ecosystem. Except maybe MySQL and OpenCV, I'll grant you that. But those are separate discussions as they require external dependencies outside of python's packaging.


Here's a common scenario: you maintain more than one project, and you can't just simultaneously upgrade e.g. Django for all of them (or literally any library where an upgrade may be non-trivial).

Yes, eventually you'll (hopefully) upgrade all your projects to the newest and shiniest versions of all your dependencies, but if you need to maintain some semblance of stability, that's not always immediately possible/practical.
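A minimal sketch of the usual workaround, assuming two projects pinned to different Django versions (the version numbers are illustrative):

    # one virtualenv per project, each with its own Django
    virtualenv ~/envs/project-a
    ~/envs/project-a/bin/pip install "Django==1.8.14"

    virtualenv ~/envs/project-b
    ~/envs/project-b/bin/pip install "Django==1.10"

    # each project then runs against its own environment
    ~/envs/project-a/bin/python manage.py runserver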


I'd rather use Docker over venv in that case. My host Python is for horsing around; when something is to be deployed, I document and isolate the dependencies in a Dockerfile. venv would only solve half that problem, and only for Python, not including outside dependencies.


If the solution to packaging is "I'll use docker", i.e. I will essentially need an entire OS as a dependency of my application, this shows that Python packaging is broken. Saner packaging systems allow defining dependencies per project, and only install packages globally when required.


Docker doesn't work if you don't have root.


You need to have access to the docker socket, if you're using sockets to connect. That's not necessarily only root.



True. I also work in a HPC context where that's a problem. That's actually a place where I had to hack together something with venv in order to use an output analysis with a python that's compiled differently from the main Fortran executable. But that's quite an edge case.


This is actually wrong. You can be added to the `docker` group. On Docker for OS X you also don't need root.


In general if a tool requires root to run, of course a system administrator could make some changes to the system to allow a user to run it (adding the user to a privileged group, making the binary setuid, etc).

But that's not the point. The point is, if I'm on a system where I don't have the ability to make system-wide changes (run docker as root, add myself to a special group, etc), then I can't run docker on that system.


I've run into several issues with global installs because the Ubuntu maintainers have decided to make changes to libraries I want to use.


Another example happens when compiling scientific libraries that require C extensions. I've seen disagreements about which versions of libraries to link against. For example, I've run into issues when installing scikit-learn via pip because it linked against the Numpy version in /usr/lib rather than the correct Numpy in my virtualenv.
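In case it helps anyone hitting the same thing: a quick way to check which NumPy an environment is actually picking up, and to force a source rebuild inside it (a sketch; --no-binary needs a reasonably recent pip):

    # confirm which NumPy the virtualenv's interpreter sees
    venv/bin/python -c "import numpy; print(numpy.__file__); print(numpy.get_include())"

    # rebuild scikit-learn inside the virtualenv so it links against that NumPy
    venv/bin/pip install --no-cache-dir --no-binary scikit-learn scikit-learn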


Ubuntu maintainer here. Can you be more specific, please? I'd like to understand if there was a general reason or if we're doing something that we should not be doing.


Apologies for my crappy memory, but the only case I can remember the package for was when I ran into issues with the PIL library being OS-managed and having basic functionality not working, and the answer I found that worked was installing it in a virtualenv. I also went through some chat logs of me complaining and saw I had similar issues with urllib3.

The main issue is that they're modified at all, so they don't match up with the docs or what people say online. Until you realise that you're not running the normal code and need to run inside a virtualenv, it's very confusing.


The only changes made to PIL in Ubuntu in any current release (Precise, Trusty, Xenial) are security fixes: https://launchpad.net/ubuntu/+source/pillow.

> The main issue is that they're modified at all

Not modifying packages at all would also be a major issue. We'd end up shipping dysfunctional packages, or packages with major bugs, or not fixing bugs during the lifetime of a release.

Distributions do a ton of integration work that I think goes unnoticed by users. We routinely find issues and send them upstream - every development cycle. Users don't notice because they'll also find the bugfixes in upstream releases because we sent them there. But upstream releases don't all happen in lock-step, so we end up having to carry some patches to make everything work with everything else.

In the general case, we can't win. We'll always upset someone, which is why I asked for specifics. In general, we will:

(0) Avoid changing anything in a stable release, so users don't have things changed on them - except for (1).

(1) Fix bugs (including security bugs) as they are reported. Because of (0), this generally means that we cherry-pick fixes, which does cause some divergence from upstream. But if we don't do this, then other users complain either about the bugs or, if we pull everything into a package in an existing stable release, about things being changed under them.

(2) During development of a release we fix integration issues, since every package in the distribution is expected to work with every other package - and (though with some exceptions) we generally only ship one version of each thing in a release. This means that we sometimes need to patch things (we prefer cherry-picks from upstream, but sometimes we don't get a response from upstream in time for our release schedule) in order to make everything work.

But in the specific case of PIL, we appear to not have had to do any of these things, except for some security fixes.

For urllib3, it looks like we had to make some modifications to fix security issues with regards to TLS and certificate verification in Trusty, and one bug in Xenial related to broken IPv6 square bracket handling, and that's it.

I don't really see how we can do anything different here. Should we have ignored the security issues and shipped vulnerable packages? Or should we have upset many more users of the stable release by bumping behaviour under the feet of existing deployments?


I'm not a Linux distro expert, I'm not even a python expert, all I noticed is that my code didn't work with the OS-managed PIL/urllib3 and installing the versions from pip in virtualenv fixed things. I don't know the reasons, just that was my experience. Maybe it was just the OS versions being old, I don't know.

The maintainer approach works well for everything you maintain, it just sucks for everything else, including installing out of repo software or writing your own software. It's made particularly bad by Python's packaging system which defaults to global installs.

FWIW, I lay this one more at the door of Python than I do Ubuntu; you guys are just dealing with their packaging system, and I think that "Python packaging is good now" is just ignoring the shitty parts.

Frankly it's easier for me to just always run my code inside a virtualenv than it is for you guys to make some drastic changes to make my life marginally easier, when I don't even really know what the problem was. It's just that this is a non-obvious thing to do for someone who hasn't been in the python ecosystem for long.


It sounds like all you needed was a newer version. Most distributions deliberately try to ship a version that was current at the time of release and don't update it later except for bugfixes. This is for the user stability reason I gave above. This is what a distribution release is, and why many Ubuntu users prefer the LTS release over every six-monthly release.

If this is the case here, then this isn't a Python-specific problem at all. Developers have a choice: use the chosen distribution release version, or if that's not new enough, then find another way. For Python, one (commonly accepted and good) answer happens to be virtualenv. Many choose, because of their particular circumstances, to go all virtualenv and ignore distribution packages altogether.

Can I guess that you used an Ubuntu LTS, so you effectively opted in to packages that are older but don't change often?

Some solve this by never using a distribution version at all, which is fine. But understand that this is the nature of what a distribution is, and applies to the entire ecosystem, not just the Python one.


It's possible I had an old version, but I was trying to do super basic things with PIL on 14.04 LTS; literally just cropping a JPEG and things were failing.

Do I have this problem in Java/Node/Go/Rust? No, I don't.

The reason I don't have this problem is that dependencies are not installed globally; they are defined per-project and often statically linked in, so there is no such thing as a conflict between two of those projects.


> It's possible I had an old version, but I was trying to do super basic things with PIL on 14.04 LTS; literally just cropping a JPEG and things were failing.

It sounds to me that you hit some kind of straightforward bug then, rather than some systemic issue.


> I think you may be coming at it from a javascript perspective where you expect it to work according to the way you're used to it working with Node/NPM

Please avoid making such assumptions about other people. It's just a little step from there to ad hominems.

To make my point more clear:

I love Python and I'm used to Python/PIP, but when I had to work with NodeJS and saw NPM, I immediately thought: these people got it right. Not the language, mind you, but the package management!


Not everyone is running a Debian-flavored OS; in fact, not everyone is running Linux.


For Windows, in PowerShell:

     (Invoke-WebRequest -UseBasicParsing "https://bootstrap.pypa.io/get-pip.py").Content | python -
but generally in that case, one doesn't need ensurepip at all and can just use that. And one doesn't need to install pip separately either, since it comes with the Windows installer as an optional component

but in most cases, the `curl url | python -` or equivalent solves the problem
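For the record, the Unix equivalent of the PowerShell one-liner above is something like:

    # fetch the official bootstrap script and feed it to your Python
    curl -sS https://bootstrap.pypa.io/get-pip.py | python -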


As long as you're using Python 3, pip and pyvenv are bundled with Python. Pip is also included with Python 2.7.9 and later for Python 2 users.
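So on a stock Python 3.4 or later, the whole setup is just (no third-party tools needed):

    # create and activate an environment using the bundled venv module
    python3 -m venv myenv
    . myenv/bin/activate

    # pip is already available inside the environment
    pip install requests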


Having pip installed by default is really nice, since it consolidates people on pip unless they have specific use cases, e.g. Anaconda users. However, pyvenv/virtualenv isn't that great, for the reasons the poster mentions: specifically, that you have to set something up instead of just installing into a local folder and referencing that. I do think that pip is more mature and in some ways better than npm, but in respect of where it installs modules it's definitely behind (I do realize that this is also a Python issue w.r.t. where it references modules, but installing modules globally by default is a pip issue. At least they're working on doing more local installations by default[0]). I shouldn't have to worry about where my modules are installed in general, and if I'm working on multiple projects with different package versions, I shouldn't have to think about virtualenvs, because if I mess up and forget to activate one, it may or may not give me strange errors during various parts of development.

0: https://github.com/pypa/pip/issues/1668
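For what it's worth, pip can already be pointed at a project-local folder; it's just not the default, and you have to wire up the path yourself. A rough sketch (myscript.py stands in for whatever your entry point is):

    # install dependencies into a project-local directory, npm-style
    pip install --target=./vendor requests

    # make the interpreter see them when running the project
    PYTHONPATH=./vendor python myscript.py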


I've also found venv to be a pain if you're trying to set up your editor to execute scripts in the virtual environment. Npm project? Build and run. Python project? Figure out how to make VS Code activate the environment first.
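One workaround that avoids activation entirely: a virtualenv's interpreter can be invoked directly, so the editor (or any script) only needs to know its path, e.g. (myscript.py is hypothetical):

    # no "activate" step needed; the venv's python finds the venv's packages
    ./venv/bin/python myscript.py
    ./venv/bin/pip list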


That is one of the things I really appreciate about Node/npm. Drive space isn't a concern particularly, since packages aren't gigabytes in size, so having them installed locally is fine. The biggest gripe is you have to include `node_modules/` in your `.gitignore`.


NPM is also a pain in the neck - I've had to wipe out the node_modules folder many times because it got in a bad state when adding dependencies.

Other than that, I agree.


In my experience, the only time this happens is with `npm uninstall`. If you only `npm install` stuff forever, node_modules seems pretty stable. Disk usage, however, is another matter.


I'd say Conda [1] is as straightforward to use as NPM, and handles virtual environments (better than virtualenv) as well.

I don't really see the issue with creating a virtual environment in the case of specific dependencies, and otherwise dealing directly with the global environment (which really isn't that different from e.g. NPM and JS, when you think about it).

[1]: https://github.com/conda/conda
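For anyone who hasn't seen it, the conda workflow looks roughly like this (2016-era syntax; the environment and package names are illustrative):

    # create an isolated environment with a specific Python and packages
    conda create -n myenv python=3.5 numpy scipy

    # switch into it, install more things, switch back out
    source activate myenv
    conda install pandas
    source deactivate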


Composer in PHP-world works even better than NPM, while being similar in some ways.


conda is way better than plain pip (although it is still compatible with it), and possibly better than NPM. It's just not well known enough yet.


I've always argued that npm is nodejs's killer feature. Semver is suggested and used almost everywhere. Setting up the development environment for almost any project is just `npm install` from within its directory. Dependency hell isn't a show-stopping issue: it's perfectly fine for libraries A and B to have sub-dependencies and to depend on different versions of C. No longer do I have to wrestle with things when I try to install a project that relies on a different version of what I already had installed on my system. (Sure, there are some gains to be had from updating packages to consistently use the latest version of C, but when we're talking about libraries on the order of kilobytes, that really shouldn't be a blocker stopping something from working.)


It almost seems, based on TFA's risibly misguided "dumb idea" link to npmjs.com, that the author thinks Python's story is better than node/npm's. It's called "separation of concerns", kids, and it's a good thing that one part of a script doesn't have to worry about what a different part of the script is doing. The fact Python never got it to work correctly is Python's problem; node handled it from the beginning.


DUB for dlang is heavily inspired by NPM.


Python packaging is great.

What's a bother is - as that article mentions - end-user deployments.

Now, I sort of have that down to a tee by using cx_freeze and Inno Setup, such that I can distribute my software to my end-users and all they have to do is click on a Setup.exe and my applications install just like any other Windows program.

I can do a similar thing on the Mac, but it's still a pain.

The main trouble with packaging Python applications up is an apparent assumption (at least, that's the impression I have gotten over years of figuring out how to distribute my applications) that Python programs should all be open source - so for example, distutils assumes that the author wants his code to be readable by everyone who wishes to read it.

I wanted to be able to produce full applications and somehow compile and package them, as one would for example a C/C++ application.

The closest I can get to that currently, is cx_freeze, which isn't compilation, merely turning the Python code into bytecode.

Nuitka is showing lots of promise, but as yet it still cannot compile my applications into something usable (my really big Qt4 GUI applications either won't compile or there are lots of problems cropping up even if they do manage to run).

Here's hoping end-user deployment gets much easier than it is now, someday!


Could you elaborate on your end-user deployment workflow a bit? I'm building a game in Python and would love to distribute .exes to Windows players.


Sure!

1) Develop your Python application ;)

2) I have a batch file build.bat which I use to build the deployment folder. This sits in the root directory of every application I develop, and is customised. It basically calls cx_freeze via the command line rather than bother with a setup.py file - I find this method far quicker and more convenient. An example...

  cxfreeze -OO -c CaptainsLog.py --include-modules lxml._elementpath,lxml.objectify,win32com.gen_py,requests.certs --base-name=Win32GUI --target-dir ..\CaptainsLog_DIST --icon CL_new_256.ico
This is for my Captain's Log program - a 3rd party application for use with the game Elite: Dangerous

cx_freeze will build and include all the necessary Python and Qt libraries to ..\CaptainsLog_Dist

I've had to ensure cx_freeze included some required stuff like lxml and those other things because for various reasons it didn't automatically include them when they were not specified, so be aware of that possibility when attempting to freeze your application.

3) I then can test whether the build/distribution works by double-clicking on the captainslog.exe file which cx_freeze has produced.

4) If all is well, I then run Inno Setup. I have a file captainslog.iss which I use and change as required (e.g. new version number). This will basically take everything which is in the CaptainsLog_DIST folder, and package it up into a suitably named Setup.exe file.

5) I then sign the Setup.exe file with my code signing certificate. Search google for how to obtain such - basically you need at least a Class II certificate from a suitable provider to prove your identity. I use a utility called ksign to do the actual signing.

6) I test the Setup.

7) All going well, deployment to my users is simply providing a download location - notification is built into my application (check for new versions), or if they disable that, peruse the game forums or my web site.

It's all been rather streamlined over a LOT of trial-and-error, frustration, and sometimes a lot of swearing. But the above works REALLY well for me and I use this workflow for bespoke software I write for paying customers.

It's a similar workflow (with some differences) for Macs.

The real beauty of it is that your end-users don't have to worry about installing Python, Qt, any of the Python dependencies/packages - everything required to run the application is provided.

The other bonus is that you only need to worry about Python dependencies at /your/ end.

Note: when developing and deploying my applications, my entire toolset is the 32-bit versions - this ensures that your package will install and run on both older 32-bit and newer 64-bit versions of your target O/S.

Regards.


Thank you! Very helpful :)


py2exe works well for me. It's no Windows installer but packaged in a zip file is good enough (for my use case at least). The only annoyance is the need to install some Microsoft C++ redistributables on the client side.


See my workflow - once you get it down to a tee, it becomes a very smooth workflow and a breeze to distribute in one convenient setup.exe.


> The closest I can get to that currently, is cx_freeze, which isn't compilation, merely turning the Python code into bytecode.

Is there a problem with that?

If you're worried about someone stealing the code you can obfuscate it a bit.


> Is there a problem with that?

Yes - I wish I could compile my applications. Certain ones could benefit from being sped up a lot.

I have used Cython before, on a pure Python program and using the --embed option, but it was a LOT of work to get that working (I ended up having to take my Python program, merge everything into one monolithic Python file and feed that to Cython, which meant taking the output from pyside-uic and bunging that in, as well as merging in the resource file generated from pyside-rcc).

I did that as an exercise. It worked, but was an awful lot of hard work and it was a horrifically hacky way to do it ;)

At the same time I did also obfuscate things like the Licensing code that was in that file, in Python, beforehand.

So yeah, it's not like I'm new to that sort of thing :)

I just wish there was a more effortless way to do that - Nuitka has that promise but last time I checked it's just not there yet, despite the awesome work being done by Kay.


I have tested various Python implementations and pypy is the fastest, so you could create a self-extracting archive with pypy and your program.

I created something similar to that here [1] for an abandoned project (but the project is not about self-extracting archives, so don't bother compiling the project).

[1] https://github.com/RamchandraApte/unvanquished-installer/blo...


Good suggestion. I have looked at pypy in the recent past, but the showstopper is PySide isn't supported.


Python is not a compiled language.

Is there any reason not to write your app in Cython? Then you could compile it with your ordinary C compiler. Could be much harder to reverse engineer that way, if that's your concern.


Yes, I know that :)

My reply at https://news.ycombinator.com/item?id=12286806 is relevant.

Regards


/usr/bin/python and pyc/pyo files care to disagree.


By "compiled," I meant "down to native machine code," not to Python bytecode.


In the past 12 hours, I started a new Python project. I can't agree with the sentiment - the most I can say is that Python packaging isn't terrible now.

My intention was to create a Python 3 package using best-practices, a nice division of labour by process to avoid threading problems, etc.

So far I've run into the following issues regarding packaging:

Getting started - there's a glut of documentation, much of it contradictory, much of it out-of-date. Simply google "Python packaging" and you'll have to pick between 4 sources of info on the first page, all of which look official.

The tools - this one was a new one for me; I normally use virtualenvs for development, but vaguely remembered it being bundled in Python 3. Turns out it's not virtualenv but "pyvenv". Beyond the confusion of the name change, there's also a tool called "pyenv" muddying the waters.

The specifics - I started with https://packaging.python.org/ and was struck by how ambiguous the instructions were. Doing something as simple as declaring your package's version has 7 different options: https://packaging.python.org/single_source_version/. In the end, I got the most help from reading the guide's sample project... I only have to wonder about the licensing implications of that.

The infrastructure - because I was creating a project with multiple executables, I needed a way to start one from the other. Despite knowing that setuptools is responsible for generating the executables, I quickly found that there's just no way to find those executable paths programmatically.

Library choice - I needed an inotify library. I searched for "inotify", "notify" and "watch" on Pypi but had little way of distinguishing the results - the "weight" column is useless as compared to a rating system or even download count. Thankfully, I remembered about awesome-python.com which bridges the gap.

All told, it took me several hours to bootstrap the project, not even including things like testing. I'd say Python packaging has a way to go before I'd call it "good".


I tend to use [0] for the boilerplate for getting started in writing a Python package. It was mostly used for Python 2.7, but I've written a few local Python 3.5 packages with it and everything seems to be fine (ie, "it worked on my machine - what could go wrong?").

I then use `nosetests -v` to run the tests.

[0] - github.com/ambitioninc/ambition-python-template


Is it?

On Debian there is Debian Python, and Debian pip. Then there is easy_install to install your own pip, and that's different. virtualenv comes on top of all that. And of course there are Debian's Python packages, too.

On MacOS X with MacPorts, there is Apple's Python and MacPorts' Pythons, all with different pips. And of course there are MacPorts' Python packages, too, for the different Pythons.


No matter what small tooling changes come about, the Python packaging ecosystem is fundamentally flawed, and will always be more complicated and less useful than others such as Maven (yes, Maven).

The primary reason is that setup.py, etc., have you download/manage one set of dependencies for each project you're working on rather than simply housing all dependencies in a common folder, separated by version, as Maven does. It's pointlessly redundant. But the real pain comes when you start to install things. Since in Python land dependencies for all tools are typically stored in the same folder, you very easily and quickly hit version conflicts, because the dependencies are not really segregated by version. This leads to brutal bugs that creep up on you at runtime when you realize, too late, that some tool you installed has dependencies that conflict with another.

What's worse, complex Python projects like OpenStack may have numerous dependencies which all in turn have differing and conflicting dependency versions of their own. Trying to keep them all in sync is a ridiculous exercise in time wasting and pointlessness, and problems caused by version mismatches are common.

The brilliant solution (sarcasm here), of course, is to create virtual environments with separate copies of every dependency for every tool we run. Again, pointlessly redundant and a pain to manage.

Is there a better way? Obviously. Use a single common repo with dependencies structurally segregated by version (ie: different folders), as is the way of doing things in Java-land. I know that Java-land stuff is bad and we're all supposed to hate it. But I've always failed to understand why developers in other language ecosystems continue to create tools in a vacuum, as if roadmaps for solving these problems don't already exist elsewhere. Look beyond your bubble. Hopefully one day the Python folks will catch on.


All of the CPAN-inspired packaging ecosystems are fundamentally broken. Including Maven.

> The brilliant solution (sarcasm here), of course, is to create virtual environments with separate copies of every dependency for every tool we run.

Brilliant solution is to recognize that environments need to be packaged and managed too. No sarcasm. That's what nix does. And it makes package management a solved problem (ignoring current implementation-specific issues of course).

So, yes, there is a better way and it's nix [1] and guix [2] and the ideas behind them. Check them out.

[1] https://nixos.org/nix/

[2] http://www.gnu.org/software/guix/


  The real takeaway here though, is that although it’s still not
  perfect, other languages are no longer doing appreciably
  better.
Not sure how accurate this is.

At least not true in the context of OCaml/OPAM.

  As always, I’m sure none of this applies to Rust and Cargo is
  basically perfect, but that doesn’t matter, because nobody
  reading this is actually using Rust.
So it seems that those using rust don't read HN:

https://github.com/servo/servo

https://github.com/dropbox/rust-brotli

--

Bottom line: Bash everyone else arbitrarily and claim oneself to be the best!



> python setup.py sdist bdist_wheel

Wheels suck.

By using the bdist mechanism they make it impossible to build a single cross-platform, cross-Python package that just works. Wheels can't run 2to3 at installation so guess what? You have to make separate packages for Python 2 and 3.

There should be an easy way to have a pure Python package that "just works" on Python 2 and 3, 32-bit and 64-bit platforms without having to build separate packages. Right now that can only be done with the third-party Setuptools using sdist and a traditional setup.py. Distutils is still a half implemented packaging system. I wouldn't say that Python packaging is a solved thing.


You can easily build a single cross-platform, cross-Python wheel that just works. You just need to add "[bdist_wheel] universal=1" to your setup.cfg and write your code in a cross-Python style enabled by six and future (https://pypi.python.org/pypi/future).
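Concretely, assuming the code itself already runs on both 2 and 3, a single build then produces one wheel tagged py2.py3-none-any:

    # --universal is equivalent to setting "universal = 1" under
    # [bdist_wheel] in setup.cfg
    python setup.py sdist bdist_wheel --universal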


Six is a third-party library and interferes with migration to Python 3. It requires you to tiptoe around the differences between 2 and 3. 2to3 is built in and does most of the heavy lifting for you. It should be possible to package Python code using the official Python distribution and nothing else.


> Six is a third-party library and interferes with migration to Python 3. It requires you to tiptoe around the differences between 2 and 3.

That's a strange thing to say. Six and future were written to ease the migration to Python 3, and both are endorsed by the official Python porting guide (https://docs.python.org/3/howto/pyporting.html). Meanwhile 2to3 is problematic because it complicates debugging and doesn't address issues like string handling, backports and other behavioral differences. Few popular Python libraries use 2to3, and most embrace the idea of single-codebase cross-version support.

I would agree that the Python core developers bungled the migration tools in the standard library, and I wish they were better. But as noted by the OP, library functionality in general is steadily becoming more decentralized, and six and future are the best options for packaging cross-version libraries today.

More generally, a lot of good work has gone into the wheels architecture, and it's a huge improvement on what we had before. If you think you can come up with something better, I would encourage you to try - the opportunity is there.


2to3 only handles simple base cases; it isn't applicable for translating larger, more complex codebases. The same tiptoeing you didn't want to do with six, you are doing by only using a subset of Python 2 that translates automatically.


If you have any C code, you can't. Even if your C code is optional, you shouldn't, at least not for a package you'd put on PyPI, because unix users with full toolchains will never see your source .tar.gz from pip anymore.


Fair to say, but then there isn't any technology out there to build "a single cross-platform, cross-Python wheel that just works" and links against C code. That's true of any packaging framework that links against native code...


Wheels are only useful for PyPI uploads insofar as they allow uploads of difficult builds, like Win32 compilations.

Wheels are unbelievably critical in that they allow super easy caching of pre-built packages on any given host. If you've noticed that these days you can type "pip install numpy" at will and it usually seems to run in less than two seconds rather than 5 minutes, thank wheels. This is a particularly big deal if you work a lot with CI.
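The caching is easy to see with an explicit wheelhouse, which is also handy on CI boxes that shouldn't hit a compiler (or the network) on every run; a sketch:

    # build wheels for everything once and stash them
    pip wheel -r requirements.txt -w ./wheelhouse

    # later installs are fast, offline, and compile nothing
    pip install --no-index --find-links=./wheelhouse -r requirements.txt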


No discussion of python packaging without mention of conda is complete. It's as good as "apt-get" for Python (and not just for Python, but ... mostly for Python).

http://conda.pydata.org/miniconda.html


I believed that! Then I actually used it and got incompatible packages, packages only for 32 bit XOR 64 bit etc. It was not remotely as good as the package managers I am used to on several distros. :(


For a long time, PyPI intermittently reported extremely exaggerated download statistics and sometimes didn't report any statistics at all. Then they got bored of trying to fix it and removed the feature altogether, saying they want to concentrate on Warehouse. I believe currently the only way to get any kind of statistics is to perform a query on Google's BigQuery. Any attempt to talk about this issue results in the thread being locked and offended replies, and while I understand what it's like from the developer's point of view, the current state of things is completely unreasonable.


It may well be. But one problem is all the references to all the various old ways that a first time deployer has to wade through. It took me quite a while to figure out what's current and what should be ignored, as well as what's old but still needs to be done, on a small project that I did to learn exactly this.


Personal perspective of someone who's only used Python for 4 years, mostly in a Linux ops capacity: the packaging has always been pleasant to use, and the few times I've had to install a Node/Ruby app as an end user have been comparatively horrible.


This is the first time I've heard of ensurepip. It looks cool, but I'll wait until it becomes common knowledge before declaring that "Python packaging is good now", but I definitely agree that it's improving. I'll have to update my StackOverflow answer now to mention this. http://stackoverflow.com/a/14753678/247696


"From 2008 to 2012 or so, Python packaging was a total mess."

Some things have improved, but fundamentally there really is not much difference in creating a package now vs. then.

A certain xkcd comes to mind.


> There should be tools that help you write and update a setup.py. Or a setup.python.json or something, so you don’t actually need to write code just to ship some metadata.

There are some out there, e.g. OpenStack uses https://pypi.python.org/pypi/pbr - it was pretty painful around the 1.0 transition when no two libraries seemed to be using compatible versions, but it hasn't caused me any trouble in quite a while now.

I'm more concerned by the lack of a (non-greedy) dependency solver in pip.


While I agree that it isn't as terrible as it once was, I personally can't say pip is really starting to look any better until the following is solved:

https://github.com/pypa/pip/issues/988

''Pip needs a dependency resolver''

Without such a thing you really don't quite have a way to ensure what you are requesting to be installed actually is a valid set of things.


NPM did it right. PIP got it wrong. Requiring the same version of a library per application is really a bad idea.

It creates a lot of work to align your dependencies.


My main peeve is that distutils2 died, and with it, the idea of putting package metadata (author, version, dependencies) in `setup.cfg` instead of `setup.py`.

It's bad enough that I have a boilerplate `setup.py` that just does `import ConfigParser` and blasts the entire `[setup.py]` section into `setuptools.setup()` as `kwargs`. I copy and paste this boilerplate into every new project that I start.

Oh and while we're ranting about Python packaging, let me rant about tox, which is so-very-useful but also so-very-dangerous: IF YOU (SLIGHTLY) MISCONFIGURE YOUR TOX.INI FILE, IT WILL DELETE YOUR ENTIRE PROJECT DIRECTORY! [1]

I suppose I am getting too old and cranky for this stuff. :)

[1] https://bitbucket.org/hpk42/tox/issues/352/tox-deleted-all-o...


distutils2 may have died. The idea of declarative package metadata hasn't (although it may not have advanced as quickly as you might like). There is a PEP[1] (which is sadly not yet implemented[2]) which a) starts to move some metadata into declarative format b) is designed to allow implementation of alternative build systems that have more declarative metadata.

[1] https://www.python.org/dev/peps/pep-0518/

[2] https://github.com/pypa/pip/issues/3691


Maybe I'm a bit too old-fashioned, but you should always assume that any program you run will delete your home directory with a probability > 0.

If you're not prepared for that situation at all times, you are using computers wrong.


The article says that with any recent Python you can get started with a one-liner:

   Python Core started distributing the ensurepip
   module along with both Python 2.7 and 3.3 ...

   $ python -m ensurepip --user
Tried here with python-2.7.11-3 and just got a traceback error.

Turns out according to one of the links given by the article[1] that this feature was actually not backported to these older versions of Python, apparently to encourage people to upgrade to new Python. This is unfortunate.

[1] https://www.python.org/dev/peps/pep-0453/#including-ensurepi...


Which OS are you using? On Ubuntu, I get this useful output:

  $ python2 -m ensurepip --user
  ensurepip is disabled in Debian/Ubuntu for the system python.
  
  Python modules For the system python are usually handled by dpkg and apt-get.
  
      apt-get install python-<module name>
  
  Install the python-pip package to use pip itself.  Using pip together
  with the system python might have unexpected results for any system installed
  module, so use it on your own risk, or make sure to only use it in virtual
  environments.


It's upsetting that Ubuntu is so opposed to ensurepip that they won't even let it work in user mode. There is no way to use these Python standard library features without being root and installing separate Ubuntu packages.

This was Ubuntu bug #1290847, which is now marked as "Fix Released": they consider the current behavior okay.

Python packaging is good now, except on Ubuntu.


How does Debian do it?


The same people who look after Python in Ubuntu also look after Python in Debian (as part of a wider team) and also are active upstream in Python in making Python work well across all distributions.

Ubuntu is hardly on its own here. I recommend that anyone who disagrees with how Python is packaged on Debian and Ubuntu get in touch on the relevant mailing lists. If you think it's wrong, there's probably a use case you're missing, or you'll find that your proposal will badly break some other expected behaviour.


Interesting. On Fedora here, and it actually includes version 3.4 of python with the command name "python3", and the above instructions worked just fine using it.

However, now you have me worried that the article left out caveats and warnings about how these commands might lead to "unexpected results".


The worry is:

1. Install system python

2. Install pip from somewhere

3. pip install requests

4. apt-get install something with depends on python-requests

Should the installation of apt's requests fail? Should it work but clobber your pip installed requests? What happens when the apt version is older than the pip version? Does this break whatever you used pip install requests for?

There are two resolutions to this issue.

1. Don't use pip; always use apt for your Python dependencies. The Ubuntu devs would really prefer you pick this option, but not everything on PyPI is in their repo, and sometimes working consistently across platforms matters more than consistency within a platform.

2. Use a different Python environment than the system Python. This could be done by using virtualenv, a different Python version, or a separate installation.


   $ python -m ensurepip --user
   $ python -m pip install --user --upgrade pip
   $ python -m pip install --user --upgrade virtualenv
> ...

> From here on out, now the world is your oyster; you can pip install to your heart’s content, and you probably won’t even need to compile any C for most packages. These instructions don’t depend on Python version, either: as long as it’s up-to-date, the same steps work on Python 2, Python 3,

Nit: I suppose installing virtualenv works in 2 and 3, but hasn't venv in 3 made that unnecessary?


virtualenv has an advantage over venv, which is that people have spent years and years writing instructions for virtualenv, and not for the cases that venv handles slightly differently.

For example, with virtualenv you can choose to use a mix of globally- and locally-installed packages. With venv you can't. (You get global or local, but not both.)


In that regard, venv is better. It's better to have a single place where you install your packages rather than a few different places, especially if you're working on a team. It would be hell to have a mix, not knowing exactly which goes where or why, and then explaining to another team member your setup in a coherent manner. It's much better to just know that your packages are either global or local and no in-betweens.


> It's much better to just know that your packages are either global or local and no in-betweens.

The database driver and scipy are installed globally. Everything else is local to the environment.

Not hell at all, especially compared to compiling numpy/scipy every time you want to spin up a venv that uses a single feature.

Examples other than DB drivers and scipy exist. Compiling software isn't really that hard, but it's an unnecessary hurdle in many cases - especially when you've already bought into an OS that does binary distributions specifically so you don't have to compile anything.

I'd be comfortable agreeing that any pure-python package should be venv only, but not that every package should be.


The last three (cross-platform, Python 2 and Python 3) packages I published to PyPI have been Ctypes/Cython wrappers around Rust binaries. Yes, I'm a sample size of 1, but whatever, never let reality get in the way of your rhetorical blogpost, I suppose.


I seem to feel more comfortable with redundant copies of libraries than virtual environments.


I'd take pip over npm any day of the week. My experiences with Node have been horrible. Npm just grows okay but do not try to uninstall.


A while back, I wrote some of my thoughts on Python packaging. I got excited seeing this title, but reading the article, I remain unconvinced. The whole thing is still way too confusing, and it's still not nearly as intuitive (or teachable) as "npm install ___".


Most of the things you said in your writeup were no longer true in February 2015, and I think there are no concrete things that you wrote that are true anymore, thanks to the hard work of Python packaging infrastructure maintainers (https://github.com/pypa).

I find the griping a bit over the top. The point of the OP is that new exciting improvements have arrived in the Python packaging infrastructure. Yes, there were many ugly overwrought things in setuptools' past. Yes, npm has several great ideas. npm is also much younger than setuptools/PyPI, and I can cite lots of horror stories from its pre-io.js days too. But the volunteers who got us this far deserve to be recognized for the progress they've made.

I think the biggest legitimate concern right now is how unintuitive it is to use pip in the presence of several Python runtimes/versions.


Yeah, I'm sure things are better. And your typical developer will probably be just fine (I mean, we know they are because Python is so popular!).

But for a beginner who is just trying to figure out how to share his or her code, it's still very confusing. And yes, part of that is definitely the fact that all of the mistakes of the past and their messy workarounds still show up in Google searches, muddying the waters; this obviously gives npm an edge. But even so, I feel that npm just does a better job when it comes to being learnable: people pick it up right away with very little trouble, and you can have your first "public-facing" module up on day one with very little effort. Indeed, in a way, the fact that it's so easy is what's giving it the most grief right now!

So there are definitely lessons to be learned there, for everyone.


"pip install ___"

Everything else is pretty much obsolete.

The problem is in how it doesn't really work reliably, especially with C extensions, which are much more common in Python-land than with anything JavaScript/NodeJS-related. You're also comparing it with something that, by virtue of its age, has far less technical debt. Also, NPM is a "commercial product", while PyPI is a community effort. Maybe the comparison to something like Anaconda would be more apt.

The author acknowledges most problems anyway, he just comes to the wrong conclusion: It's not "good" now, it's still terrible, but everything else is more-or-less as bad or worse.


Funny that the PyPI web interface is down right now.


That's a very good typeface to read on mobile!


I wish!


Could you elaborate on the issues you have with the current system?


Deploying a python application is still a horrible experience. My best attempts so far involve doing `pip wheel` during the build, then discarding anything that's platform dependent and fetching their source dists instead, then bundling that together with a script that sets up a virtualenv and installs all these dependencies and the project being installed ...

Compare to a jvm uberjar.


Yep, I'm doing the same thing to deploy Python apps on Amazon's Elastic Beanstalk... I write an "abstract" app-requirements.txt (which typically just has a single line for my application's own package), create a fresh virtualenv, `pip wheel -r app-requirements.txt` to resolve dependencies and build wheels for all the packages, and finally `pip freeze > pinned-requirements.txt` to create a list of version-pinned packages.

Zip it all up and then deploy with `pip install --no-index --find-links=/path/to/wheels -r pinned-requirements.txt`. Et voila! Binary artifacts that don't suddenly break because PyPI goes down or a package maintainer uploads a new version that unexpectedly deprecates an API that you're using.
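Condensed, that workflow looks roughly like this (a sketch; I've added an explicit install step into the fresh virtualenv so that pip freeze has something to pin):

    # at build time, in a fresh virtualenv: build wheels, install, pin versions
    pip wheel -r app-requirements.txt -w wheelhouse/
    pip install --no-index --find-links=wheelhouse/ -r app-requirements.txt
    pip freeze > pinned-requirements.txt

    # at deploy time: offline, reproducible install from the bundled wheels
    pip install --no-index --find-links=wheelhouse/ -r pinned-requirements.txt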


You might want to consider checking out Pex (https://pex.readthedocs.io/en/stable/). It helps you create a single, executable artifact for your Python project very similar to an UberJAR.
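A minimal sketch of what that looks like (the entry point and file names here are hypothetical; check the Pex docs for current flags):

    # bundle the current project plus its requirements into one executable file
    pex . -r requirements.txt -e myapp.main -o myapp.pex

    # run it anywhere with a compatible Python interpreter
    ./myapp.pex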


Except that Python you can often run straight from git. Git pull, do some pip, done.


You can do the same thing in Java land as long as you're using a build system written in the last decade, such as Maven or Gradle: you can just as easily pull down a git repo and run mvn package.


Well that is

a) A 30 sec - 10 minute process to build (vs 0 in python)

b) Weird because Java has a fatjar concept.

Python and Java are different. The easy path to deploy them doesn't need to be the same.

You may need to make them the same for ops sanity.. but if you are just doing 1 or the other, meh


> a) A 30 sec - 10 minute process to build (vs 0 in python)

Have you ever run someone else's python code? You have the exact same issue pulling down their requirements.txt, which is often not enough because you need to apt-get other libraries, try pip again, apt-get more, try pip again, etc.

> b) Weird because Java has a fatjar concept.

Sure, that's what you would do if you wanted to deploy an application to proc, my point was that it supports the same git checkout workflow if you want it to, which can be useful when dealing with other people's code and you want to make changes and not just run whatever jar they made.


To be fair, you need a python runtime installed to run python, the same way you need a java runtime installed to run uberjars.

...but, the deployment is a single artifact.

I don't think doing a git clone, followed by a pip install and waiting for all the dependencies to download... and compile... (or not, if you have the wrong system packages installed) really compares.


Why not?

If you are doing a blue/green type deploy.. you are basically building up a new VM to be your new environment. It can do all kinds of download and compile type stuff. Who cares if some is pip installing?


Having to explain the concept of virtualenv is still a pain compared to local install à la node.

"How do I make an exe" with my PyQT app is still a very hard question to answer.

Web deployment is still hard in Python because of lack of general hosting support if you don't own your server.

The situation is way better, and I'm very glad of the work done, but it has still a long way to go.


> Web deployment is still hard in Python because of lack of general hosting support if you don't own your server.

Does conventional shared hosting actually matter anymore, for application developers as opposed to someone just running a WordPress site? Virtual private servers are just as cheap now.


Yes it does, because a lot of the people needing to deploy tools are not devs.


> Web deployment is still hard in Python because of lack of general hosting support if you don't own your server.

GAE? PythonAnywhere? Heroku? OpenShift? Docker-based stuff? https://wiki.python.org/moin/SpecializedCommercialHosts maybe? https://wiki.python.org/moin/OtherCommercialHosts if that wasn't enough?


That's 0.1% of the commercial offering for PHP, and they all involve using the command line or knowing a bit about coding. Again, it's better, but many people I know run PHP businesses with zero coding knowledge. They upload scripts they bought on the web over FTP, and make a living out of it. Some are kids trying stuff. The little things matter.


It's a good summary of the context, history and tools that have been cleaned up but sweeps an awful lot under the rug with "Then, for each project you want to do, make a new virtualenv [...]". What's that and why are you doing it? Oh, because the packaging doesn't work.


The packaging works fine. Do all Java apps running on a host use the same classpath pointing to /usr/lib directories full of JAR files? Of course not. That's all that virtualenv is doing; the equivalent of managing /lib directories for an invocation of the interpreter.


I don't follow how these are at all similar other than you saying they are 'equivalent'. Typical java apps get their dependencies from the same maven-managed local repo. The tool itself is baroque but conceptually this is much simpler than creating magical 'environments' - the repository of dependencies, what dependencies a particular app uses and what version of the runtime it's invoked with are all separate and orthogonal to each other.

An even simpler way to think about it - imagine you just invented python and are designing the tooling around dependency management and packaging. Would you come up with 'environments'?


> Typical java apps get their dependencies from the same maven-managed local repo.

What? No, they don't. Typical Java apps are deployed on systems that don't even have Maven installed. Typical Java apps are executed with a classpath argument pointing to the libraries installed for the application. What are the directories containing those JARs (or an uberjar) except an "environment?"

Hell, the Java classloader interacts with the classpath in pretty much the exact same way that the Python interpreter interacts with a virtualenv.


Is there anything that even remotely comes close to bundler in power and ease of use? Suppose for example that I want to upgrade a package and all packages that it depends on, how would that be done? Or suppose I have different environments a project can be run in that should load slightly different set of dependencies?


Bundler's default-global behavior is a disaster. Last time I got hired at a Ruby shop, none of the scripts worked, most projects didn't have a lock file, etc., because they just ran things directly, using whatever was in the global environment.


If the gems are installed globally and no Gemfile.lock file is used, of what use is bundler anyway?


While bundler certainly has warts, the disaster in that case is not using a lockfile. That completely defeats the purpose of the tool, which is to get repeatable installations.


> Is there anything that even remotely comes close to bundler in power and ease of use?

Maven and co.

> Suppose for example that I want to upgrade a package and all packages that it depends on, how would that be done?

Change the version number in pom.xml/build.sbt/build.gradle. Done.

> Or suppose I have different environments a project can be run in that should load slightly different set of dependencies?

Define a new project that depends on your main project and the extra dependencies.


Ah yes. The question was about the Python ecosystem. There are other good solutions for other platforms.


> Is there anything that even remotely comes close to bundler in power and ease of use?

Yes, pip, since before bundler existed.

> Suppose for example that I want to upgrade a package and all packages that it depends on, how would that be done?

pip install --upgrade package

> suppose I have different environments a project can be run in that should load slightly different set of dependencies?

source myproject/bin/activate


There's a bit that's still lacking in Pip. For instance, Pip only installs new packages into your venv, rather than trying to keep it synchronized.

I've run into bugs where someone ripped a library out of requirements.txt because they thought it wasn't being used any longer. Turns out it was still being used, but the dev couldn't tell because the lib was still in their venv. Tools like Bundler will actually take the dependency away from you.

Python's packaging system is great compared to C, but it's primitive compared to Ruby, Haskell, Clojure, and lots of others.


https://github.com/nvie/pip-tools solves this by providing pip-sync, which will uninstall removed packages.
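Typical pip-tools usage, for reference (requirements.in is the hand-maintained file; the names are conventional, not required):

    # compile loose top-level requirements into a fully pinned requirements.txt
    pip-compile requirements.in

    # make the current virtualenv match it exactly, uninstalling anything extra
    pip-sync requirements.txt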


pip doesn't even use a solver to determine which packages to install; it just uses the first mention it finds.


I don't know what this article is talking about. What is wrong with apt-get install python-something to install the python package called something? Or if it's not in the repositories, you checkout the library to a subdirectory of your project and call `import something` just like you would when apt-getting it.

So honest question, what's this easy_install stuff? Apt has been around since the nineties, just like python, and didn't change much as far as I know.


How would you then resolve the following scenarios:

1) I need to have two applications running on the same host with different Python versions. E.g. Python 2.6 and Python 2.7. apt would only support one of the Python 2.x line installed at a time.

2) I need to have two applications that have conflicting version requirements on the same host. E.g. neither set of requirements can be installed at the same time.

3) I need to install a newer version of a Python library than the one in apt. If I install it into the system Python via pip, now apt is out of sync with what is actually installed.

4) I need to install a version of a Python library that conflicts with the requirements of system tools that rely on the system-installed Python. If I install said library, then the system tools will stop working / break down.

Also "checkout into your project directory and call `import`" doesn't scale. Maybe it works well for toy projects and one-off things, but not so much at scale.


> 1) I need to have two applications running on the same host with different Python versions. E.g. Python 2.6 and Python 2.7. apt would only support one of the Python 2.x line installed at a time.

I take it you haven't used Debian/APT for last three or four releases, because what you assert (only one Python 2.x version per host) is just plain invalid.

> 3) I need to install a newer version of a Python library than the one in apt. If I install it into the system Python via pip, now apt is out of sync with what is actually installed.

Uhm... I don't know why you have a problem with building your own package with the newer version. I do it all the time, for software both others' and mine.

This leaves points 2) and 4), which are essentially the same: how to install two conflicting versions of the same module for two different applications (one of which may come from the distribution), but this is not a problem with dpkg or RPM, this is a problem with Python's module lookup mechanism. For C libraries there are already good protocols for handling this issue (a little different for RPM and dpkg).


> Uhm... I don't know why you have a problem with building your own package with the newer version. I do it all the time, for software both others' and mine.

Maybe things have improved in the past few years, but last time that I built a dpkg for a Python library it was far from painless, even with some wrapper off the Debian wiki specifically for that use-case (and I was just doing this for pure-Python).

I've dealt with building both RPM and DPKG packages in the past, and I recall all of the build tools for each being far from ideal; even the use case of "I want to package up a list of files and install them to a specific location" was far more difficult than it should have been. The biggest issue being that most of the tooling is based on the use-case of taking a tar of source files and running it through an autoconf-esque build process to build it, apply patches, etc., which might be great for prolific package maintainers, but makes things a bit more daunting for the new / casual user that just wants to create a simple package.

> how to install two conflicting versions of the same module for two different applications (one of which may come from the distribution), but this is not a problem with dpkg or RPM, this is a problem with Python's module lookup mechanism

Regardless, it is an issue that exists. The parent post was saying that "apt should be good enough for all use-cases", which I was rebutting. I wasn't claiming that APT was sucky / broken, but that it doesn't handle all Python use-cases well. Whether it is the fault of APT or Python that they might be incompatible in a specific use-case is sort of irrelevant if you end up needing to account for said use-case.

> For C libraries there are already good protocols for handling this issue (a little different for RPM and dpkg).

I'm not fully versed in the mechanisms in C, but IIRC, isn't that why packages are renamed when breaking changes happen over major versions (e.g. libxml vs. libxml2 / libgtk2 vs. libgtk3)? "Rename the package" seems more like a work-around than a system to handle the issue at hand. You're not saying, "I want libgtk with version 3.x".


>> Uhm... I don't know why you have a problem with building your own package with the newer version. I do it all the time, for software both others' and mine.

> Maybe things have improved in the past few years, but last time that I built a dpkg for a Python library it was far from painless

"Past few Debian releases", since debhelper 7.x, not "years" (well, this incidentally matches, too). Debhelper can build properly Python using setup.py and doesn't require anything special (no need to specify any paths) to put artifacts where they can be found by /usr/bin/python. It's actually easier than building analogous C code, since you don't even specify where to put what.

> [libxml vs. libxml2 / libgtk2 vs. libgtk3] "Rename the package" seems more like a work-around than a system to handle the issue at hand. You're not saying, "I want libgtk with version 3.x".

For DEBs, yes, this is the case, but you can have several RPMs with the same name but different version.

On the other hand, working with the OS is much easier when the packages differ in their name, so this "workaround" is much more pleasant to use, even though it's not a "pure" solution.


1,2,3,4) You target a specific version of Debian during development and your software will be easily deployable for 5 years without requiring Python-specific knowledge and its dependencies will receive security updates from the distribution.


The biggest problem with apt-get is that the available versions of a given software tend to lag from the actual releases by a fair bit. Moreover, you probably don't want to install a certain package using apt-get and another using pip (not all might be available using apt-get).

easy_install [0] installs remote packages (similar to CPAN for perl). Obligatory reference: Why use pip over easy_install? [1]

[0] https://wiki.python.org/moin/EasyInstall

[1] http://stackoverflow.com/questions/3220404/why-use-pip-over-...


I don't want to depend on upstream for things that may change faster; not everyone uses debian derivatives; system installation means that everything on the box gets the new versions. That's three good reasons not to use the system package manager.


apt is not cross-platform.


OS X, BSD, Android, Fedora, etc. users will just have to use their respective package managers, or use the second option if there is none (downloading the source to a subdirectory). Python code will not run without a platform-specific interpreter anyway.


Those package managers would then have to know a lot about the specific needs of each language runtime they support. They don't, and shouldn't. The alternative, where each runtime has its own canonical package manager, means you get parallel package systems and duplicated effort, which is unfortunate. On the other hand, they actually work, unlike, say, 'apt for python packages'.


apt also installs packages globally to the system and not every python package exists as a deb package.


Are you from the crowd that thinks that DEB packages only grow on preselected trees and people can't build a DEB themselves for what they need?


If I have to build a deb to install something, I might as well just save myself the time and use pip. The convenience of a package manager goes away when you have to package everything yourself before you can use it. At that point saying 'why doesn't everyone just use apt-get' is the same as saying 'why doesn't everyone just configure ; make ; make install.'

I also don't use Debian.


> The convenience of a package manager goes away when you have to package everything yourself before you can use it.

Far from it. You get repeatable builds and deployments, much easier and faster setup of development environment (GCC and -devel packages and foreign-language tools pulled in automatically), and you don't make intractable mess on your servers, which allows you to use much wider range of tools than architecturally broken Ansible/Salt.

> I also don't use Debian.

Irrelevant. Every Linux distribution and most other unix operating systems use some package manager, most of which in turn are more or less equivalent to APT.



