The State of Python Packaging (bernat.tech)
133 points by BerislavLopac on Feb 12, 2019 | 42 comments


It's funny, I used to have a very popular (~2k pageviews a month, first or second search result on Google) tutorial on how to get a package onto PyPI. I deleted it because since then the PyPI team have written a good guide on how to do it: https://packaging.python.org/tutorials/packaging-projects/ . If you're reading this and wondering how to create a Python package, follow that. Everything else is going to be wrong, out of date, or worse.


It might have been better to leave the content up, but put a large disclaimer recommending the official guide. Is there any reason you didn’t do this?


Speaking for myself, when I skim an article first, I look for code examples and work my way backwards. I have missed disclaimers before.


Well, then no harm done. We shouldn't be taking historical content offline just because people have become too inattentive to analyze things top to bottom. The code might even still work.


Change starts at home.


Personally, I would try very hard to avoid deleting things that I put up on my website, if that's what you were asking.


I wasn't asking anything. The web is ephemeral and the information on that page became superfluous.


Well you see actually Python packaging is extremely simple, people just don’t get it. You just use pip! Well actually distutils is what you use not pip, except don’t use distutils use setuptools instead, distutils is outdated, but not really because setuptools uses distutils internally... but anyways you should use setuptools is what I mean. Anyway you just set up a setup.py file with all the details of your package, except don’t do that, instead use a minimalistic setup.py and write your metadata in setup.cfg, except that’s silly instead put your version in the module, and your description in a file and use hooks in the setup.cfg to get them. Except actually you should maybe use pipenv instead, or poetry actually or what was it this week again? Err so where were we. Ohh yea, python packaging is really simple, people just don’t get it. But wait we haven’t really touched on virtual environments and dependency management yet....
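One of the patterns mocked above, keeping the version string in the module and having the build read it from there, is commonly done with a regular expression rather than by importing the package (importing can fail before its dependencies are installed). A minimal sketch, assuming a hypothetical `mypkg/__init__.py` containing a line like `__version__ = "1.2.3"`; `read_version` and `read_version_from_text` are illustrative helpers, not setuptools APIs.

```python
import re
from pathlib import Path

# Matches a line such as: __version__ = "1.2.3"  (single or double quotes)
_VERSION_RE = re.compile(r"""^__version__\s*=\s*['"]([^'"]+)['"]""", re.MULTILINE)


def read_version_from_text(source: str) -> str:
    """Extract __version__ from module source text without importing it."""
    match = _VERSION_RE.search(source)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


def read_version(init_path: str) -> str:
    """Read the version from a file such as mypkg/__init__.py (hypothetical path)."""
    return read_version_from_text(Path(init_path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    sample = '"""My package."""\n__version__ = "1.2.3"\n'
    print(read_version_from_text(sample))  # prints 1.2.3
```

In a setup.py this would typically feed `setup(version=read_version("mypkg/__init__.py"), ...)`. setuptools can also do the lookup itself from setup.cfg with `version = attr: mypkg.__version__` (and `long_description = file: README.rst` for the description), which is likely the kind of setup.cfg hook the comment has in mind.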

My favorite candidate for “completely break backwards compatibility in Python 4” is packaging. Tear it all out in one go and forget it completely, then start fresh with one and only one way to do it, under completely new names: kill pip, setup.py, and everything connected to them, and let the phoenix rise again under a new name. Google is a graveyard of the last six different “new ways of doing it”, and everyone learns by doing it “wrong” three or four different ways before they get even close to what’s considered “the right way” this quarter... and then they bikeshed their own “fix” on top of that and share it on their blog to confuse everyone else. It’s frustrating to no end.

All that, and we haven’t even touched on distributing projects that mix in C/C++, or pure Python projects that need to be compiled to an executable. PyInstaller works great for that, though! Until it doesn’t, for some obscure reason you’ll then have to figure out.

Building C++ projects is an absolute breeze compared to the hell that is Python.


I went from Ruby, to Node, to Elixir, and my annoyances with the package/project managers have decreased with every step. I'm honestly surprised that Python's 'approach' is so much worse than any of them. Is there any particular reason why no good package/project manager has become dominant in the Python ecosystem?

To elaborate on my current situation: with Elixir, I run mix new <project-name> and I get a project structure, including test helpers, config stubs, formatter config, a .gitignore, and a mix.exs file where I can add dependencies. With some flags, I can get a supervised setup (for long-running stuff, basically), or an umbrella app that lets me put multiple projects under one 'umbrella project'.

Aside from potential issues where the apps in an umbrella rely on different versions of the same package, dependencies are local to the project and don't conflict with other projects'. I also get formatting and test 'tasks', with DocTest support.


I just happened to notice some discussion on Twitter among Python data scientists: https://twitter.com/pwang/status/1095173073341505536

I have the impression that the problem conda tackles, specifically how Python, Python packages, and native (e.g. C/C++) libraries interact, is still important and not yet perfectly solved.


Wouldn't something along the lines of Guix be suitable? It's already used in HPC for pipelines and deals with all dependencies.


It's kind of like suggesting to use system package manager for development package management. It will work, as long as the development material (e.g. a Python library) is available to the package repo, or the package manager is able to use the language-specific repo (e.g. Pypi).

But most system package managers are platform specific, which is not really suitable to be adopted officially by cross-platform languages like Python. I mean, Python can't adopt Guix as "the" Python package manager since Guix has no Windows support.


> It's kind of like suggesting to use system package manager for development package management.

Yes. I have always found it odd to distinguish between "package managers" and "build systems", let alone the (seemingly more recent?) distinction between "system package manager" and "language-specific package manager". These are all doing essentially the same job: run some commands, based on dependencies, to ensure some desired files are in the desired place.
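The shared job described here, "run some commands, based on dependencies", can be sketched in a few lines: order targets so each comes after its dependencies, and skip any whose output is already newer than its inputs (Make's basic rule). This is a toy under simplifying assumptions (targets are file paths, rules are Python callables, a cycle is an error), not how Make or any package manager is actually implemented.

```python
import os
from typing import Callable, Dict, List


def build_order(deps: Dict[str, List[str]], goal: str) -> List[str]:
    """Order goal and everything it depends on so dependencies come first."""
    order: List[str] = []
    state: Dict[str, str] = {}  # target -> "visiting" | "done"

    def visit(target: str) -> None:
        if state.get(target) == "done":
            return
        if state.get(target) == "visiting":
            raise ValueError(f"dependency cycle through {target!r}")
        state[target] = "visiting"
        for dep in deps.get(target, []):
            visit(dep)
        state[target] = "done"
        order.append(target)

    visit(goal)
    return order


def is_stale(target: str, inputs: List[str]) -> bool:
    """Rebuild if the target is missing or older than any existing input."""
    if not os.path.exists(target):
        return True
    built_at = os.path.getmtime(target)
    return any(os.path.getmtime(src) > built_at for src in inputs if os.path.exists(src))


def make(deps: Dict[str, List[str]], rules: Dict[str, Callable[[], None]], goal: str) -> List[str]:
    """Run the rule of every out-of-date target, in dependency order."""
    rebuilt: List[str] = []
    for target in build_order(deps, goal):
        rule = rules.get(target)  # plain source files have no rule
        if rule is not None and is_stale(target, deps.get(target, [])):
            rule()
            rebuilt.append(target)
    return rebuilt
```

Running `make` twice in a row rebuilds everything the first time and nothing the second, which is the "ensure some desired files are in the desired place" behaviour in its most basic form.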

I think the problem stems from Make being so baroque and unmodular/composable: it became widespread (probably via Worse Is Better) and was just about usable for individual applications (where the Makefile and source code are maintained by the same people). Yet it was hopeless for building an OS/distro, since:

- The distro's Makefiles would be maintained by different people to the applications', causing fragility

- Users want a lot of flexibility from their chosen distro, in terms of what gets installed, what versions, etc.

I think of Nix/Guix as "Make done right": their foundation is simple, logical, consistent, modular and composable, which makes them usable for defining everything from a small project's build script, up to whole OS configurations or cloud deployments.
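Part of what makes that foundation composable is that a package's install location under `/nix/store` is derived from a hash of everything that went into building it, including its dependencies' own locations, so variants can sit side by side and changing any input gives a new path. The sketch below imitates only that idea; the hashing scheme and the `Derivation` fields are simplified assumptions, not Nix's real algorithm or data model.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Derivation:
    """Toy build recipe: everything that can influence the build output."""
    name: str
    version: str
    builder: str  # build script or command
    inputs: Tuple["Derivation", ...] = ()
    env: Tuple[Tuple[str, str], ...] = ()

    def store_path(self) -> str:
        # Hash the recipe together with each input's own store path, so a
        # change anywhere in the dependency tree yields a different path.
        recipe = {
            "name": self.name,
            "version": self.version,
            "builder": self.builder,
            "inputs": [dep.store_path() for dep in self.inputs],
            "env": sorted(self.env),
        }
        digest = hashlib.sha256(json.dumps(recipe, sort_keys=True).encode("utf-8")).hexdigest()
        return f"/nix/store/{digest[:32]}-{self.name}-{self.version}"


if __name__ == "__main__":
    zlib = Derivation("zlib", "1.2.11", "./configure && make install")
    zlib_patched = Derivation("zlib", "1.2.11", "patch -p1 < fix.patch && ./configure && make install")
    python = Derivation("python", "3.7.2", "./configure && make install", inputs=(zlib,))
    python_alt = Derivation("python", "3.7.2", "./configure && make install", inputs=(zlib_patched,))
    # Same name and version, different dependency: both paths can coexist.
    print(python.store_path())
    print(python_alt.store_path())
```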

> as long as the development material (e.g. a Python library) is available to the package repo

It depends what you mean by "the package repo". Nix and Guix can import definitions from anywhere, e.g. from files included with a project, or downloaded from a URL, or cloned from a particular revision of a git repo, etc. There just-so-happen to be some quite large repos out there (e.g. "nixpkgs") which define a whole load of packages, as well as utility functions for packaging or overriding certain languages, etc.

> or the package manager is able to use the language-specific repo (e.g. Pypi)

Yes, Nix and Guix can fetch data from arbitrary places (like Pypi) and run arbitrary code (like setup.py) as part of the build. The big package repos already contain reusable functions which do this, e.g. 'fetchPypi', 'buildPythonPackage', etc.

https://github.com/NixOS/nixpkgs/blob/master/doc/languages-f...

> But most system package managers are platform specific, which is not really suitable to be adopted officially by cross-platform languages like Python

Nix and Guix are cross-platform. They have been used to define whole OSes, but they also work standalone on any Linux distro and macOS.

> Guix has no Windows support

I've been following discussions about this, at least on the Nix side. I've not used Windows in decades, but apparently:

- Nix works in "Windows Subsystem for Linux" ( https://github.com/NixOS/nixpkgs/issues/30391 )

- Nix works on Cygwin (not sure what the latest info is, but I came across https://ternaris.com/lab/nix-on-windows.html )

- There's been some work on building/running with mingw, but it seems to be painful due to things like Windows' limitations on path names ( https://github.com/NixOS/nix/issues/1320 )

- Nix and Guix can be used to cross-compile standalone packages, e.g. developing on Linux or Cygwin but building artefacts which will run on Windows without external DLLs, etc.


Yes, but the effort to package the sort of software that's causing issues in python for the Guix/Nix/Spack/Easybuild class of systems is herculean[0] and idiosyncratic.

The groups maintaining and promulgating 'bad' packages would have to relinquish control over some of the dependencies they're currently vendoring in or statically linking, or packagers would have to reverse-engineer their build systems, do that work themselves, and maintain it going forward.

[0]: https://www.youtube.com/watch?v=NSemlYagjIU


It’s mostly solved.

pip install numpy

should work on most platforms.

Nevertheless, there are problems (which might be confusing to non-Python authors, since they are not viewed as problems) like pip not installing system dependencies (e.g. matplotlib doesn’t install freetype).
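Since pip only installs Python distributions, the most a package can do about a missing shared library is detect it and say so clearly. A minimal sketch using the standard library's `ctypes.util.find_library`; the library names (`"freetype"`, `"png"`) are just examples, and whether a given package actually needs the system copy depends on how it was built, since some wheels bundle their own copy of such libraries.

```python
import ctypes.util
from typing import Dict, Iterable, List


def check_native_libs(names: Iterable[str]) -> Dict[str, str]:
    """Map each library name to what find_library reports, or "" if not found.

    Depending on the platform, find_library returns a soname (Linux) or a
    full path (macOS, Windows), and None when nothing matches.
    """
    return {name: ctypes.util.find_library(name) or "" for name in names}


def missing_native_libs(names: Iterable[str]) -> List[str]:
    """Return the subset of names that could not be located."""
    return [name for name, found in check_native_libs(names).items() if not found]


if __name__ == "__main__":
    missing = missing_native_libs(["freetype", "png"])
    if missing:
        print("Install these with your OS package manager first:", ", ".join(missing))
```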


Conda also solves the multiple versions of Python problem, in addition to SAT-solving the dependency map, and handling binary distributions.
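What "solving the dependency map" amounts to, in miniature: pick one version per package so that every chosen version's requirements hold, backtracking when a choice leads to a conflict. The sketch below swaps in a naive backtracking search over a hypothetical in-memory index, with constraints as plain sets of allowed versions; conda's real solver encodes the problem for a SAT solver and handles far more (version ranges, build variants, and so on), none of which is modeled here.

```python
from typing import Dict, Optional, Set

# Hypothetical index: package -> version -> {dependency: allowed versions}
Index = Dict[str, Dict[str, Dict[str, Set[str]]]]


def _version_key(version: str) -> tuple:
    """Compare dotted numeric versions so that "1.10" > "1.9"."""
    return tuple(int(part) for part in version.split("."))


def resolve(index: Index, requirements: Dict[str, Set[str]]) -> Optional[Dict[str, str]]:
    """Pick one version per package so every constraint holds, or return None."""

    def search(allowed: Dict[str, Set[str]], chosen: Dict[str, str]) -> Optional[Dict[str, str]]:
        todo = [pkg for pkg in allowed if pkg not in chosen]
        if not todo:
            return chosen  # every required package has a compatible version
        pkg = todo[0]
        candidates = sorted(allowed[pkg] & set(index.get(pkg, {})), key=_version_key, reverse=True)
        for version in candidates:  # newest first
            narrowed = dict(allowed)
            conflict = False
            for dep, ok_versions in index[pkg][version].items():
                remaining = narrowed.get(dep, set(index.get(dep, {}))) & ok_versions
                if not remaining or (dep in chosen and chosen[dep] not in remaining):
                    conflict = True
                    break
                narrowed[dep] = remaining
            if not conflict:
                result = search(narrowed, {**chosen, pkg: version})
                if result is not None:
                    return result
        return None  # no version of pkg works here: backtrack

    return search(dict(requirements), {})


if __name__ == "__main__":
    index = {
        "app": {"1.0": {"lib": {"1.0", "2.0"}, "util": {"1.0"}}},
        "lib": {"1.0": {"util": {"1.0"}}, "2.0": {"util": {"2.0"}}},
        "util": {"1.0": {}, "2.0": {}},
    }
    # lib 2.0 is newest but needs util 2.0, which conflicts with app's pin.
    print(resolve(index, {"app": {"1.0"}}))  # {'app': '1.0', 'lib': '1.0', 'util': '1.0'}
```

Backtracking like this can blow up exponentially on large indexes, which is one reason real solvers hand the problem to dedicated SAT machinery instead.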


The package management system is probably the main reason I have always preferred Ruby to Python. Package management in Ruby is so clean, handled by a few libraries done right (gem and Bundler), that looking at the clusterfk of Python ways to package and manage dependencies makes me hysterical.


For reference, can you list the most important differences/things done right in Ruby-world packaging as compared with pip?


The problem is that there is NO one good standard for installing packages in Python, nor is there one standard for creating a package that is promoted by the language developers. Plus, in Ruby, what virtualenv does in Python (environment isolation) is basically built into Bundler itself.
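For comparison, the Python-side counterpart of Bundler's per-project isolation is a virtual environment, which the standard library's `venv` module can create directly (the same thing `python -m venv .venv` does from the shell). A minimal sketch; `create_project_env` and `env_python` are illustrative helpers, the `.venv` name is a common convention rather than a requirement, and `with_pip=False` only keeps the example fast and offline.

```python
import os
import venv
from pathlib import Path


def create_project_env(project_dir: str, name: str = ".venv") -> Path:
    """Create an isolated environment inside the project and return its path."""
    env_dir = Path(project_dir) / name
    # with_pip=False skips bootstrapping pip; pass True to install it via ensurepip.
    venv.create(env_dir, with_pip=False, clear=True)
    return env_dir


def env_python(env_dir: Path) -> Path:
    """Path of the environment's interpreter (the layout differs on Windows)."""
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as project:
        env = create_project_env(project)
        print((env / "pyvenv.cfg").exists(), env_python(env))
```

Unlike Bundler, nothing here ties the environment to a dependency list; that part is left to requirements files or tools like pipenv and poetry.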


I haven't been in the Ruby world in recent years, but I remember regularly running into issues where I needed multiple versions of a gem, or even gem/rubylang version combinations. I think rvm and gemsets(?) sort of helped, but it was still messy. How has this issue been addressed, assuming it has?

I do rather like how more contemporary package managers just tend to install things locally (Node, Elixir, etc.), and I'm curious how things have evolved in the Ruby (and Python) ecosystems.


When I started learning Rust a year ago, I thought I was going the hard route, but I was surprised how much sense it made.

Especially the whole packaging thing with cargo makes so much sense. Compiling your application on a new machine is a breeze, you just need cargo and the rest happens automatically.

In the meantime, I always find myself wondering how I was meant to do dependency management in Python again. Moving projects between machines can be a horror compared to Rust.

pipenv is a step forward, but nowhere near enough. We need something that just works and is part of Python.
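The usual stopgap for moving a project between machines is to record the exact versions of everything installed and reinstall from that list, which is what `pip freeze > requirements.txt` does. A minimal sketch of the same idea using the standard library's `importlib.metadata` (Python 3.8+); unlike a lock file from pipenv or poetry, it records no hashes and does not distinguish direct from transitive dependencies.

```python
from importlib import metadata
from typing import Dict, List


def installed_versions() -> Dict[str, str]:
    """Map each installed distribution's name to its version."""
    versions: Dict[str, str] = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that have no metadata
            versions[name] = dist.version
    return versions


def freeze_lines(versions: Dict[str, str]) -> List[str]:
    """Format as requirements.txt-style pins, sorted case-insensitively."""
    return [f"{name}=={ver}" for name, ver in sorted(versions.items(), key=lambda kv: kv[0].lower())]


if __name__ == "__main__":
    print("\n".join(freeze_lines(installed_versions())))
```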


When you start having multiple incompatible ABIs active at the same time, that's when dependencies get hard. You get automagical solutions that only work for some people because it's hard to see all possible combinations of versions. I think Rust is still too young.


Rust's solution is 'build everything', so you can't have incompatible ABIs. That might not scale, and it means initial build time is pretty appalling, but unlike Python, Rust compiles to a binary, so it's less manky.
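The ABI issue is visible from inside Python: the import system only finds a compiled extension module whose filename is the module name plus one of the suffixes the running interpreter accepts, and the most specific of those suffixes encodes the interpreter version and platform. A small sketch using `importlib.machinery.EXTENSION_SUFFIXES`; `matching_suffix` is an illustrative helper and the `_speedups` module and file names are made up.

```python
import sysconfig
from importlib.machinery import EXTENSION_SUFFIXES
from typing import Optional


def matching_suffix(module_name: str, filename: str) -> Optional[str]:
    """Return the accepted extension suffix that filename uses for module_name, if any."""
    for suffix in EXTENSION_SUFFIXES:
        if filename == module_name + suffix:
            return suffix
    return None


if __name__ == "__main__":
    # e.g. ".cpython-37m-x86_64-linux-gnu.so" on Linux, ".cp37-win_amd64.pyd" on Windows
    print("this interpreter builds extensions as:", sysconfig.get_config_var("EXT_SUFFIX"))
    print("and will import files ending in:", EXTENSION_SUFFIXES)
    # A binary built for a different interpreter ABI is simply not found:
    print(matching_suffix("_speedups", "_speedups.cpython-27mu-x86_64-linux-gnu.so"))
```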


For larger projects, I like to use Pants [1]. You just include an installer script in your repo and anyone with a Python installation should be able to build your project on their system with a simple command. Of course, plenty of issues pop up in practice, but it works remarkably well.

[1] https://www.pantsbuild.org/


I have literally never seen a Python project organized the way the maintainer does it. This does not bode well.


The src directory is becoming a popular option. See https://blog.ionelmc.ro/2014/05/25/python-packaging/#the-str... for an enumeration of the reasons.

Another resource might be https://hynek.me/articles/testing-packaging/


My package has both Python code ("module") and a C extension ("_module"). The C extension code is under src/ following a long Unix tradition.

With the Python-module(s)-under-"src/" option, where do people usually place their C extension code?

It's clear that I could use any name, like "pysrc", and not just "src/" for the Python code. What I'm asking about is if there is a growing consensus about how to organize a package with both Python and C modules.


For C extensions, the most common layout I've seen is to put C code under lib/ (regardless of whether you are using a src/ layout).


Checking my local copies of different packages containing both C and Python:

  matplotlib: src/ contains C code while lib/ contains Python
  cffi: c/ contains C code, there is no lib/
  numpy: contains C code in many places, like numpy/core/src/ ; no lib/
  pyzmq: contains C code mostly under zmq/core/ ; no lib/
  python-cdb: contains C code under src/ ; no lib/

Checking a few other packages:

  pycuda: contains C code under src/ ; no lib/
  Pillow: contains C code under src/ ; no lib/
  tornado: the one C file is under tornado/ ; no lib/ or src/
  Caffe: C++ code under src/ ; no lib/

That is, I haven't found any packages which put C extension code under lib/ .

Which packages are you thinking of?


Ah, my mistake. I know that CPython and matplotlib both have a lib/ directory, but you're right that that's where they keep the Python code and they keep the C code in Modules/ and src/, respectively.

Also I may have confused this with the src/ layout in the past and put my own C extensions into lib/. Thanks for the survey, I think I've been harboring this misconception for a while now.


It's a fairly new paradigm, and I've noticed these things spread slowly in the Python ecosystem (Python 3 was released over 10 years ago). But I use it, and I've started seeing it in some other python packages lately.

Also keep in mind it is only for python packages, not applications--the primary benefit is that it is impossible for the tests to access files that are not part of the package build.


Having One Punch Man as tier 3 invalidates all your points. ;)


He explains the rationale behind his rankings here: https://alexcbecker.net/anime.html#puella-magi-madoka-magica

You can also just scroll down on the ranking page. Well worth reading. He laments that we as a society have lost our appetite for great media. Adroit exploration of a conceptual space, which is what One Punch Man does so incredibly well, is a feature of goodness, but it needs more to elevate it to true greatness. I agree with this.

My position however is that the economics of media production ensure that very little of it will ever be great. Greatness must be driven by vision, and placing mountains of resources at the sole command of one dictatorial visionary will always be a hard sell, and doesn't even ensure that you'll end up with actual greatness. For every Stanley Kubrick you have a dozen Michael Bays.


Best off-topic comment on HN.


I think it's a great fit for applications too.


I think using the src directory is a great idea. The tests can’t just reach over into the package anymore so you have to do an editable install, but the payoffs are big, as noted in the articles posted in a sibling comment.
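The "tests can't just reach over into the package" effect is easy to demonstrate. The sketch below builds two throwaway projects, one flat and one with a `src/` layout, and runs `python -c "import ..."` from each project root, roughly as a test run started there would see the tree. The package name `demo_layout_pkg` is hypothetical, and test runners such as pytest may adjust `sys.path` further on top of this.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

PKG = "demo_layout_pkg"  # hypothetical package name


def make_project(root: Path, src_layout: bool) -> None:
    """Create <root>/<PKG>/__init__.py (flat) or <root>/src/<PKG>/__init__.py."""
    pkg_dir = (root / "src" / PKG) if src_layout else (root / PKG)
    pkg_dir.mkdir(parents=True)
    (pkg_dir / "__init__.py").write_text("VALUE = 42\n")


def importable_from(root: Path) -> bool:
    """Run a fresh interpreter in root and report whether PKG can be imported."""
    # With -c, the current directory goes on sys.path; drop variables that change that.
    env = {k: v for k, v in os.environ.items() if k not in ("PYTHONPATH", "PYTHONSAFEPATH")}
    result = subprocess.run(
        [sys.executable, "-c", f"import {PKG}"],
        cwd=root, env=env, capture_output=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        flat, src = Path(tmp, "flat"), Path(tmp, "src_layout")
        make_project(flat, src_layout=False)
        make_project(src, src_layout=True)
        print("flat layout importable without install:", importable_from(flat))
        print("src layout importable without install:", importable_from(src))
```

The flat project imports straight from the checkout while the src one does not, which is why src-layout projects run their tests against an installed (often editable, `pip install -e .`) copy.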


>> we wont be covering conda or OS specific builds

So 'azure-pipelines.yml' is not operating-system specific?


Not really. It controls continuous integration builds (or Pipelines in Microsoft parlance) on Azure DevOps. You can test on Linux, macOS, and Windows hosts.


It says (now, at least): “A heads up, I will focus mostly on the Python Packaging Authorities systems (pip, setuptools, so no conda or operating system specific packagers).”


Any thoughts about PEP 582? https://www.python.org/dev/peps/pep-0582/


not a big fan of it myself, it's great for people new to programming, but it's easy to quickly grow out of it :thinking:
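For context, PEP 582 proposes that the interpreter also look in a project-local `__pypackages__/<X.Y>/lib` folder (e.g. `__pypackages__/3.8/lib`), so dependencies live next to the project much like Node's `node_modules`. The sketch below only emulates that lookup from user code by editing `sys.path`; `pypackages_dir` and `activate_pypackages` are hypothetical helpers, the layout follows the draft PEP, and putting the folder first on the path is a simplification of where the PEP would place it.

```python
import sys
from pathlib import Path
from typing import List, Optional, Sequence


def pypackages_dir(project_dir: str, version_info: Sequence[int] = sys.version_info) -> Path:
    """Local package directory in the PEP 582 draft layout: __pypackages__/X.Y/lib."""
    return Path(project_dir) / "__pypackages__" / f"{version_info[0]}.{version_info[1]}" / "lib"


def activate_pypackages(project_dir: str, path: Optional[List[str]] = None) -> Optional[Path]:
    """Prepend the project's __pypackages__ lib dir to the search path if it exists.

    Under PEP 582 the interpreter itself would do this at startup; here it
    is done by hand, and the directory is simply placed first.
    """
    search_path = sys.path if path is None else path
    lib = pypackages_dir(project_dir)
    if not lib.is_dir():
        return None
    if str(lib) not in search_path:
        search_path.insert(0, str(lib))
    return lib
```

Populating that folder by hand would be roughly `pip install --target __pypackages__/3.8/lib <package>`.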




