> some Linux distributions already switched to version 3
At the lightning pace of only 12 years since the release of Python 3 (and 15 months since Python 2's deprecation), some Linux distributions have ALREADY switched to it.
The developer story around python 2/3 is emblematic of why I don't consider it a suitable language for software development outside of very controlled cases like notebooks.
I dunno, they made some big breaking changes but continued supporting 2.x for over a decade after 3.x. There are definitely reasons I would not consider Python for some projects, but the 2/3 transition is not one of them. To be honest the event is so infamous (and IMO a little overblown) that they basically couldn't do this anymore, and I think it's been openly stated that they wouldn't.
What gets me about it is that the 2/3 transition was just a massive unsolved problem, and one which caused pain over a decade after 3.x was released. Other ecosystems managed to solve it - Rust, for example, handles editions in a very sound way - but with Python you had so many problems cropping up over the years because of Python versions.
There's this tiny, very-very vocal minority of people who continue to mystify the 2/3 changes and act as if migrating code were some massively difficult, herculean task that almost ended Python. Sometimes a little bit of "string/bytes split is ALL WRONG and utter nonsense!" mixed in.
The truth is that these were not actually big changes. The truth is that some orgs just wanna sit on their lazy asses doing jack shit to maintain their stuff and are surprised that, after 15 years, they have to do MAINTENANCE WORK??? ON THEIR CODE??? This is an outrage! The truth is that if porting was hard, the code most likely had no meaningful tests, so you couldn't safely make any changes anyway. The truth is that changes like these expose bad engineering. And guess what, people don't like being exposed.
It's not about the code changes, it's about having to worry about two distinct and incompatible versions of python existing at the same time. So if I type `python` into the terminal, I don't know which version of the language I'm getting without knowledge of the system.
This is an especially bad problem with an interpreted language, since changing the system version of python can break code at a distance. I can run a program today and it works, and tomorrow it won't because somebody changed the symlink to the other python version. That's why there's this huge mess of environment management and containerization.
I will give you an example of the problem. When ubuntu finally removed python 2.7 as the default, some of my workflows broke because scripts were depending on it. Scripts that I didn't write, or even know about.
This is only a problem because nobody bothered to solve it. I think one correct solution for example would have been to have both python versions live inside the same executable, with some kind of flag or something added to run against python 3. Just some standardized way of handling the different versions would have made a huge difference and saved a decade of pain.
PEP 394 was accepted in 2012. It says python2 should run Python 2 and python3 should run Python 3. Until 2019 it said python should run Python 2. And minor versions of Python 2 could be incompatible anyway. So the 2 to 3 transition didn't really change anything.
Not depending on the system version of anything is normal advice for other interpreted languages too.
Linux doesn't split shebang arguments. So you can't use env and pass a flag.
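For illustration, the workable form of that advice is to pin the major version in the interpreter *name*, per PEP 394, rather than in a flag; a small sketch (the `--use-3` flag below is purely hypothetical):

```python
#!/usr/bin/env python3
# Pin the major version in the interpreter name (per PEP 394) rather than a
# flag: a hypothetical "#!/usr/bin/env python --use-3" would reach env as the
# single argument "python --use-3" on Linux and fail to exec.
import sys

# Belt and braces: fail fast if the wrong interpreter runs this anyway.
assert sys.version_info[0] == 3, "this script requires Python 3"
print("running under", sys.version.split()[0])
```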
> PEP 394 was accepted in 2012. It says python2 should run Python 2 and python3 should run Python 3. Until 2019 it said python should run Python 2.
IIRC, there were a few linux distros that bucked this; at least one (Arch, I think, but it wasn't one I used) switched “python” to Python 3 quite early, and I think there were some others that did their own thing.
Well, I think some companies had a massive conversion burden, but these are typically massive companies, like Google. Most companies had nowhere near as much python2 code.
I think you hit the nail on the head with "lack of meaningful tests". The entirety of "locked to 2" python code I've dealt with was a serpentine mess with little/no tests, and so brittle that even going from print to logger risked things breaking.
I think most people rightly saw that the juice wasn't worth the squeeze for a 2/3 upgrade, and that the only reason we were going ahead with it was the sunk costs. If they wanted to create a new language that wasn't backwards compatible, why not make it a totally new and separate project?
This is analogous to saying that if your code was running on IE6 then why bother upgrading it to run on newer versions of Firefox or Chrome.
The only difference between the Python 3 migration and other projects' major version changes (PHP 5 to 7, .NET 4 to newer versions, etc.) is that migrating to Python 3 had the equivalent of a couple of major versions' worth of changes all rolled into one. Yes, it's painful; major version migrations are not easy.
But it's really no different from not upgrading JS code to keep up with the latest browser security issues, or not upgrading your JVM (there are _tons_ of Java codebases stuck on ancient JVM versions).
If you had a problem with migrating to Python 3 after a decade you would've still had the same problem with any other major software upgrade in your infra.
But a lot of python code is just scripts. Not everyone is going to go back and migrate every random script they wrote which just renames some files or something just because of "security issues" or something like that.
Also versioning is not an issue, the problem is that Python didn't provide any viable versioning strategy. You just had two separate executables now, and you, as the user, have to decide where to put them in your $PATH and how to make sure the right code gets executed with the right version. That should have been baked into the language or the tooling itself.
> But a lot of python code is just scripts. Not everyone is going to go back and migrate every random script they wrote which just renames some files or something just because of "security issues" or something like that.
That's not really a problem because the Python maintainers conveniently provided a package to do trivial 2-to-3 migrations.
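For context, that tooling was presumably the stdlib's 2to3 fixer script (plus third-party helpers like six or futurize); a rough sketch of the kind of mechanical rewrite it does:

```python
# Roughly the kind of mechanical rewrite the 2to3 fixers perform.
# Python 2 original (kept as comments so this file runs on Python 3):
#   print "hello", name
#   if d.has_key("x"): ...
#   for k, v in d.iteritems(): ...
name = "world"
d = {"x": 1}

# What the fixers emit:
print("hello", name)
if "x" in d:
    pass
for k, v in d.items():
    pass
```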
> Also versioning is not an issue, the problem is that Python didn't provide any viable versioning strategy.
There were explicit backporting libraries created to manage the transition. Django depended on these for many years to successfully support both Python 2 and Python 3 packages.
> You just had two separate executables now, and you, as the user, have to decide where to put them in your $PATH and how to make sure the right code gets executed with the right version. That should have been baked into the language or the tooling itself.
If you're executing a script, Python supports shebang notation to determine your executable.
Sorry but none of these are real problems. The only place where you had migration issues was with very large packages that did complex string manipulation stuff, and that's the sort of code that requires maintenance in any language anyway. It's not like there isn't an enormous amount of precedent for these kinds of migrations (Ruby minor versions break. PHP's 4->5->7 migrations were huge).
> That's not really a problem because the Python maintainers conveniently provided a package to do trivial 2-to-3 migrations.
That's not the point. For example, when ubuntu dropped python 2.x as the default, some of my workflows got broken because of scripts which I didn't write and didn't even know about. There's plenty of old code which people still depend on. System upgrades should not break your projects, or require you to go in and do surgery on scripts you didn't write.
> There were explicit backporting libraries created to manage the transition. Django depended on these for many years to successfully support both Python 2 and Python 3 packages.
That's a workaround. A solution would have allowed python 2 & 3 to coexist with no additional effort.
> If you're executing a script, Python supports shebang notation to determine your executable.
Again, you're depending on the people who wrote the python code you depend on to handle this in the correct way. It's not something which is built into the ecosystem.
I'm sorry, but all your arguments seem to boil down to the fact that there are ways to make python usable despite the extreme fragility of the toolset.
> For example, when ubuntu dropped python 2.x as the default, some of my workflows got broken because of scripts which I didn't write and didn't even know about.
The only solution to that is never breaking backward compatibility, because either someone might prematurely upgrade (not what happened with Ubuntu) or someone might not maintain software (what seems to have happened with the third-party tools in question, whether it was the original maintainer or some packager or...) and might also not vendor dependencies, on the assumption that external environments will never change.
> A solution would have allowed python 2 & 3 to coexist with no additional effort.
PEP 394 allows that: if you depend on a particular Python major version, use “python2” and “python3” to refer to it. Both side-by-side installation and continuity of operation over the time “python” switches from 2 to 3 are provided.
While some linux distros broke the recommendation on when to switch “python” targets, that would be transparent to anyone following the recommendation.
> There's plenty of old code which people still depend on. System upgrades should not break your projects, or require you to go in and do surgery on scripts you didn't write.
In Linux land, projects that depend on system libraries and `*-dev` packages break constantly across major versions.
> That's a workaround. A solution would have allowed python 2 & 3 to coexist with no additional effort.
You can do that if you make your python interpreter version explicit in your shebang.
> I'm sorry, but all your arguments seem to boil down to the fact that there are ways to make python usable despite the extreme fragility of the toolset.
A toolset that gave you a decade to upgrade with ample warnings is the opposite of fragile.
> MAINTENANCE WORK??? ON THEIR CODE??? This is an outrage!
I usually keep a CI branch with dependencies unpinned (I made pip-chill for this use case) precisely to make sure I'll be the first to know when something coming from the future breaks my code.
Being lazy is a virtue, but not if it causes an engineer to avoid work that needs to be done.
The problem was in part because they actively broke the 2/3 compat story.
I.e. someone who was into Unicode on 2 with u'' got hosed on 3, despite 3 going on and on about the importance of string handling. How hard would it have been to support u so someone could support both versions more easily? It was insanity.
> How hard would it have been to support u so someone could support both versions more easily?
Exceptionally hard. Py2 str objects are tantamount to py3 bytes but with the py3 str apis. What this amounts to is every "str" in py2 is an untagged union of str/bytes. Python is exceptionally dynamic. This means if you allow different behavior in different modules, you risk silent, customer-data-corrupting bugs, among other headaches, as things continue to "work" but do the wrong thing.
At least with a hard 2/3 switch, you are on your toes and know there is a (mostly) finite transition period.
from __future__ import str basically guarantees you'll have "3-compatible code" causing headaches years into the future.
That's not even to touch on the engineering difficulty of interop between encoding-oblivious strings and unicode ones.
Python 2 let you put a u in front of a string literal to make it unicode. Python 3 made it a syntax error at first. Python 3.3 restored it so people could write code compatible with 2 and 3 without importing unicode_literals.
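For concreteness, a small snippet that parses and behaves the same on Python 2.7 and on 3.3+ thanks to that restored prefix (PEP 414):

```python
# -*- coding: utf-8 -*-
# Parses and behaves the same on Python 2.7 and on 3.3+ (PEP 414 restored the
# u prefix), without needing `from __future__ import unicode_literals`.
greeting = u"héllo"     # text type in both versions
payload = b"\x00\x01"   # bytes type in both versions

# Keep text and bytes apart explicitly at the boundary.
encoded = greeting.encode("utf-8")
assert encoded.decode("utf-8") == greeting
```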
Is this much worse than downloading some installer and running it? Those can be just as compromised. So can packages in package managers for that matter.
Piping straight to bash can be especially bad if you've cached sudo credentials for the current session - some of these scripts call sudo "inside".
Otoh - the connection is signed (it's https) - unfortunately it's often quite easy to compromise a web site. Obviously, listing gpg signatures on the same page doesn't add much unless it's possible to verify the gpg key some other way.
Edit: another problem is that you really should check exactly what's in your clipboard before pasting into a terminal.
The safety in your steps is reading the script, not in avoiding curl | bash. An installer being signed doesn't guarantee it's not malicious; if someone has overtaken a host and replaced the binaries, they'll just sign them themselves. Unless you're manually checking that the signature matches your expected source, running a signed binary doesn't save you.
True, it does not. I don't recommend downloading (random) binary installers and running them either.
With e.g. Linux ISOs, you typically already trust the signing key for your OS updates.
But unless you are vigilant about your ssl root certs, you'll easily allow a lot of malicious and incompetent services to potentially intercept most of your ssl traffic... (due to there being many trusted roots by default).
> if someone has overtaken a host and replaced the binaries
This again depends on who signs the binaries and how, and how the signatures are trusted. Typical Windows (and Mac?) setups will gobble up any signature. But if you do check who signs the binaries - then the signing key will easily be the most secure part of the system: a compromised ftp/web site allows hosting malicious binaries, but typically doesn't grant access to the signing key.
With letsencrypt a hacked web site will typically have access to a valid ssl cert - no need to further compromise mx/mail records or gain access to a business phone number etc.
An ASCII-armored, signed shell script can be distributed safely via a pastebin. Unfortunately there's no good automatic/standard way to do so. Or rather, no standard tool to prompt to trust the signing key - and then run the script - beyond basic gpg --search-key --key-server.. + gpgv.
Maybe signed git repos would be easiest - but I don't know how easy it is to limit which keys are trusted - if it's possible at all?
The helm project does a little dance to try and verify downloads - but for all the effort it pretty much amounts to trusting the script, not the keys/signatures.
I was hopeful sequoia might help - but apparently its sqv tool is even worse than gpgv - neither can handle an ascii armored public key, and sqv can only handle detached signatures.
> The safety in your steps is reading the script, not in avoiding curl | bash
Well, yes. The safety is in doing something between "acquire potentially malicious payload" and "running payload". I don't see how "safety [is] not in avoiding curl | bash" when, avoiding the direct pipe to bash is exactly what I suggest.
If you look at the url, then curl and pipe that url, you have no idea if bash sees what you just reviewed.
But you have exactly the same problem with downloading a binary, or running pip install. You have no idea what that code does, so curl | bash doesn't hurt any more than any other normal method of installation.
Do you read the source of every setup.py you run before running pip install? Also, if you are untrusting of the source enough to verify their install script is safe, why would you install their template to run on your machine without verifying all of that too? Finally, a 10-line bash script might (as this example does) just call out to another curl | bash, or to a pip install/npm install.
> Do you read the source of every setup.py you run before running pip install?
I generally run make, setup.py, cargo build etc in the context of cloning a source repository. I certainly could do a better job of sanity-checking those things, but I do try. And I definitely try to avoid having sudo credentials cached when I do - to foil "sudo cp artifact /usr/sbin" and other awful things people do, because they found it convenient.
> Also, if you are untrusting of the source enough to verify their install script is safe, why would you install their template to run on your machine without verifying all of that too?
I generally trust people more to write "left pad" than install scripts. Many sysadmins are good programmers, few programmers are even remotely decent sysadmins in my experience.
> Finally, a 10-line bash script might (as this example does) just call out to another curl | bash
In which case one has to chase down the rabbit, or give up.
Sometimes one will discover that the end game was downloading a gpg signed tar archive with the release artifacts - and one can go and do that.
> or to a pip install/npm install.
People do do awful stuff in makefiles and package install scripts, but for vanilla Python/JavaScript the laziness of programmers tends to work to our advantage - there's usually little extra madness/magic in there.
Sure, running pip install -r requirements.txt can do almost anything - but it's unlikely to run your package manager under sudo and mess up your system packages, or add something questionable to your package sources.
> Is this much worse than downloading some installer and running it?
Yes.
You should inspect what you download.
Also, you should probably use the Python interpreters provided by your Linux distro, that stay in directories you usually can't write to and come in signed packages. On a Mac, the next best thing would be MacPorts.
Nothing here overrides the system-wide Python version.
The article specifically goes into why not to do it.
> pyenv allows us to set up any version of Python as a global Python interpreter but we are not going to do that. There can be some scripts or other programs that rely on the default interpreter, so we don’t want to mess that up.
Using the Python interpreters in your system doesn't mean you can't make virtualenvs out of them - it's just that they are precompiled and well supported on your specific OS.
I used custom Python interpreters a lot and it's nice to be able to rely on the system to provide a sensible environment instead of forcing myself to build my own.
That's why the traditional Windows way of downloading a setup.exe and running it with admin privileges is a bit scary for people coming from other platforms. Installing an .msi is less bad, or so we are taught.
I think it'll compile various Pythons on your machine under your user. I'd prefer to install multiple versions with Homebrew (learned this today; not sure how possible it is), as `brew install python@3.6 python@3.7 python@3.9` (because Big Sur has 3.8 built-in).
In reality, I'm a more traditional Unix person and prefer MacPorts, where you can do `sudo port install python36 python37 python39` in a very BSD way of doing things.
Homebrew has broken my computer one time too many.
The script is served over https, so it's not going to be tampered with (unless you have a malicious cert, but at that point you can't trust anyone), and curl | bash isn't any worse than downloading a script and just running it, or running a precompiled binary you don't trust.
pyenv could get taken over and you won't know. It's also possible to detect when someone is piping to bash (on the server) and serve a different payload [0]. You're better off piping curl to a file, reviewing the file and then running it manually.
Yes, there absolutely should be. It would be a massive improvement if that happened.
It requires a few extra steps to be actually secure. You actually need to verify the hash from a trusted source for it to be actually secure. If the delivery has been tampered with, you need to ensure that the delivery of the hash has also not been tampered with. In practice, codesigning is the solution, but certs are expensive, and impractical for a small project.
How about the hash being something that you calculate locally?
1. (local) Download the file from the URL.
2. (local) Review it locally, in a text editor.
3. (local) Get its hash locally, from the file in your file system.
4. (SSH) Feed this hash into the fictional tool above.
5. (SSH) If what curl gets is the same as the file that you've reviewed, it gets piped further into bash, otherwise the execution stops and an error is output.
Of course, that's only applicable to this particular case, where a compromised server could detect that a bash pipe is used and return different file contents. That would only be useful in situations where you want to review it on a local device, such as a desktop and run it on a remote one, such as a server.
Edit: If you want to review it remotely, there's nothing to prevent you from using less or something to view it before manually running it with Bash. That just requires the discipline to not use one-liners that both download and run it, as long as no such tool like the above exists.
I can't believe I'm going to suggest a blockchain, but I think what you really want is:
- run `cu-sh example.com/questionable.`
- this uses `$editor` to let you review the contents (skippable with a command line flag)
- generate a hash of your local contents
- check said hash against a blockchain to see if everyone else who got it got the same contents as you.
- decide from 1 and 2 above whether you actually want to proceed with the install.
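A minimal local sketch of that flow in Python, minus the consensus check; `cu_sh` and everything in it is made up for illustration:

```python
#!/usr/bin/env python3
"""Minimal local sketch of the flow above (download, review in $EDITOR,
hash, confirm, run) without the consensus check; all names here are
hypothetical."""
import hashlib
import os
import subprocess
import sys
import tempfile
import urllib.request


def cu_sh(url: str) -> None:
    # 1. Download to a file instead of piping straight into bash.
    body = urllib.request.urlopen(url).read()
    with tempfile.NamedTemporaryFile("wb", suffix=".sh", delete=False) as fh:
        fh.write(body)
        path = fh.name

    # 2. Let the user review exactly what was fetched.
    subprocess.run([os.environ.get("EDITOR", "vi"), path], check=True)

    # 3. Hash the reviewed file so the digest can be compared elsewhere.
    with open(path, "rb") as fh:
        digest = hashlib.sha256(fh.read()).hexdigest()
    print("sha256:", digest)

    # 4. Only execute after an explicit confirmation.
    if input("run this script? [y/N] ").strip().lower() == "y":
        subprocess.run(["bash", path], check=True)


if __name__ == "__main__":
    cu_sh(sys.argv[1])
```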
You could replace blockchain with checking if it's signed, and the key matches an owner on keybase/github/some other federated identity provider too.
You often want to do this anyway, because the installer often supports various options and env vars. If you download the file you can read its --help output, and even keep it on hand in case something bad happens, or just for your own records.
It's also possible for you to copy things you can't see from web pages. So the command(s) you end up with may not be what you thought. So there's a trust issue with the site you get instructions from as well.
If you're afraid the host may be untrusted then you would be wrong to download any of their code at all.
The safety is in reviewing the code there, not in avoiding curl | bash. Running pip install or npm install is just as dangerous.
> They should offer a download with signature validation instead. Signed by Apple, Microsoft, etc if possible.
If the host is compromised, the attacker will just get Microsoft to sign their malware instead; see [0]. If the host is compromised, and you run the code without reviewing it, you're hosed regardless.
With Python, the only sane choice is probably through Docker images :-(
This is the reason why I consciously attempt to move away from Python and choose Go or Rust for new projects if possible. Of course, on existing projects, Python deployment is a pain.
> the only sane choice is probably through Docker images
It's really not. You either tailor your dependencies to match the host (clue: you should be doing this anyway, it makes things much easier in the long run), or use virtual envs.
pyenv also gives you some (non obvious) flexibility as well
However, having said that, the chances are your Go or Rust binary is going to be in a Docker env as well (so is your Python), so it's basically all the same.
I'm personally a big fan of (especially) the most recent builds of Python 3, but lately I've been interested in learning Go or Rust just because there's a few things I really like about them both vs some other languages I've looked at.
So, seein' that you've coded in both Go and Rust apparently, my question to you is: Which do you prefer of the two (and why)? I personally lean a bit toward Go, but I haven't learned enough of either to decide absolutely which of the two I should learn first.
I think it is better not to _decide_ which one to learn. They both have their perks and quirks. If you can, I'd recommend dipping your toes in both. For network services, Go feels super nice because of the high-quality libraries in its ecosystem. I feel like it is a better Java.
Rust, on the other hand, feels like a better C++... with a steep learning curve, and useful when you can/need to get something correct/efficient/safe at the cost of increased developer time and cognitive load.
Nice. Thank you for the info. Do they both have pretty good GUI toolkit support available? I currently use PyQt5 in Python, but I'm not totally against GTK if that's what is easier for Go or Rust. Still think I kinda feel like Go's the one of the two I'm prolly gonna learn first, although you've got me leanin' hard toward also learning Rust in addition to Go.
Desktop/local usage, especially for non technical users or people working in environments where they can’t install docker (eg locked down corporate machine). There isn’t a great way to pass around a single executable or installer for a Python app. There’s pyinstaller, but it’s finicky and doesn’t quite work cross platform (you have to build the executable on the target OS).
On the other hand, OP’s setup makes it very easy to publish packages. So you can create a tarball and have a user pip install that, then run your app with a simple CLI. Or publish to pypi if you want it public. The downside is you assume the user has the right version of Python and knows how to switch versions if need be and all the weirdness that comes with that. But for web apps, packaging your app also makes it easy to wrap in a simple dockerfile that basically just installs the package and then runs it.
Yes, but if you sign the executable with an acceptable certificate (the company’s cert or some other trusted CA), you can install an arbitrary executable usually. Docker would typically be blacklisted, however. In most corporate IT environments, it’s much easier to get a small, focused executable white listed or code signed than to get something with such a huge attack area as docker approved for regular, non-technical users.
Personally I have not. I have only had one particularly weird use case where I really needed to deliver a Python executable alongside an Electron app, and made it work with Pyinstaller. We’re actually reworking things to use Pyodide to compile the Python to WASM so we don’t have to deliver any separate Python code at all. It’s a real Rube Goldberg machine of an application, but we had a lot of weird constraints we had to meet (eg, we couldn’t use sockets to communicate between Python and Node, so had to package the Python as an executable). It works beautifully but it was a mess to get running.
Docker was made for development and testing of code, not production. It has been co-opted for production with good success. It seems fine to do it for hobbyist projects, but still feels dirty in a commercial setting.
The largest internet services on the planet were using LXC containers for production ops before you ever heard of the term 'docker'. In its earliest iterations it was easier to push a container to production than to run it on your laptop.
From what I recall, spotify were the first company with a large footprint to use docker. However for some reason they skipped VMs and went screaming into docker when it was _very_ new. Personally that seemed like a mistake, but you know, each to their own.
If you're wanting to get into a bun fight about containers, then IBM 360 and JCL has some time for you.
I'm using docker in commercial environments for years and cannot complain... But I don't distribute software, which is where the problem is probably more visible.
If you're deploying a REST API with authentication, caching and so on then the KISS solution is python with a framework (such as Django) in a docker container.
I've spent many many hours (years) trying nix, rust, Haskell, go, spring framework and all sorts of other things which are a lot of fun but not so good for getting shit done.
For other domains this doesn't apply of course; lower-level network stuff, portable CLIs, CPU-intensive workloads and so on are much better in Go/Rust, but you can integrate them via a network call, by spawning an OS process, or through an FFI in the case of Rust.
I'm using Pycharm Pro and it can use Docker for code completion. I use Docker AND venv on host filesystem for friends using Pycharm Community and VSCode. Is this still necessary?
Heroku, yes, I would push for that. It's expensive, but it'll save you an entire devops function right up til you hit the 30-people mark.
Elastic beanstalk is just a horrid dev environment. Lots of waiting, lots of non-obvious options, and very little reward. I would personally push for lambda and zappa (https://github.com/zappa/Zappa) for python, as it seems to be much easier to deploy and debug.
If this is a company project, a REST API or task service of some kind, then you are probably using Docker in the year 2021.
If this is anything else, or if you work for a shop that hasn't embraced containerization, then you use PyInstaller (http://www.pyinstaller.org) to bundle your application. Either into a directory that contains your full Python virtual environment (only 5-10 megs!), or into a single executable file.
The latter is most convenient for a Go/Rust-type experience. But the former will start up faster, because that single-file executable has to first uncompress itself to the system temp directory.
If your Python code has native code under the hood (eg. Numpy) this two-step is very much a 'step 1: draw a circle, step 2: draw the rest of the owl' description of production deployment.
If you're deploying to a server you'll want to pip install dependencies on the server. If you're using docker you'll want to pip install dependencies in your container. If you're deploying to an end users computer you'll want to use pyinstaller, which admittedly is not trivial to get working in all cases.
This is exactly my point. .venv/ needs to live in the project deployment directory, and needs a wrapper script to use the python binary/symlink therein.
Isolate the project from the python runtime it uses, and you'll always have the right set of packages installed.
None of this is perfect, but an in-tree .venv/ and convenience scripts seems to be the least-worst option.
I meant to say "I look to see how they are going to deploy and use the virtualenv they have created."
Hint: what works in development -- just telling people to type "source .venv/bin/activate" or such -- doesn't fly in an unattended environment.
All that it requires, of course, is a bin/venv-python wrapper (bash) script to reference the created .venv/ directory, so this is hardly ground-breaking stuff, but as I mentioned originally, this (crucial) section is missed every time.
I use a minimal-but-complete pairing of venv and pip, and a couple of location-independent wrapper scripts, and I can run things the same across all environments.
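If it helps, the wrapper described above is only a few lines; here is the same idea sketched in Python rather than bash, assuming an in-tree .venv/ next to a bin/ directory:

```python
#!/usr/bin/env python3
"""Sketch of the bin/venv-python idea in Python form (the parent uses a small
bash script for the same thing): resolve the project's in-tree .venv/ and exec
its interpreter, so nothing ever needs to be "activated"."""
import os
import sys

PROJECT_ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
VENV_PYTHON = os.path.join(PROJECT_ROOT, ".venv", "bin", "python")

if not os.path.exists(VENV_PYTHON):
    sys.exit("no .venv/ found; create one with: python3 -m venv .venv")

# Replace this process with the venv interpreter, forwarding all arguments,
# e.g.:  bin/venv-python -m myapp --port 8000
os.execv(VENV_PYTHON, [VENV_PYTHON] + sys.argv[1:])
```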
These are pretty solid recommendations overall (pytest, poetry, pyenv). I'd guess the most controversial things would be pre-commit hooks (please no) and using a force-fed autoformatter (DEAR GOD NO).
pre-commit is actually a very solid linter task runner framework, I would still recommend it even if you don’t like Git pre-commit hooks (just omit the `pre-commit install` part); the `pre-commit run` command is still very useful invoked directly (optionally with --all-files and passing a specific hook id to run).
This is how the internet works now. If you ask a question it's rare to get a good answer. If you post a faulty opinion as advice, you're more likely to get a good answer.
These are fairly close to our working environment (we're using pylint instead of Flake8, haven't switched most of our projects to poetry yet, and we use Pydantic but not MyPy yet).
I think you mean Flake8 rather than Sense8? Anyway, gives me a chance to plug Flake8 Alphabetize, a Flake8 plugin for import ordering https://github.com/tlocke/flake8-alphabetize
Flake8 Alphabetize will just give you warnings about import order, whereas isort will actually change the code itself.
So as I see it there are two types of tool: formatters and checkers. A formatter doesn't alter the Abstract Syntax Tree (AST). In other words it doesn't alter the meaning of the code, just how it looks (e.g. the formatter Black). A checker on the other hand looks only at the AST and just gives warnings in your editor, and then it's up to you to change it or not.
The import order is encoded in the AST, so changing it falls outside the scope of Black (which never changes the AST). So I felt there was a need for a tool that works as a checker, just giving warnings in your editor if your imports don't conform to PEP 8, and hence Flake8 Alphabetize.
It's worth mentioning that Flake8 Alphabetize follows Black's philosophy of having only one way of doing things, so a project can standardise on Flake8 Alphabetize and everyone's imports will look the same.
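For reference, the ordering such a checker expects is PEP 8's grouping - standard library, third party, then local - alphabetized within each group, which is the convention tools in this space converge on. The `requests` and `myproject` names below are placeholders:

```python
# PEP 8 import grouping: stdlib, then third party, then local, with a blank
# line between groups; alphabetical order within each group is the convention
# checkers/formatters in this space enforce. `requests` and `myproject` are
# placeholder names for illustration.
import os
import sys

import requests

from myproject import settings
```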
The checker/formatter distinction makes a lot of sense. Given the choice between the two for a given goal or set of operations, e.g. alphabetize/isort, I'm not sure why I'd settle for a checker. If the fix is deterministic and objective, why would I want to make the fix myself, rather than let the tool handle it?
My feeling is that I want anything that could change the meaning of the code to be down to me making a manual edit. So I'm quite happy for the Black formatter to automatically rewrite my code as it never changes its meaning. For anything else I like to be in the driving seat.
Anyway, that's just my feeling at the moment, I'm sure others have a different take.
The lack of discussion of lock files is a big omission in any 'best practices' page. "Lock" only appears in a log output of Poetry.
Even if the answer is "Poetry handles it" you certainly want to explain why they're important just like is being done in the rest of the "Why use..." sections.
Huge +1 for this; pip-compile is so much faster and less surprising than poetry. Poetry's #1 footgun is that it updates the version of unrelated dependencies when you add a new dependency. No thanks.
Conda is one of the most annoying things with Python imo. Or, it's just another symptom of the crazy dependency hell of python, but it's actively making it worse. Now every project is a mashup of conda and pip and probably more.
Just right now I'm trying to fire up a new instance on GCP. With a completely clean image, doing a conda install hangs for 30 minutes while it's trying to "solve" something.
Quite confused about this comment as many of the latest tensorflow V2 releases aren't reliably uploaded to conda forge. IIRC there's only like 4 or so of the V2 releases uploaded.
You can conda install the CUDA dependencies and then install the required Tensorflow version via pip inside the conda env. But that's not much different to installing CUDA manually and then installing tf from system pip.
It's much faster and easier to pull a tf Docker image as it's their "officially supported" way to get up and running.
So... as a tf user and a sysadmin... Nah. No conda for me thanks.
One project uses tf 2.1, one uses 1.15 and the other 2.4 ... for me it is much more convenient to have 3 envs rather than 3 containers or switching the system cuda as needed...
I especially had problems debugging through Docker containers back in the day, so I never picked it up again.
Conda has been a lifesaver for me in the past, but it got so slow in ~2019 (minutes+ to resolve dependencies) that I've switched back to pip whenever possible. Maybe things have been resolved now though?
> Conda has been a lifesaver for me in the past, but it got so slow in ~2019
This is why mamba [0] was created. It is a C++ reimplementation of conda for much better performance. mamba is a drop-in replacement of conda and can operate on the same anaconda, condaforge (and mambaforge) repositories.
I do have to try mamba sometime but I feel like there is something more than python slowness going on.
I use Gentoo and its package manager is written in python. Even though it is more complex (IMO) it doesn’t have nearly the same slowness when it comes to dependency resolution and conflict detection.
Yeah, now all the "hip" devs are driving things towards "poetry".
It's very disillusioning to see how sheer twitter-followings and "popularity" type metrics drive development these days by forcing alternatives to be de-facto neglected. Everyone does what's "hot", so all the tutorials and bug reports and tests and SO questions and new libraries and and and all go towards that framework or language or tool or method. You can't even argue technical merits towards the neglected options because yes the popular tool is better, but only because we have a metric boat load (millions) of man-hours being pumped into making it better instead of all the alternatives. It's like the tech-equivalent of fashion fads in that it's self-reinforcing. Not to take away from some of the actual and technical achievements that some of these things have made, of course.
Great recommendations all around. It'd be nice if they went into how they handle dealing with C extensions, as that adds a large amount of complexity to a Python project, and most of the published advice on the topic is quite old.
EDIT: Mentioning `pyproject.toml` and the relevant PEPs would also be great (i.e. PEP 621, PEP 517, etc.). Fortunately Poetry is compliant with these PEPs.
I really struggle with this part of Python package development. I have a rather complex C++ library I have been calling with Cython, but I am struggling to find the best practices for either compiling the library with setuptools or calling cmake and copying the library into the package, etc.
That's a great resource in general for Python extensions, but doesn't have much to say about packaging and distributing them. For that the best resources I've found have been to look at what large/complex projects that use them do, as they've often had to deal with many of the odd cases that often come up.
Using Cython adds another layer of complexity to the packaging/distribution that that link does not address. Fortunately now that you can specify build requirements in `pyproject.toml` Cython has become significantly easier to use on that front, but there are still some less than obvious bits to say the least.
Maybe I should publish an overview of Python best practices for publishing C extensions (with or without Cython).
I prefer containers over pyenv and poetry. This way not only python version and dependencies are "in one place" but also all other stuff that comes along with a new project. The OS, the database etc.
The one thing I dislike about Python projects is that Python plasters the compile cache files all over the place. Is there a reason to change that? Currently I use the -B flag for all my scripts. But that makes it slow. I wish Python would have an option to perform like PHP and keep cached compilations in memory instead of on disk. Or at least somewhere in /tmp/.
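(For the /tmp/ half of that wish: Python 3.8+ can at least redirect the bytecode cache to a single directory via PYTHONPYCACHEPREFIX or -X pycache_prefix; a quick check, assuming 3.8+:)

```python
# Quick check of the bytecode-cache redirection available since Python 3.8.
# Run as, e.g.:  PYTHONPYCACHEPREFIX=/tmp/pycache python3 show_cache.py
import sys

# None means the default behaviour (__pycache__/ next to the sources);
# otherwise all .pyc files land under this single prefix.
print("pycache prefix:", sys.pycache_prefix)
```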
Pip is fine, it depends on your goals. I've found requirements.txt less enjoyable to maintain for several reasons – you need to separate dev, test dependencies on your own time, there's no notion of a lockfile for transitive dependencies (`pip freeze` notably doesn't separate actual dependencies from transitive dependencies). pip is also darn slow at installing dependencies once you hit a certain scale, and poetry outperforms it pretty substantially.
Poetry does what I expect a package manager to do, and does it well, especially when working with a team of developers on an application versus individually. There's not a compelling reason for me to use pip directly as a less functional alternative.
Bit rot, "it works on my machine"-style issues, cache misses on dependency installation (which can really bloat deploy times in deploy pipelines by busting Docker caches across machines, too). Can be a security issue if a vulnerable library version is pushed and one installs it as a consequence of having non-locked dependencies, especially in python where package install scripts have a lot of power.
Lock files help solve for these. You can build software without solving them, but it makes my life easier.
All of this. Plus picking up a legacy project from someone with a giant requirements file and then trying to pick through and work out what we actually want locked and what's been installed by something deep in a dependency tree is a nightmare. Even if you don't use poetry for your own sake, use it for everyone else's.
Good question! From a template repo commit at work[1]:
Advantages:
- Separates development and production dependencies.
- The dependency version is specified separately from the lock file. In practice this means that the version in pyproject.toml generally only needs to be set to anything other than asterisk if and when it becomes necessary to use a specific version range.
- The lock file includes SHA-256 checksums by default, and these are checked during installation.

Disadvantages:
- More complex configuration than Pip.
- Python package managers come and go, and this one is likely going to suffer the same fate eventually.
- Introduces poetry.toml simply to specify that the virtualenv should be in the project directory. The default is to put virtualenvs in ~/.poetry, which is a non-standard location and therefore might interfere with typical IDE setups, mounting the virtualenv in containers or VMs, and the like.
> The dependency version is specified separately from the lock file.
That. The simple fact that a pip requirements file mixes both the packages you want and the dependencies pulled in by those packages is a valid reason to switch to Poetry, IMO.
Yeah, I don't think I've created a single venv in the last 2 years. I don't need them for basic stuff (e.g. a quick script) and for anything more I'd rather have a container so I can deal with all dependencies in one place and use it elsewhere quicker if I need to.
The link describes the attack vector. pipenv locks the dependencies using hashes. If your company has my-company-py-lib, then pip could install a public library that pretends to be the internal one.
A Docker container starts in two seconds or so. And gives me everything I need. So no need to dabble with a VM.
There is not much overhead in running a project in a container. The project has a setup file that turns a fresh Debian 10 into whatever environment it needs. And that's it. Run that setup script in your Dockerfile to create a container and you are all set. Want to run the project in a VM or on bare metal? Just install Debian 10, run the setup script and you're all set.
> Also at what point do people just realize that all of this overhead is a gigantic waste of time and just use a better language?
Probably some time shortly after your developer time costs less than your cloud compute time. Until you hit that point (if ever) there are few options as cost-effective as Python.
In any language ever if you use non-vendored shared libs you will hit this problem. Certainly not specific to Python, in fact the reason package managers on *nix are necessary (and not just a nice to have) is because of this.
Yeah to be honest if you need containers to make a project reproducible this is just a sign of failure. You're basically saying you need to encapsulate the entire system for your code to run correctly.
Loads of tools have external dependencies that are hard dependencies. And library search paths on my machine vary from those on prod... I could go on, but I'm not sure I understand why using a container to manage all that is a failure?
There are plenty of reasons why you might want to containerize a project. If you have a lot of system dependencies, for example, you might want to consider including a Dockerfile in your project to make it portable.
However what makes Python a failure is that people feel they need this to dependably run a python program which only has pure-python dependencies.
Compare this to a language like Rust, or the NPM ecosystem. In those cases, the tools have managed to dependably encapsulate projects such that you only need the package manager to make a project fully repeatable.
With either of those ecosystems, there's basically one system dependency, and you can find any repository online and dependably do `git clone ...` then `cargo build` etc. to make it work. With Python, you effectively have to reproduce the original developer's system, and that is a failure.
Huh? Either something is really weird about your env or we have different ideas about what counts as a pure Python package.
Because if you don’t rely on Python packages with extensions that farm out to external libs it’s as easy as git clone, pyenv virtualenv, pip install -r, and python -m build.
The thing that makes this worse than other ecosystems is:
1. virtualenv shouldn't be necessary. This is more or less the same concept as containerization. This is only needed because python has a fractured ecosystem, and setting up your environment for one project can break another.
2. you also have to know which environment encapsulation and package management solution the library author is using - this is not standardized
1. Virtualenv is essentially the same as node_modules, yet everyone rants and raves and loves that. And the kind of breakage you're talking about is astronomically rare in my experience.
2. No you don't - what makes you say that?
virtualenv is so much less user friendly than npm. Like, why do I have to run a `source` command to make virtualenv work? I don't use either often, but I can remember how to use npm if I haven't used it in like 6 months, whereas I have to look up the right virtualenv commands if I haven't used it for like 2 weeks.
In the article the author mentions that VS Code doesn't support Poetry venvs. This can be solved by configuring Poetry to create the virtual environment directly in the project folder (which in my opinion is neater).
One thing I'd add here is Pydantic. It complements the type hinting very nicely and lets you get more static with your classes. But that's something you don't just install; you need to use it. Which is another omission, I suppose; Python's type hints are opt-in, so mypy won't do you any good unless you actually add type hints, right?
There are a bunch of mypy configuration options to disallow certain things like --disallow-untyped-defs. There’s also --strict which forces typing on everything.
Man, developing python with pydantic and "mypy --strict" (I follow pydantic's config [0] where I can) is such Type 2 Fun. It feels like a totally different language. Yeah it takes a little more time at first but then type inference and autocomplete starts to kick in and then you're screaming fast. And you "compile" it and everything just works. No hunting down edge cases or tracebacks cause you forgot to catch a None. I find it super satisfying. Much easier to stay in flow state when you aren't having to stop every few minutes to test stuff and dig through tracebacks.
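A tiny sketch of what that combination looks like in practice; the model and field names are made up, and pydantic is assumed to be installed:

```python
# Minimal sketch of pydantic + "mypy --strict"; the model and field names are
# hypothetical.
from typing import Optional

from pydantic import BaseModel


class User(BaseModel):
    id: int
    name: str
    nickname: Optional[str] = None


def greet(user: User) -> str:
    # --strict forces the None case to be handled before use.
    display = user.nickname if user.nickname is not None else user.name
    return f"hello, {display}"


# Validation also happens at runtime: bad input raises a ValidationError
# instead of silently propagating a wrong type through the program.
print(greet(User(id=42, name="Ada")))
```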
The pain and suffering I see with different versions of Python installed with Pyenv in my team and surrounding ones is huge. I advise anyone to just stay away from it. On Macs, you can install MacPorts and it'll bring in every version of Python, from ancient to newest, in a sane BSD-like way (not in a user-writable folder, ffs). On Linux the default package managers offer pretty much every version of Python that's not outright dangerous to run. On Windows there are binary installers and every Python goes in a separate folder.
Not only is it not needed, it creates a layer of "magic" the user has to understand on top of the environment.
And, if you really want to use brew, you can still continue using it, knowing it only has python 3.9 at the moment. Plus, it can coexist mostly peacefully with Macports.
Interesting. Thank you for correcting me. Can you install multiple versions this way? My colleagues are suffering with that and I'd prefer not to move them from Homebrew if avoidable.
For sure, yes, they're distinctly named formulae and thus subject to their own install, remove, and version bump lifecycle management
IIRC the trick is that the "at" versions don't get "brew link"-ed by default, since they'd almost certainly smash on top of the non-at binaries or manpages or whatever. Using `PATH=$(brew --prefix "python@3.7")/bin:$PATH` or one's favorite context-switching gizmo will help, or (with the python ones specifically) using `$(brew --prefix "python@3.7")/bin/python -m venv ...` is a great way to avoid having a lot of special env-var silliness.
Great recommendations! Exactly my setup, except add pipx for global system tools, i.e. command line tools distributed through pypy (youtube-dl, black, etc.).
I also add Docker or Singularity containers as needed for deep learning deployment, but that is quite computing-platform and application specific.
Yes, indeed I mean the Python Package Index (PyPI). Mypy, pypy, pypi, pyenv, pipenv, pip, pipx. Easily confusable names are a core tenet of the Python ecosystem!
That is a great write-up! One extra bit I'd recommend to this list is using https://github.com/ikamensh/flynt to convert string format into f-strings. It requires Python 3.6.
venv + requirements.txt is simple and works. Why are people so obsessed with replacing this simple toolset/process with poorly maintained tools like Poetry/PyEnv?
Please help me understand why I would want to lock specific versions of libraries in my Python projects. I’ve worked extensively with Python for a number of years and have never needed a lock file for my dependencies.
There should be an article like this for every language, as the author mentions it's not just syntax to change a language. Much of it is experience based knowledge.
However, this is how you would set up a new project only, and in a piecemeal manner, if you didn't already have some existing template.
Compatibility issues with your main dev tool would rule out using Poetry, in the same way running the latest versions of any software without reason is a rookie mistake. Chasing a higher version number is a jr/intermediate folly.
If you're developing Python professionally, save your money and just pay for PyCharm. A morning of fluffing about with a dozen new tools which then have their own maintenance overheads is less cost effective than buying a product that gives you >70% of what the author has recommended, and does so in a generally consistent manner.
This blog post was a great reminder of how much PyCharm gives me on a day to day basis, how people get lost in their +1-more-tool mindset, and how switching languages is an initial cognitive overload.
Why? The year in particular seems very relevant because the "recommended Python project setup" (if there ever was such a thing) changes every other year or so.
The current work project[1] has all of these, with one substitution: Pyenv, Poetry, Pytest, pytest-cov with 100% branch coverage, pre-commit, Pylint rather than Flake8, Black, mypy (with a stricter configuration than recommended here), and finally isort. These are all super helpful.
There's also a simpler template repo[2] with almost all of these.
I replaced all the pyenv (and any version management in my projects) by the awesome VS Code remote development with containers. This is a game changer to me. 1 container = 1 clean installation, completely isolated, and you can share with your teammates.
Then I disable the use of virtual environments in Poetry, because it's a useless overhead.
Then for updates, I just change versions in my Dockerfile or .toml and rebuild the container from scratch in no time, which is cleaner than manual updates for everything IMO.
Why use the external pre-commit repos instead of pointing to the poetry-managed local installations for flake8, black, isort, etc? The repos approach results in a much simpler and cleaner pre-commit config, but since you end up with two pinned versions of each, managed by different updaters, it can invite drift which can be very hard to track down.
I'd prefer to only use the pre-commit versions of these libs, but then I'd sacrifice editor integration.
What is almost never mentioned is that the default virtual env will always set the python binary it is called from as the python interpreter for that environment.
So if I want to use Python3.9 i go python3.9 -m venv .venv, if I want to use python3.6 i go python3.6 -m venv .venv etc.
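A quick way to confirm which interpreter a given environment ended up with is to ask it directly, running the snippet with the venv's own binary (e.g. `.venv/bin/python check.py`; the filename is just an example):

```python
# Run this with the venv's interpreter, e.g.:  .venv/bin/python check.py
import sys

print(sys.executable)        # points inside .venv/bin/
print(sys.version_info[:3])  # matches the pythonX.Y that created the venv
print(sys.prefix)            # the venv directory itself
```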
Yes, lockfiles are great, but while everyone can execute the above commands, it is a lot harder to convince everyone in my organization to switch to poetry.
> Some people work with code from 13" laptops. And viewing diff of two files of 79 char per line max becomes very convenient.
I think it's not the only reason. Limiting columns forces developers to read more vertically, which is less tiring for the eyes. For example, try to read a book on a 27" monitor with no max width. Going to the next line quickly becomes painful.
Containers are great for web services or anything that primarily communicates through a port.
Less so for CLI style projects that read and write to the local file system. Yes I know you can bind mount directories but it’s clunky and you also usually have to fight with file permissions issues.
But literally everything else is optional or completely insane. This is not "best practice," its the entire kitchen sink. You might use a linter, or you might just try to write clean code. You might use Poetry, but frankly, the default pip installer is fine for most people most of the time. Installing and configuring something to sort imports for you is just silly, and pre-commit hooks are a terrible idea that just get in the way.
My recommendation would be to avoid adding tooling until you have a problem it solves, and focus on building something useful instead. I imagine that's true of most languages.
I think a big part of the problem is that bundler has been around for ages whereas half of the tools mentioned here didn't even exist just 5-6 years ago.
Which can be seen by this hilarious sentence:
> By default, Python packages are installed with pip install. In reality nobody uses it this way. It installs all your dependencies into one version of Python interpreter which messes up dependencies.
Why hilarious? Because everyone I know uses `pip install` if they're not using poetry. And because poetry is kinda new, that's a lot of people.
The quote isn't clear, but I think the author means people don't generally run pip with their system python. I.e. they don't run /usr/bin/pip or whatever, but use some virtual environment solution.
gem catches this by default and warns you. Pip doesn't, but it would probably be good if it did and the pip guys explicitly took the position that "installing stuff on your system python is not a normal use of it".
What poetry replaces is pip and requirements.txt. Our team had a lively discussion about this, but here are some good reasons:
* Keep test frameworks out of production (in case they've busted something; the two points below can happen to test libraries too.)
* pip recently changed the way its resolver works [1], breaking numerous projects. Yeah yeah, it's a major version bump, but lots of containers and environments installed the latest pip just sort of assuming that it would always handle requirements.txt the same way. With that contract broken, using pip directly isn't a non-decision anymore.
* Locking specific versions of dependencies lets you roll back Bad News in production. Let's say you have a project with library A which has dependency B. If A asks for the latest version of B, you can be in a situation where a new version of B breaks your project and you won't know about it until you do a release _and_ if you try to roll back you'll still be in trouble. We recently had to deal with a similar issue and had to fall back on an older container until we could figure it out.
* if your code is under src/ and your tests under test/, and you use a requirements-test file (or better, tox.ini), your test dependencies like pytest don't end up in a wheel
* if a container depends on 'latest' and not some semver major number, it's 100% the container's fault when it blindly updates to a new major version
Agreed, Pip isn't to blame, treating `pip install -r requirements.txt` as one's sole dependency management is.
You can choose to build your own dependency management practice around pip, or use one someone else has already created. I think that it is easier to get a team on the same page with something like Poetry, especially if they're used to bundler or npm.
You're right about the docker containers as well, but upstream does what it wants and downstream has a strong tendency to not mess with upstream's choices.
Indeed, why Poetry? Poetry at first glance looks and feels good (and I even used it for a while) until you hit a weird bug and find that it's one of the 989 open issues: https://github.com/python-poetry/poetry/issues
So it’s back to plain old venv + requirements.txt for me
Poetry has lock files, so installs are always the exact same version of everything.
I've found this is almost great, but when it breaks down, perhaps because two packages request a transitive dependency in different ways, or if the resolver really wants to install a new version of something that won't compile on your system, it's a giant pain and sometimes impossible to work around.
These days I just use pip with venv unless I have a good reason not to.
I wonder why VSCode even gets name-dropped in these Python setup tutorials lately. VSCode is spying on you and sending back "telemetry" data.
Given the current situation in Python, where there is little development, the old boys have totalitarian control, and new contributors are smart enough to avoid that mess:
Try out Rust, Go, Elixir or Lua instead. It might save you a lot of trouble. Heck, if you are willing to put in a lot of time to create carefully written objects, C++11 code can look a lot like Python (if you are into that).
They've invested a ton into Python support in VSCode in recent years. It's really good.
The language server Microsoft built for type checking and completion (https://github.com/microsoft/pyright) is excellent. It makes Python feel like a first-class statically typed language, which isn't something I was able to replicate in Pycharm (tried this very recently). Meanwhile, it's just baked into VSCode, no configuration needed.
I use Jetbrains tools for several languages, but for Python and frontend (Typescript with React/Vue/Angular) VSCode is hitting the perfect notes.
I haven't touched Python in years, and just recently came back to it. I was very pleasantly surprised to see just how much Microsoft has improved the tooling in VSCode with the recent update (https://visualstudiomagazine.com/articles/2021/05/11/vscode-...). I've only tinkered with it a bit, but it feels on par with the JS/TS experience in VSCode, which is surprising since those are native and have a more robust static types community.
My experience is also that PyCharm is much more pleasurable to work with, and on the safer side. I checked 1.5 years ago (so this could have changed) for remote development on a server, and had to download some 3rd-party rsync tools to be able to do so. With PyCharm it just works out of the box with a professional license.
The VS Code experience has definitely improved a LOT in the past 1.5 years with the new Python language server (Pylance) and remote development being improved quite a bit. We use it daily in production in my company with almost zero problems so far.