Alternate title: Pip 7 is Awesome, Here’s Why
A typical Python deployment looks like this:
- Pave the server, setting up a virtualenv and installing any prerequisites necessary to build/install the Python requirements (compiler, development headers, etc.).
- To update to a new release:
- Update your source code
- Install its dependencies into the virtualenv via something like `pip install -r requirements.txt`
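As a rough sketch, that workflow might look like the following (paths, package names, and the app layout here are hypothetical examples, not from the original post):

```shell
# One-time server paving: build tools, dev headers, and a virtualenv
sudo apt-get install -y build-essential python-dev libpq-dev libxml2-dev
virtualenv /srv/myapp/env

# Each deploy: update the source, then install into the same virtualenv
cd /srv/myapp/src && git pull
/srv/myapp/env/bin/pip install -r requirements.txt
```

Note that the same virtualenv is reused and mutated on every deploy, which is exactly where the cruft and rollback problems below come from.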
This approach works, but is lacking in a few ways:
- Deployments are dependent on the network and availability of PyPI.
- “Clean” installs are prohibitively slow to do on every deploy. Because of that:
- You can’t easily/quickly rollback to a previous release.
- The virtualenv will accrue cruft over time as dependencies are added/removed.
Docker solves a number of these problems, but for many reasons I’m not sold on using it in production (yet). The good news is that today’s release of Python’s package installer, pip (version 7), will help you solve all these issues without Docker. It uses Python’s wheel format to cache binary builds of the dependencies.
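The caching is automatic: when pip 7 has to build a package from source, it saves the result as a wheel in its cache directory and reuses it on subsequent installs. A minimal illustration (the cache path shown is the Linux default; it varies by platform):

```shell
# First install: pip compiles the C extension, then caches a wheel
pip install lxml

# Cached wheels live under pip's cache directory, e.g. on Linux:
ls ~/.cache/pip/wheels

# The cache can be bypassed per-invocation if you ever need a true clean build:
pip install --no-cache-dir lxml
```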
Wheels are extremely fast, particularly for packages that require compilation (Pillow, psycopg2, lxml, etc.). “How fast?” you may ask… Well, let’s look at a few examples using our fork of the Wagtail demo project on a 2GB Digital Ocean VPS (all commands were run with a warm pip download cache).
Clean Install with Pip 6.1.1
First we’ll do a clean install of the project using the previous version of pip (6.1.1).
This takes approximately 195 seconds (about 3.25 minutes). Every build with pip 6 will take roughly the same time.
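For reference, a test like this can be reproduced with something along these lines (the virtualenv name and use of `time` are my own illustration, not the exact commands from the post):

```shell
# Clean install: fresh virtualenv, dependencies installed from scratch
virtualenv venv
time venv/bin/pip install -r requirements.txt
```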
Clean Install with Pip 7
Now we’ll do the same build, but using pip 7 which caches the builds in wheel format.
This typically runs at about the same speed as pip 6 (+/- 5s), approximately 200 seconds.
Rebuild with Pip 7
Now that we have cached wheels, let’s see how long it takes to install the same dependencies into a clean virtualenv using pip 7.
This runs in about 11 seconds, an order of magnitude faster than the other tests.
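The rebuild test amounts to throwing away the virtualenv and reinstalling; because every dependency now resolves to a cached wheel, no compilation happens. A sketch (virtualenv name is illustrative):

```shell
# Fresh virtualenv, but the wheel cache from the previous run is warm
rm -rf venv && virtualenv venv
time venv/bin/pip install -r requirements.txt
```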
Note: deploys that add new dependencies may take longer while the wheel cache is created for those packages.
This speed improvement unlocks a number of interesting possibilities for Python deployments that were previously too slow to consider.
It’s now feasible to build a new virtualenv on every deploy. The virtualenv can be considered immutable. That is, once it is created, it will never be modified. No more concerns about legacy cruft causing issues with the build.
This also opens the door to saving previous builds for quick rollbacks in the event of a bad deploy. Rolling back could be as simple as moving a symlink and reloading the Python services.
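One way to sketch that pattern: keep one virtualenv per release and point a `current` symlink at the active one (all paths and the timestamp scheme here are hypothetical):

```shell
# Build an immutable virtualenv for this release; cached wheels make it fast
RELEASE=$(date +%Y%m%d%H%M%S)
virtualenv /srv/myapp/envs/$RELEASE
/srv/myapp/envs/$RELEASE/bin/pip install -r requirements.txt

# Activate it atomically by re-pointing the symlink, then reload services
ln -sfn /srv/myapp/envs/$RELEASE /srv/myapp/current

# Rollback: re-point the symlink at any previous release's virtualenv
ln -sfn /srv/myapp/envs/$PREVIOUS_RELEASE /srv/myapp/current
```

`ln -sfn` replaces the symlink in place, so switching releases is effectively instant.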
Another possibility is building your wheels in a central location prior to deployment. As long as your build server (or container) matches the OS and architecture of the application servers, you can build the wheels once and distribute them as a tarball (see Armin Ronacher’s platter project) or using your own PyPI server. In this scenario, you are guaranteed the packages are an exact match across all your servers. You can also avoid installing build tools and development headers on all your servers because the wheels are pre-compiled.
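A minimal sketch of that central-build workflow, using pip's own `pip wheel` command and an offline install on the app servers (directory names are examples):

```shell
# On a build machine matching the app servers' OS and architecture:
pip wheel -r requirements.txt -w ./wheelhouse
tar czf wheels.tar.gz wheelhouse

# On each app server, install only from the shipped wheels -- no network,
# no compiler, and no PyPI dependency at deploy time:
pip install --no-index --find-links=./wheelhouse -r requirements.txt
```

`--no-index` guarantees nothing is fetched from PyPI, so every server installs exactly the artifacts the build machine produced.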
We’re excited about the opportunities wheels provide without adding additional layers of software on our servers. It’s inspiring to see the massive improvements being made to Python’s packaging system over the last few years while still maintaining backwards compatibility with legacy packages. I’m looking forward to seeing what comes next.