Dissecting a Python Zipapp Built with Shiv

In a previous post, we showed how to use shiv to bundle a Django project into a single file for distribution and deployment. Running a large Python project as a single file feels like magic – which is great until you need to debug a problem. At that point, you need to understand how things work and what is happening under the hood. With that in mind, let’s demystify shiv’s magic.

Zipapps

Shiv uses a little-known feature of Python called a ZIP application or “zipapp”. A zipapp bundles multiple Python files into a single ZIP archive that the Python interpreter can execute directly. Here’s a trivial example:

$ mkdir myapp
$ echo 'print("hello world")' > myapp/__main__.py
$ python -m zipapp myapp
$ python myapp.pyz
hello world

Pretty cool, huh? Copy myapp.pyz anywhere you have a compatible version of Python and it just works.
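
The same archive can also be built from Python with the standard library's zipapp module. The following is a minimal sketch that recreates the example above in a temporary directory; the function name build_and_run_hello is made up for illustration.

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path


def build_and_run_hello() -> str:
    """Build a one-file zipapp in a temp directory and return what it prints."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "myapp"
        source.mkdir()
        # __main__.py is the file the interpreter runs when handed the archive.
        (source / "__main__.py").write_text('print("hello world")\n')

        target = Path(tmp) / "myapp.pyz"
        # Programmatic equivalent of `python -m zipapp myapp`.
        zipapp.create_archive(source, target)

        result = subprocess.run(
            [sys.executable, str(target)],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout
```

Calling build_and_run_hello() runs the archive with the current interpreter and returns its standard output.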

But real-world projects aren’t so simple. They have dependencies, non-Python files that need to be included, and complex build steps.

Enter Shiv

The folks at LinkedIn built shiv to work around the shortcomings of traditional zipapps. It includes dependencies in the zipapp and ensures they are on Python’s import path (sys.path) at runtime. Shiv is only needed at build time; the resulting zipapp runs without any additional tooling. You can think of shiv as a wrapper around pip, which it uses behind the scenes to download and install dependencies. Here is an example of creating a zipapp for awscli:

$ shiv --output-file=aws.pyz --entry-point=awscli.clidriver.main awscli

This downloads awscli and all its dependencies from PyPI and creates a zipapp (aws.pyz) that calls the Python function specified by --entry-point when executed. That file can be copied to any system with a compatible Python version and run as if it were a binary executable (./aws.pyz or python aws.pyz).

Under the Hood

The good news is that shiv is surprisingly simple. When we build a new archive, shiv does the following:

  1. Use pip to download all the dependencies to a temporary directory.
  2. Write some metadata used during the bootstrap process to environment.json.
  3. Create a bootstrap script that is used when the zipapp is executed.
  4. Bundle those files up into a zip archive.
  5. Insert a shebang at the top of the zip archive so it can be used as an executable.
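
The last two steps are worth seeing in isolation, because they explain why one file can be both a ZIP archive and an executable. The sketch below does not use shiv; it uses the standard library's zipapp module to build a tiny archive with a shebang and then reads it back both ways. The function name inspect_archive and the default interpreter string are illustrative assumptions.

```python
import tempfile
import zipapp
import zipfile
from pathlib import Path


def inspect_archive(interpreter: str = "/usr/bin/env python3"):
    """Build a tiny zipapp and return (its first line, its member names)."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "myapp"
        source.mkdir()
        (source / "__main__.py").write_text('print("hello world")\n')

        target = Path(tmp) / "myapp.pyz"
        # Passing interpreter= makes zipapp write a "#!" line before the zip data.
        zipapp.create_archive(source, target, interpreter=interpreter)

        # Read as a plain file: the first line is the shebang.
        with open(target, "rb") as f:
            first_line = f.readline().rstrip(b"\n").decode()

        # Read as a ZIP: readers locate the archive index from the end of
        # the file, so the leading shebang line does not get in the way.
        with zipfile.ZipFile(target) as zf:
            names = zf.namelist()

    return first_line, names
```

The same two reads work on any zipapp, which is handy when you want to check which interpreter an archive was built for.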

The bootstrap script’s responsibility is to:

  1. Unpack the dependencies to a unique path. This is skipped if the path already exists from a previous run.
  2. Insert that path into sys.path.
  3. Execute the --entry-point we defined during creation.
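
To make those three steps concrete, here is a rough sketch of what such a bootstrap could look like. This is not shiv's actual bootstrap code: the function name and arguments are hypothetical, it assumes the dependencies live under a top-level site-packages/ directory inside the archive, and it simply prepends the unpacked directory to sys.path rather than reproducing the exact position shiv uses.

```python
import importlib
import sys
import zipfile
from pathlib import Path


def bootstrap(archive: str, cache_dir: str, entry_point: str):
    """Illustrative only: mimic the three bootstrap steps listed above."""
    site_packages = Path(cache_dir) / "site-packages"

    # 1. Unpack the bundled dependencies unless a previous run already did.
    if not site_packages.exists():
        with zipfile.ZipFile(archive) as zf:
            members = [
                name for name in zf.namelist()
                if name.startswith("site-packages/")
            ]
            zf.extractall(cache_dir, members=members)

    # 2. Make the unpacked packages importable (simplified: prepend).
    sys.path.insert(0, str(site_packages))

    # 3. Resolve a dotted entry point such as "package.module.function"
    #    and call it.
    module_name, _, func_name = entry_point.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, func_name)()
```

Because step 1 is skipped when the directory exists, any edits made to the unpacked files persist across runs, which is what makes on-disk debugging possible.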

The unique path is determined by combining the filename of the zipapp and a UUID generated at build time (stored in environment.json). It is stored in ~/.shiv by default but can be changed by setting the SHIV_ROOT environment variable. After execution, we can see the following:

$ cd ~/.shiv && find . -maxdepth 2
.
./aws_3e32a16c-6652-44cc-a561-3784814d736e
./aws_3e32a16c-6652-44cc-a561-3784814d736e/site-packages
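
If you want to know where a given zipapp will unpack before running it, the directory name can be derived from the archive itself. The sketch below assumes the naming pattern seen in the listing (file name without extension, an underscore, then the build UUID) and reads the UUID from the build_id key in environment.json, the same key queried with jq later in this post. The helper names are made up for illustration, and only the plain-directory form of SHIV_ROOT is handled.

```python
import json
import os
import zipfile
from pathlib import Path


def read_build_id(archive: str) -> str:
    """Return the build UUID stored in the archive's environment.json."""
    with zipfile.ZipFile(archive) as zf:
        return json.loads(zf.read("environment.json"))["build_id"]


def expected_cache_dir(archive: str) -> Path:
    """Guess the unpack directory: <root>/<archive stem>_<build_id>."""
    # SHIV_ROOT overrides the default ~/.shiv location.
    root = Path(os.environ.get("SHIV_ROOT", "~/.shiv")).expanduser()
    return root / f"{Path(archive).stem}_{read_build_id(archive)}"
```

For the archive built earlier, expected_cache_dir("aws.pyz") should point at the aws_... directory shown in the listing.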

The site-packages directory looks just like the one you’d find in a standard Python installation or a virtualenv. The UUID in the directory name, 3e32a16c-6652-44cc-a561-3784814d736e, was assigned at build time, which we can confirm by inspecting the bundled metadata:

$ unzip -p aws.pyz environment.json | jq .build_id
"3e32a16c-6652-44cc-a561-3784814d736e"

This directory is added to sys.path, as we can see here:

$ echo "import sys, pprint; pprint.pprint(sys.path)" | \
  SHIV_INTERPRETER=1 ./aws.pyz
Python 3.7.3 (default, Jun 11 2019, 01:05:09)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> ['./aws.pyz',
 '/usr/local/lib/python37.zip',
 '/usr/local/lib/python3.7',
 '/usr/local/lib/python3.7/lib-dynload',
 '/home/user/.shiv/aws_3e32a16c-6652-44cc-a561-3784814d736e/site-packages',
 '/usr/local/lib/python3.7/site-packages']

Note: The environment variable SHIV_INTERPRETER lets us drop into a Python shell using the zipapp’s environment.

The fifth item, /home/user/.shiv/..., is the one injected by the bootstrap script. The others are Python defaults.

If you prefer to run in an isolated environment without the global site packages, Python’s -S flag can be used:

$ echo "import sys, pprint; pprint.pprint(sys.path)" | \
  SHIV_INTERPRETER=1 python -S aws.pyz
Python 3.7.3 (default, Jun 11 2019, 01:05:09)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> ['aws.pyz',
 '/usr/local/lib/python37.zip',
 '/usr/local/lib/python3.7',
 '/usr/local/lib/python3.7/lib-dynload',
 '/home/user/.shiv/aws_3e32a16c-6652-44cc-a561-3784814d736e/site-packages']

Once unpacked, you can inspect the files on disk and even edit them if you’re trying to do some tricky debugging. Shiv will not overwrite the files of a previously unpacked zipapp unless the environment variable SHIV_FORCE_EXTRACT is set.

That’s It

Turns out shiv is pretty straightforward. One of the things I like most about it is its simplicity. When there’s a problem with my code, it’s easy to poke around and rule out shiv as a cause. The code is well commented and easy to follow (here’s the bootstrap script). It is also stable and appears to be feature complete, which means not having to work against a moving target.

Photo by SHVETS production from Pexels: https://www.pexels.com/photo/stack-of-empty-cardboard-boxes-prepared-for-relocation-from-home-7203699/

Peter Baumgartner
