Building System Packages from Python Modules (with Dependencies Included)

by Sören König - 7 Dec 2015

In our last post on packaging, my colleague Felix Mueller talked about why it’s good to manage all your software with your system’s native package management tools. He also discussed how to build packages in an automated, consistent way. Now I’d like to describe the benefits of wrapping Python’s virtualenvs in system packages.

Why is it particularly useful to package and ship Python modules as native system packages? For the same reasons that apply to all other distributed software: to achieve reliable, atomic, reproducible and predictable deployments.

Python is Zalando Tech’s bread-and-butter language for writing tools and scripts for provisioning, managing and maintaining servers. But deploying Python software can cause a lot of headaches — for example, when two packages require different versions of the same dependency, or when you need to recompile C extensions on every single server (hello, PyCrypto!). The latter case requires the whole build-essential tool chain plus Python development libs — in other words, a lot of overhead. We want to keep our production servers' setup lean.

Wrapping our Python tools in native system packages wouldn't entirely solve the problem. We’d have to port all dependencies (and their dependencies) to Debian- or RedHat-land. This is where Python's virtualenvs come into play. The idea is to combine the best of both worlds: self-contained virtualenvs that need no external dependencies, and the manageability of native system packages.
Admittedly, this idea is not that new: Berlin-based engineer Hynek Schlawack has scripted his own solution, and Spotify have made efforts in this direction with their dh-virtualenv extension to debhelper. But it works.

How We Do It at Zalando

Package-build, our open-source setup, combines the package builders fpm and fpm-cookery with our own script, cook-recipe.sh. Luckily, both fpm and fpm-cookery now support virtualenvs, the latter thanks to contributions from Zalando engineers. The previous version of package-build used Vagrant as the base for the package build environment. Our current version replaces the heavier, VirtualBox-backed Vagrant with lightweight Docker containers, which are excellent for providing short-lived packaging environments.

Our cook-recipe.sh script runs inside Docker containers created ad hoc, and takes one or more recipe folder names as parameters. If no parameters are given, it runs all recipes in all subfolders of recipes/. Within each of those subfolders, the script looks for ./prepare.sh and runs it. A Git checkout and a subsequent sed command then set the revision attribute of the respective recipe to the current timestamp. This gives every package build its own datetime, creating a kind of "micro-iteration" for packages. We then call fpm-cook package with --no-deps, because all build dependencies are already defined in the Dockerfiles that provision the images for each distribution.
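To make that flow concrete, here is a minimal sketch of such a loop. It is not the actual cook-recipe.sh: the function names, the FPM_COOK and RECIPES_DIR variables, the timestamp format and the sed pattern are all assumptions made for illustration, and the Git checkout step is left out.

```shell
#!/bin/sh
# Hypothetical sketch of the flow described above, NOT the real cook-recipe.sh.
set -e

RECIPES_DIR="${RECIPES_DIR:-recipes}"   # assumed layout: recipes/<name>/recipe.rb
FPM_COOK="${FPM_COOK:-fpm-cook}"        # overridable, e.g. for a dry run

# Set the recipe's revision attribute to the current timestamp
# (the timestamp format is an assumption).
stamp_revision() {
    stamp="$(date +%Y%m%d%H%M%S)"
    sed "s/^\([[:space:]]*revision\)[[:space:]].*/\1 \"$stamp\"/" "$1" > "$1.tmp"
    mv "$1.tmp" "$1"
}

# Build one recipe folder: optional prepare hook, revision stamp, then fpm-cook.
cook_one() {
    (
        cd "$1"
        if [ -x ./prepare.sh ]; then ./prepare.sh; fi
        stamp_revision recipe.rb
        # Build dependencies are expected to be baked into the Docker image.
        "$FPM_COOK" package --no-deps
    )
}

# Build the named recipes, or every subfolder of $RECIPES_DIR if none are given.
cook_recipes() {
    if [ "$#" -eq 0 ]; then
        set -- "$RECIPES_DIR"/*/
    fi
    for recipe in "$@"; do
        dir="$RECIPES_DIR/$(basename "$recipe")"
        [ -d "$dir" ] || continue   # skip unknown names and an unmatched glob
        cook_one "$dir"
    done
}

cook_recipes "$@"
```

Running the build in a subshell keeps the working directory change local to one recipe, so a failing recipe cannot leave the loop in the wrong folder.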

Now all the essential tools for provisioning are already available, and we don’t have to install them during every new build — saving us time. This script can be run standalone — for example:

docker run -v ${PWD}:/data package_build/centos6 /data/cook-recipe.sh zalando-zcloud-virtualenv

Of course, you still have to publish the created packages to your repositories (we use aptly to manage our .deb repos, and createrepo for the .rpm side). For convenience, we use a Fabfile to pull everything together: it has tasks for initializing the repos as well as creating, inspecting, adding and removing packages.

An Example of Our Virtualenv Recipe Usage

Comparing some code snippets from the old and new versions of package-build will show you how much cleaner the new version looks. Here are the relevant parts of the old recipe.rb file:

class ZalandoZcloud < FPM::Cookery::Recipe
  description "Package containing CLI, agent and additional scripts for installing nodes via zCloud"
  name "zalando-zcloud"
  version "0.2.8"
  source "https://stash.zalando.net/scm/pymodules/zalando-zcloud.git", :with => :git, :tag => "#{version}"
  build_depends "python-setuptools"

  platforms [:ubuntu, :debian] do
    depends "zalando-cmdb-client", "python-paramiko >= 1.7.0"
  end

  platforms [:centos] do
    depends "zalando-cmdb-client", "python-paramiko >= 1.7.0"
  end

  def build
    safesystem 'python setup.py build'
  end

  def install
    safesystem 'python setup.py install --root=../../tmp-dest --no-compile'
  end
end

The ZalandoZcloud class was derived from FPM::Cookery::Recipe. When we created this recipe, unsetting the python_fix_name attribute wasn’t possible, and we needed a workaround. As you can see, it is an ugly hack with remaining dependencies on zalando-cmdb-client, which also had to be packaged for each distribution.

Now compare this to our new zalando-zcloud-virtualenv variant:

class ZalandoZcloud < FPM::Cookery::VirtualenvRecipe
  description "Package containing CLI, agent and additional scripts for installing nodes via zCloud"

  name "zalando-zcloud"
  version "0.2.8"

  build_depends "python-setuptools"
  virtualenv_fix_name false
  virtualenv_install_location "/opt/"
end

This recipe class is derived from FPM::Cookery::VirtualenvRecipe, and has no runtime dependencies. All Python modules on which our package depends are already defined in its setup.py and will be pulled into the virtualenv. A lot cleaner!

Some Final Thoughts

With a few simple scripts, we can build isolated, self-contained packages from our own software; provide them in our internal repos; and not worry about deployment and dependencies. We can even use these scripts to package tarballs that are randomly dropped into a web folder. Because a simple shell script performs the actual package-building, we can easily use the same commands in a continuous integration context — i.e., to automatically build packages every time a recipe changes or a new one has been added.
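As a rough illustration of the continuous integration idea, the sketch below maps a list of changed file paths to the recipe folders that contain them. The changed_recipes helper and the recipes/<name>/ layout it matches are assumptions for this example, not part of package-build.

```shell
#!/bin/sh
# Hypothetical helper: read changed file paths (one per line) on stdin and
# print the unique names of the recipe folders they belong to.
changed_recipes() {
    # Keep only paths below recipes/<name>/ and reduce them to <name>.
    sed -n 's|^recipes/\([^/]*\)/.*|\1|p' | sort -u
}
```

A CI job could feed it the output of git diff --name-only for the commits being built and pass the resulting names to cook-recipe.sh. If the list is empty, the job should skip the call entirely, since running the script without parameters builds every recipe.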
