TL;DR

$ pipenv run pip install -r <(pipenv lock -r) --target _build/

Taken from here.

Long story

Virtual environment

At present, I’m dealing a lot with AWS components, and with AWS Lambda in particular. AWS Lambda supports several languages — five of them at the moment: Node.js, Java, Python, C#, and Go. In my team, we’re using Python.

Since there are still two versions of Python in use, and each project requires its own set of libraries, the standard way to manage these dependencies is the so-called virtual environment.

Traditionally, a virtual environment was created using the following approach:

$ python3 -m venv _env
$ source _env/bin/activate

The first command creates a clean Python 3 distribution in the local directory _env. The second one activates the virtual environment: it changes the PATH environment variable, adds VIRTUAL_ENV, and removes PYTHONHOME.

After that, the commands pip, pip3, python, and python3 work in the context of the distribution in the _env directory.

There is a deactivate command to deactivate the virtual environment and restore the environment variables to their initial state.
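A quick way to see what activation changes (a sketch; the printed paths depend on where the environment is created):

```shell
# Create and activate a virtual environment
python3 -m venv _env
source _env/bin/activate

echo "$VIRTUAL_ENV"    # prints the path to the _env directory
command -v python      # now resolves to _env/bin/python

deactivate             # PATH and VIRTUAL_ENV are restored
```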

The next step is installing libraries. If a project is distributed as a library, all required dependencies are specified in its setup.py file. For projects which are not libraries, dependencies are specified in special plain-text files. They can be named in any way, but usually it’s requirements.txt for general dependencies and requirements-dev.txt for development dependencies.

# Install dependencies from a file
$ pip install -r requirements.txt

Deploy AWS Lambda

To deploy the lambda source code, it must be packed into a zip archive together with all its dependencies. It’s pretty simple to do if everything is located in the same directory. We used the following commands for this:

# Inside project directory
$ python3 -m venv _env
$ source _env/bin/activate
$ mkdir -p _build
$ pip install -r requirements.txt -t _build
$ cp -R src/* _build

After that it’s easy to pack the contents of the _build directory and upload them to AWS S3. Another way is to use the AWS Serverless Application Model (SAM) extension, which automates and simplifies the process. That’s actually what we’re using, but it’s another topic.

So, what’s wrong?

The approach described above leads to one serious pitfall and some minor ones.

For specifying dependencies there is a special standard, PEP 440, which allows using conditional version specifiers instead of pinning a dependency to a particular version. Usually, a range which preserves backward compatibility is used.

For example:

torchtext>=0.2.1
PyMySQL>=0.7,<1.0

The problem is that in this case the latest version satisfying the condition is installed. So, given the same requirements, two invocations of pip install at different moments in time can lead to different results.

As for the lambda, the versions of the libraries installed locally and the versions packed into the zip archive can differ, because these two installations happen at different moments in time. This can be a source of hidden bugs and compatibility errors on the cloud side.

To solve this problem, dependency managers for some other languages use so-called lock-files. For example, Composer for PHP uses composer.lock, and NPM for Node.js uses package-lock.json. These files are generated and updated automatically during the installation of new dependencies or during the update of existing ones. Lock-files contain information about the specific versions of dependencies. This allows the developer to be sure that installing dependencies from a particular lock-file produces the same result at different moments in time. Usually, checksums for verifying consistency are stored along with the versions, which helps to avoid installing spoofed versions of libraries.
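For illustration, an entry in Pipfile.lock (discussed below) stores both the pinned version and the hashes used for the consistency check. Roughly, it looks like this (the package, its version, and the shortened hash are just an example):

```json
{
    "default": {
        "pymysql": {
            "hashes": [
                "sha256:..."
            ],
            "version": "==0.8.0"
        }
    }
}
```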

A minor issue of the traditional approach is that each virtual environment has to be created and activated manually. Moreover, it’s not clear which version of Python the developer should use.

Another minor issue is the manually-managed dependencies in requirements.txt files. There was a situation when a developer installed a dependency locally but forgot to add it to the requirements file, which caused failing builds on other systems.

Pipenv

The pipenv project aims to take these problems away from the developer. It uses Pipfile as the dependency configuration file and Pipfile.lock as the lock-file.

# Create virtual environment
# (it creates a directory with distribution of Python3 and Pipfile)
$ pipenv --three

# Install dependencies
# (adds dependencies to the [packages] section of Pipfile and generates/updates Pipfile.lock)
$ pipenv install 'pymysql>=0.7,<1' boto3

# Install development dependencies
# (adds dependencies to the [dev-packages] section)
$ pipenv install -d rope

# Update dependencies to latest available versions
$ pipenv update
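After those commands, the generated Pipfile looks roughly like this (the source, versions, and python_version are illustrative):

```toml
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
pymysql = ">=0.7,<1"
boto3 = "*"

[dev-packages]
rope = "*"

[requires]
python_version = "3.6"
```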

If Pipfile already exists in a project, setting up a virtual environment and installing dependencies can be done with a single command:

$ pipenv install

If you want to set up a development environment, just add the -d flag:

$ pipenv install -d

If there is a Pipfile.lock, the exact versions of dependencies recorded in it are installed.

Pipenv + AWS Lambda

The pipenv tool is relatively new and is still under active development. It doesn’t provide some of the features which pip provides, and some existing features could be removed in the future (for example).

For building a package for AWS Lambda, it’s critical to be able to install dependencies into a particular directory. Pipenv doesn’t provide this feature out of the box but, as usual, there are workarounds. One of them is to write a small script which copies the sources of dependencies from the virtual environment directory to the target directory. Another option is to generate requirements.txt and install dependencies with pip.

The second option can be achieved with a one-liner (there are two commands in it, though):

$ pipenv run pip install -r <(pipenv lock -r) --target _build/
  1. pipenv lock -r generates a list of dependencies which is compatible with the requirements.txt syntax
  2. pip install -r file --target _build installs dependencies from the specified file into the target directory
  3. pipenv run is required to run pip inside the virtual environment
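To give an idea, the output of pipenv lock -r is a plain list of pinned versions, including transitive dependencies — roughly like this (package names and versions here are just an example):

```
boto3==1.7.4
botocore==1.10.4
pymysql==0.8.0
```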

Step-by-step

Prerequisites: installed homebrew, pipenv, and python3 (recommended, but it should also work with python2).

# Install homebrew
$ /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
# Install python3
$ brew install python3
# Install pipenv
$ pip install pipenv

Configuring virtual environment (in project root):

# Create virtual environment
$ pipenv --three
# Install required project dependencies
$ pipenv install <required dependencies>
# Install dependencies for development
$ pipenv install -d <required development dependencies>

Build and deploy package for AWS Lambda:

# Install dependencies to a dedicated directory
$ pipenv run pip install -r <(pipenv lock -r) --target _build/
# Copy project's source code to the building directory
$ cp -R src/* _build
# Zip the code inside the directory and upload it to an S3 bucket
# We're using "aws cloudformation package & deploy" approach for that
$ <run aws cloudformation package,deploy commands as usual>

The commands above can be placed in a Makefile to be run with a single command.
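For instance, a minimal Makefile might look like the sketch below (target names and the S3 bucket are hypothetical; recipe lines must be indented with tabs):

```make
# The <(...) process substitution requires bash, not the default /bin/sh
SHELL := /bin/bash

build:
	mkdir -p _build
	pipenv run pip install -r <(pipenv lock -r) --target _build/
	cp -R src/* _build

deploy: build
	(cd _build && zip -qr ../lambda.zip .)
	aws s3 cp lambda.zip s3://my-deploy-bucket/lambda.zip
```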

🏁