Beware cURL Plus Files

Sometimes a queue worker needs to be no more glamorous than a shell script. If your queues are HTTP and so are the other services, it’s easy to reach for the shell and the venerable cURL. cURL is the UNIX operator’s default user agent. If it doesn’t work in cURL, there’s a good chance it won’t work in other situations.

We have a queue worker that interacts with several web services. It follows this rough outline…

1. Check for work
2. Get configs
3. Coordinate, communicate
4. Do work

Repeat forever, which could be “a mighty long time,” as Prince once told us.
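
The outline above can be sketched as a small loop. This is only an illustration in Python, not our actual worker: the four callables, `idle_sleep`, and `max_iterations` are hypothetical names introduced here so the loop can be exercised without real services.

```python
import time
from typing import Callable, Optional


def run_worker(
    check_for_work: Callable[[], Optional[str]],
    get_config: Callable[[str], dict],
    coordinate: Callable[[str, dict], None],
    do_work: Callable[[str, dict], None],
    idle_sleep: float = 1.0,
    max_iterations: Optional[int] = None,
) -> int:
    """Run the four-step loop and return the number of jobs processed.

    All four callables are hypothetical stand-ins for the real HTTP calls.
    With max_iterations=None the loop repeats forever.
    """
    processed = 0
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        iterations += 1
        job = check_for_work()        # 1. Check for work
        if job is None:
            time.sleep(idle_sleep)    # nothing queued, back off briefly
            continue
        config = get_config(job)      # 2. Get configs
        coordinate(job, config)       # 3. Coordinate, communicate
        do_work(job, config)          # 4. Do work
        processed += 1
    return processed
```

Passing the steps in as functions keeps the loop itself trivial, which is also what makes each step easy to time separately.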

The last step is the most interesting, but a little more background…

It is not hard to envision some generic functions in the shell. A generic logger function handles logging. Here are some near real world snippets of code.

1. RSTOPWATCHBEGIN=$(date +"%s.%N")
2. curl -ski -H "x-auth-user: ${RUSER}" -H "x-auth-expiry: ${AEXPIRY}" "${THIS_RURL}${BUCKETNAME}/${USAFE}" -XPUT --upload-file "${SOURCE}" > "$RRESULTS" 2> "$SPECIAL_ERR_LOG"
3. RSTOPWATCHEND=$(date +"%s.%N")

You can see from this example that the time to interact with this service is the difference between RSTOPWATCHEND (line 3) and RSTOPWATCHBEGIN (line 1). Because these timestamps are finer-grained than whole seconds, the subtraction needs floating point math, commonly done in awk or bc (most shells do not support it natively). Passing the result to the logger function records it for later evaluation.
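
For comparison, here is how the same subtraction might look in Python. This is a sketch, not part of the worker script; `elapsed_seconds` is a hypothetical helper, and the two timestamps are made-up values in the `date +"%s.%N"` format.

```python
from decimal import Decimal


def elapsed_seconds(begin: str, end: str) -> Decimal:
    """Subtract two `date +"%s.%N"` timestamps.

    Decimal is used because an epoch timestamp with nanoseconds carries
    about 19 significant digits, more than a 64-bit float can hold exactly.
    """
    return Decimal(end) - Decimal(begin)


# Made-up timestamps, 1.5 seconds apart.
print(elapsed_seconds("1400000000.250000000", "1400000001.750000000"))
```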

cURL is a rock star. In this worker script, when doing the work of getting configs and communicating over HTTP, routinely, the work completes in hundredths of seconds. The way the script is set up, that includes the time to invoke cURL.

Here is the output of some of those results…

5320103 GETs
0.016 seconds per transaction

But when that interaction involves grabbing a file that is available locally through an NFS mount, the results go south quickly.

Here are those results…

961375 total seconds
507016 PUTs
1.896 seconds per transaction

What can it be? Clearly, it should not be cURL; too many other services are being interacted with over HTTP with the expected results. It must be the web service. It is just slower and more complicated than the other HTTP services.

Here is where the story could have ended.

For a number of reasons, we had other code running against this same service. One was using a Mojolicious library. The average interaction time with the same service doing the same work was 0.5 seconds. That is not insignificant when you do an operation 3 million times a day. But this worker was PUTing files already in memory, so it is not quite the same comparison.

A different worker was built using Python and the Requests library for HTTP. This code had a much smaller transaction time with the web service too.

Here are those results…

21180 total seconds
127479 PUTs
0.166 seconds per transaction
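
The per-transaction figures are simply total seconds divided by the number of PUTs. A short sketch to reproduce them from the totals reported in this article (`seconds_per_transaction` is a hypothetical helper name):

```python
def seconds_per_transaction(total_seconds: float, transactions: int) -> float:
    """Average wall-clock seconds per transaction."""
    return total_seconds / transactions


curl_avg = seconds_per_transaction(961375, 507016)    # shell + cURL worker
python_avg = seconds_per_transaction(21180, 127479)   # Python worker

print(f"cURL:   {curl_avg:.3f} s")
print(f"Python: {python_avg:.3f} s")
print(f"gap:    {curl_avg - python_avg:.2f} s")
```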

The timing calls are isolated to the same transaction. The files are still fetched over NFS. The service is still authenticated. The service is still using SSL. Finally, the most important thing is that the Python code was running on the same machines as the scripts using cURL. We can comfortably use the phrase, “holding all other variables equal…”

What can account for the roughly 1.7 second difference?

Now it is hard to ignore cURL. We suspect that there is more overhead than we anticipated in spawning cURL as a child process and pulling that file into the PUT. Other factors may include slower authentication responses or less efficient SSL libraries.
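
One way to get a feel for the process-spawn part of that suspicion is to time a child process that does nothing against an in-process call that does nothing. The sketch below is only a rough probe: it launches the Python interpreter as a stand-in for cURL, so the absolute numbers say nothing about cURL itself, and it does not touch NFS, authentication, or SSL. `average_seconds`, `spawn_noop`, and `call_noop` are hypothetical names.

```python
import subprocess
import sys
import time


def average_seconds(action, repeats: int = 5) -> float:
    """Average wall-clock time of calling `action` `repeats` times."""
    start = time.perf_counter()
    for _ in range(repeats):
        action()
    return (time.perf_counter() - start) / repeats


def spawn_noop() -> None:
    # Stand-in for launching cURL: start a child process that does nothing.
    subprocess.run([sys.executable, "-c", "pass"], check=True)


def call_noop() -> None:
    # Stand-in for an in-process HTTP library call.
    pass


spawn_cost = average_seconds(spawn_noop)
call_cost = average_seconds(call_noop)
print(f"child process: {spawn_cost:.4f} s, in-process: {call_cost:.6f} s")
```

If the spawn cost turns out to be a small fraction of the gap, the file read and the SSL handshake are the next places to look.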

If you love and use cURL, you may want to dig into the logs and check your performance. It might be worth using a different tool for the heavy lifting.

Deploying Python Applications and their Virtual Environments

Introduction

As noted in a past article, we leverage virtualenv and pip to isolate and manage some of our python applications. A natural next question is “How can a python virtual environment and related application be deployed to a production server?”. This article provides a conceptual overview of one way such deployments can be handled.

The Server Environment and Conventions

First, let’s discuss some assumptions about the server environment. In this article, a deployment server, development server(s), and production server(s) are all discussed. It can be assumed that all these servers are running the same operating system (in this case, RHEL 6). This provides a luxury which allows for transporting virtual environments from one host to another with no ill effects and no requirement to build new virtual environments for each host.

Additionally, there are some directory conventions used which help assure consistency from host to host. The virtual environment is located in a standard path such as /opt/company/. The code for each python application is then located in a directory inside the virtual environment root. This makes for a set of paths like so:

Example directories:

/opt/company/myapp/   # the virtual env root

/opt/company/myapp/myapp/              # the application root
/opt/company/myapp/myapp/lib/          # the application library
/opt/company/myapp/myapp/bin/appd.py   # the application

The Build

The building of the python application is a two step process. First the virtual environment is created or updated. Next, the desired version of the application is exported from the repository. This work all takes place on the deployment server.

Steps to build the virtual env and application:

# Go to the standard app location
cd /opt/company/

# Create the virtual env if needed
virtualenv ./myapp

# Export the desired copy of the app inside the virtual env root
svn export $repouri /opt/company/myapp/myapp/

# Activate the virtualenv
cd /opt/company/myapp/ && source ./bin/activate

# Install the requirements
cd /opt/company/myapp/myapp/
pip install -r ./requirements.txt

Here’s an example script which would handle such a build:

* build-myapp.sh
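
The same build steps can also be driven from Python. The sketch below only assembles the commands as argument lists; `build_commands` is a hypothetical helper, and the repository URI in the usage lines is a placeholder. It calls the virtualenv's own bin/pip directly, which installs into that environment without needing the activate step.

```python
import posixpath
from typing import List


def build_commands(venv_root: str, app_name: str, repo_uri: str) -> List[List[str]]:
    """Return the build steps as argument lists, in the order they should run."""
    app_root = posixpath.join(venv_root, app_name)
    # Using the venv's own pip is equivalent to activating first.
    pip = posixpath.join(venv_root, "bin", "pip")
    return [
        ["virtualenv", venv_root],                   # create or update the venv
        ["svn", "export", repo_uri, app_root],       # export the app into it
        [pip, "install", "-r", posixpath.join(app_root, "requirements.txt")],
    ]


# Placeholder repository URI, for illustration only.
for command in build_commands("/opt/company/myapp", "myapp", "svn+ssh://repo.example/myapp/trunk"):
    print(" ".join(command))
```

To actually run the build, each list could be passed to `subprocess.run(command, check=True)` so that a failed step stops the sequence.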

The Deploy

Once the virtualenv and application are built, the deployment can be handled with some rsync and scripting work. This same model can be used to deploy to development servers or production servers, maintaining consistency across any environment. It can also be used to deploy your application to a new server. While a bit of a simplification, the deployment can be envisioned as a simple for-loop around rsync.

Example deployment loop:

for host in $devservers; do
    rsync -avz --delete-after /opt/company/myapp/ "$host":/opt/company/myapp/
done

Here’s an example script which would handle such a deployment:

* deploy-myapp.sh
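
As a sketch of the same loop in Python, the helper below builds one rsync command per host. `rsync_commands` and the host names are hypothetical. Note the trailing slash on both paths: with rsync, a source path ending in a slash copies the contents of the directory, so the virtualenv lands at the same path on the remote host rather than one level deeper.

```python
from typing import Iterable, List


def rsync_commands(venv_root: str, hosts: Iterable[str]) -> List[List[str]]:
    """Build one rsync invocation per host for mirroring venv_root."""
    # A trailing slash on the source means "the contents of this directory".
    path = venv_root.rstrip("/") + "/"
    return [
        ["rsync", "-avz", "--delete-after", path, f"{host}:{path}"]
        for host in hosts
    ]


# Hypothetical host names, for illustration only.
for command in rsync_commands("/opt/company/myapp", ["dev1", "dev2"]):
    print(" ".join(command))
```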

Closing

This describes one of many ways python applications and their virtual environments can be deployed to remote hosts. It is a fairly simple matter to assemble these techniques into shell scripts for semi-automated build and deployment. Such scripts can then be enhanced with preferred conventions as well as more intelligent handling of application restarts, rollbacks, configuration management, and other desired improvements particular to the application.

Leveraging Python Virtual Environments and pip

Introduction

Python virtual environments are a common means by which python applications can be isolated from one another and also segregated from the global directories and packages of an operating system. They can provide a pragmatic compromise between the flexibility needs of software development and the stability standards for server management. While new techniques like containers and Docker may point to the future of application deployment, virtual environments remain a solid choice for local Python development as well as application management across a set of Linux servers.

Installing virtualenv and pip

To take full advantage of python virtual environments, both virtualenv and pip should be installed. The virtualenv package provides environment management and isolation while pip provides python package installation and management within the virtual environment. These tools are not always available by default though. In order to bootstrap, it is often easiest to install virtualenv which contains pip.

Installation of virtualenv varies across different operating systems. Here, I will focus on Red Hat Enterprise Linux / CentOS because of their common usage for our server installations.

For both RHEL 5 and RHEL 6, the EPEL repository is a great resource for obtaining virtualenv and related python packages. Installing this repository can be as simple as installing a single rpm manually, or having it added to the server configuration using your preferred management system such as Puppet or Chef. Details of this installation are beyond the scope of this article. Readers may also wish to instead investigate and use the newer Red Hat Software Collections.

Once the EPEL repository is installed, installing virtualenv (and pip) is just a simple rpm installation. For RHEL 6, python 2.6 is the default python installation. For RHEL 5, you must install python2.6 from EPEL in addition to the virtualenv package. Note that this means paying close attention to 2.4 vs 2.6 usage on RHEL 5 systems.

Installing virtualenv (and pip) on RHEL 6:

# Requires EPEL repo
[root@host]$ yum install python-virtualenv

Installing virtualenv (and pip) on RHEL 5:

# Requires EPEL repo
# Requires python2.6 from EPEL
[root@host]$ yum install python26
[root@host]$ yum install python26-virtualenv

Creating and Activating a Virtual Environment

With the necessary pre-requisites, you can now create new virtual environments. The examples here illustrate usage on a Linux server, but it is not necessary to have root privileges to use virtual environments or pip. You can use them for local exploration and development, on automated testing hosts, and of course on development and production servers.

Creating a new virtual environment:

# Choose a common location for your applications / virtual envs
[host]# cd /opt/company/

# Create a new virtual environment
[host]# virtualenv ./myapp/

Using the virtual environment requires that it be “activated”. This is a simple one step command.

Activate the virtual environment:

[root@host]# source ./myapp/bin/activate
(myapp)[host]#
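
Activation is mostly a matter of environment variables. The sketch below approximates what the activate script does to the environment: it sets VIRTUAL_ENV, puts the virtualenv's bin directory at the front of PATH, and clears PYTHONHOME. It leaves out the prompt change, and `activated_environ` is a hypothetical helper, not part of virtualenv.

```python
import posixpath
from typing import Dict, Mapping


def activated_environ(venv_root: str, environ: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of `environ` adjusted roughly the way activate would."""
    env = dict(environ)
    env["VIRTUAL_ENV"] = venv_root
    bin_dir = posixpath.join(venv_root, "bin")
    old_path = env.get("PATH", "")
    # The venv's bin directory goes first so its python and pip win.
    env["PATH"] = bin_dir + (":" + old_path if old_path else "")
    env.pop("PYTHONHOME", None)
    return env


env = activated_environ("/opt/company/myapp", {"PATH": "/usr/bin:/bin"})
print(env["PATH"])
```

A dictionary like this can be passed as `env=` to `subprocess.run` to run a command against the virtualenv from a script, where sourcing activate is not an option.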

Using a Virtual Environment Together with an Application

How you choose to use a virtual environment for an application can vary dramatically per situation and personal preferences. Once a virtual environment is activated, you can navigate the server filesystem and use any python code you wish. Conceptually, the virtualenv can be wholly separate from an application. In practice, we often tie a single application to a single virtualenv. Below are just two examples to illustrate differences.

A virtualenv and a local working copy of code:

# Here's a virtual env and activation
source ~/myvirts/myapp/bin/activate

# And here is a working copy of trunk from myapp
cd ~/mycode/svn/myapp-trunk/

An application inside a virtual env directory:

# A virtual env root and an application root inside
/opt/company/myapp/                    # the virtual env root
/opt/company/myapp/myapp/              # the application root
/opt/company/myapp/myapp/lib/          # the application library
/opt/company/myapp/myapp/bin/appd.py   # the application

# So you can enter the virtual env and then run the application
cd /opt/company/myapp/
source ./bin/activate
cd ./myapp/
./bin/appd.py

While not a standard convention, placing the application inside the virtualenv directory like this allows for synchronizing a whole virtualenv and application from server to server, in a single isolated directory. You may prefer to decouple the directories for very valid reasons such as easier management in diverse server environments. As noted initially, conventions for joining a virtualenv and an app can vary widely.

Installing Packages with pip

The pip application allows you to install and manage python packages. While it can be used globally, we focus here on usage within a virtual environment. Packages can be installed from several sources, with the most common being installations from the Python Package Index (PyPI).

Example installation of a common package named “requests”:

# List the current packages
(myapp)[host]# pip freeze

# Install a new package from pypi
(myapp)[host]# pip install requests
Downloading/unpacking requests
Downloading requests-2.3.0.tar.gz (429kB): 429kB downloaded
Running setup.py egg_info for package requests
Installing collected packages: requests
Running setup.py install for requests
Successfully installed requests
Cleaning up...
(myapp)[host]#

For a given application, you’ll want to keep a list of all the required external packages. The pip application provides a convention for this. You can place all the package names (and versions if desired) inside a file named “requirements.txt”. This file can then be stored in the root directory of your application, and managed like any other file in your version control system.

Example requirements.txt file:

(myapp)[host]$ cat requirements.txt
requests
rawes
redis

Install requirements from file:

# Make sure the virtualenv is already activated!
(myapp)[host]$ pip install -r ./requirements.txt
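
To make the file format concrete, here is a small sketch that reads simple requirements entries into (name, version) pairs. It handles only bare names and `name==version` pins, and it skips comment lines and option lines such as `-e`; pip's real parser accepts much more. `parse_requirements` is a hypothetical helper, not a pip API.

```python
from typing import List, Optional, Tuple


def parse_requirements(text: str) -> List[Tuple[str, Optional[str]]]:
    """Return (name, pinned_version) pairs; version is None when unpinned."""
    entries: List[Tuple[str, Optional[str]]] = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and option lines like "-e ...".
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        if "==" in line:
            name, version = line.split("==", 1)
            entries.append((name.strip(), version.strip()))
        else:
            entries.append((line, None))
    return entries


print(parse_requirements("requests==2.3.0\nrawes\nredis\n"))
```

Pinning versions this way is what makes a build repeatable from one day to the next.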

Installing Packages from a Repository

The last item we’ll cover is the installation of packages from a repository. For a variety of reasons, the packages or versions you need to install with pip may not be available in pypi. A handy workaround can be to install a package straight from a repository. This may require some special setup effort on behalf of the package owner, but those details are beyond the scope of this article. This technique can be especially handy for installing internal packages straight from your company repository.

There are many possible variations to this installation method, which depend on the type of repository and other attributes. I’ll show two common examples.

Requirements entry which will install the langid package from the GitHub master branch:

(myapp)[host]$ cat requirements.txt
-e git+https://github.com/saffsd/langid.py.git#egg=langid

Requirements entry which will install a specific tagged version of an internal package (helper-lib) from a local svn repository:

(myapp)[host]$ cat requirements.txt
-e svn+ssh://repo.mycompany.com/var/svn/repos/helper-lib/tags/REL-Stable#egg=helper-lib
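
Both entries follow the same shape: an optional `-e`, a version control prefix joined to the repository URL with `+`, and an `#egg=` fragment naming the package. The sketch below splits an entry into those parts for illustration only. `VcsRequirement` and `parse_vcs_requirement` are hypothetical names, and pip's own handling covers more variations than this.

```python
from typing import NamedTuple
from urllib.parse import parse_qs


class VcsRequirement(NamedTuple):
    vcs: str   # e.g. "git" or "svn"
    url: str   # repository URL without the VCS prefix or fragment
    egg: str   # package name from the #egg= fragment


def parse_vcs_requirement(line: str) -> VcsRequirement:
    """Split an entry like '-e git+https://host/repo.git#egg=name'."""
    spec = line.strip()
    if spec.startswith("-e"):
        spec = spec[2:].strip()
    vcs, plus, rest = spec.partition("+")
    if not plus:
        raise ValueError(f"not a VCS requirement: {line!r}")
    url, _, fragment = rest.partition("#")
    egg = parse_qs(fragment).get("egg", [""])[0]
    return VcsRequirement(vcs, url, egg)


print(parse_vcs_requirement("-e git+https://github.com/saffsd/langid.py.git#egg=langid"))
```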

Closing

The above examples illustrate some of the techniques used at Catalyst to isolate and manage python applications and their requirements. The flexibility of virtualenv and pip can allow you to use your own conventions for your own environment. Overall, we’ve found pip and virtualenv to be very useful for the management and deployment of python applications in our environment.