Beware cURL Plus Files

Sometimes a queue worker needs to be no more glamorous than a shell script. If your queues are HTTP and so are the other services, it’s easy to reach for the shell and the venerable cURL. cURL is the UNIX operator’s default user agent. If it doesn’t work in cURL, there’s a good chance it won’t work in other situations.

We have a queue worker that interacts with several web services. It follows this rough outline…

1. Check for work
2. Get configs
3. Coordinate, communicate
4. Do work

Repeat forever, which could be “a mighty long time,” as Prince once told us.
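The outline above can be sketched as a plain shell loop. This is only an illustration: the four function names are hypothetical stubs standing in for the real steps, and the loop counter exists so the sketch can be run for a fixed number of passes.

```shell
# Hypothetical stubs: each one stands in for a step of the outline above.
check_for_work() { echo "job-1"; }          # 1. prints a job id, or nothing if idle
get_configs()    { echo "config for $1"; }  # 2.
coordinate()     { echo "claimed $1"; }     # 3.
do_work()        { echo "done $1"; }        # 4.

# Run the four steps in order. An argument of 0 (the default) repeats forever;
# any other number stops after that many passes.
run_worker() {
    max_loops=${1:-0}
    loops=0
    while [ "$max_loops" -eq 0 ] || [ "$loops" -lt "$max_loops" ]; do
        job=$(check_for_work)
        if [ -z "$job" ]; then
            sleep "${IDLE_SLEEP:-5}"        # nothing queued, wait and poll again
        else
            get_configs "$job" && coordinate "$job" && do_work "$job"
        fi
        loops=$((loops + 1))
    done
}

# Usage: run_worker      (forever)
#        run_worker 2    (two passes, handy when testing)
```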

The last step is the most interesting, but a little more background…

It is not hard to envision some generic functions in the shell. A generic logger function handles logging. Here are some near-real-world snippets of code.

1. RSTOPWATCHBEGIN=$(date +"%s.%N")
2. curl -ski -H "x-auth-user: ${RUSER}" -H "x-auth-expiry: ${AEXPIRY}" "${THIS_RURL}${BUCKETNAME}/${USAFE}" -XPUT --upload-file "${SOURCE}" > $RRESULTS 2>$SPECIAL_ERR_LOG
3. RSTOPWATCHEND=$(date +"%s.%N")

You can see from this example that the time to interact with this service is the difference between RSTOPWATCHEND (line 3) and RSTOPWATCHBEGIN (line 1). Because these timestamps are finer-grained than a second, you will need floating point math, commonly done in awk or bc (or hope your shell supports it; most do not). Passing the result to the logger function records it for later evaluation.
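A minimal sketch of that subtraction and logging step, assuming awk is available. The function names `elapsed` and `log_timing`, the `TIMING_LOG` variable, and the log line layout are made up for illustration; the real logger is not shown in this post.

```shell
# Subtract two "seconds.nanoseconds" timestamps with awk, since most shells
# only do integer arithmetic. Prints the difference with millisecond precision.
elapsed() {
    awk -v t0="$1" -v t1="$2" 'BEGIN { printf "%.3f\n", t1 - t0 }'
}

# Hypothetical logger: appends "epoch label seconds" to a log file.
log_timing() {
    printf '%s %s %s\n' "$(date +%s)" "$1" "$2" >> "${TIMING_LOG:-/tmp/worker_timing.log}"
}

# Usage, mirroring the snippet above:
#   RSTOPWATCHBEGIN=$(date +"%s.%N")
#   ... curl ...
#   RSTOPWATCHEND=$(date +"%s.%N")
#   log_timing PUT "$(elapsed "$RSTOPWATCHBEGIN" "$RSTOPWATCHEND")"
```

Note that `%N` is a GNU date extension, so this assumes a Linux-style `date`.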

cURL is a rock star. In this worker script, the work of getting configs and communicating over HTTP routinely completes in hundredths of a second. The way the script is set up, that includes the time to invoke cURL.

Here is the output of some of those results…

5320103 GETs
0.016 seconds per transaction

But when that interaction involves grabbing a file that is available locally through an NFS mount, the results go south quickly.

Here are those results…

961375 total seconds
507016 PUTs
1.896 seconds per transaction

What can it be? Clearly, it should not be cURL; too many other services are being interacted with over HTTP with expected results. It must be the web service. It is just slower and more complicated than the other HTTP services.

Here is where the story could have ended.

For a number of reasons, we had other code running against this same service. One was using a Mojolicious library. The average interaction time with the same service doing the same work was 0.5 seconds. That is not insignificant when you do an operation 3 million times a day. But this worker was PUTing files already in memory, so it is not quite the same.

A different worker was built using Python and the Requests library for HTTP. This code had a much smaller transaction time with the web service too.

Here are those results…

21180 total seconds
127479 PUTs
0.166 seconds per transaction

The timing calls are isolated to the same transaction. The files are still fetched over NFS. The service is still authenticated. The service is still using SSL. Finally, the most important thing is that the Python code was running on the same machines as the scripts using cURL. We can comfortably use the phrase, “holding all other variables equal…”

What can account for the roughly 1.7 second difference?
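Dividing total seconds by the number of PUTs reproduces the per-transaction figures reported above and gives the gap between the two workers. The helper name `per_txn` is just for this sketch.

```shell
# Average seconds per transaction: total seconds divided by transaction count.
per_txn() {
    awk -v total="$1" -v count="$2" 'BEGIN { printf "%.3f\n", total / count }'
}

curl_avg=$(per_txn 961375 507016)     # shell + cURL worker
python_avg=$(per_txn 21180 127479)    # Python worker
gap=$(awk -v a="$curl_avg" -v b="$python_avg" 'BEGIN { printf "%.2f\n", a - b }')

echo "cURL: $curl_avg  Python: $python_avg  gap: $gap"
# prints: cURL: 1.896  Python: 0.166  gap: 1.73
```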

Now it is hard to ignore cURL. We suspect that there is more overhead than we anticipated in spawning a cURL process and pulling that file into the PUT. Other factors may include slower authentication responses or less efficient SSL libraries.
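One way to test those suspicions is to let cURL report where the time goes and to time the file read separately. The sketch below uses cURL's `--write-out` timing variables; the wrapper names `timed_put` and `timed_read` and the `CURL` override are hypothetical, and this is not what the worker in this post actually ran.

```shell
# Format string for curl --write-out. Each %{...} name is a curl write-out
# variable, reported in seconds from the start of the transfer.
CURL_TIMING_FORMAT='dns=%{time_namelookup} connect=%{time_connect} tls=%{time_appconnect} pretransfer=%{time_pretransfer} firstbyte=%{time_starttransfer} total=%{time_total}\n'

# PUT a file and print only the timing breakdown.
# $1 = local file, $2 = destination URL, remaining arguments (such as -H headers)
# are passed through to curl. Set CURL=echo to see the command instead of running it.
timed_put() {
    src=$1; url=$2; shift 2
    "${CURL:-curl}" -sk -o /dev/null -w "$CURL_TIMING_FORMAT" "$@" --upload-file "$src" "$url"
}

# Time reading the file on its own, with no HTTP involved, to see how much
# of the transaction is the NFS fetch.
timed_read() {
    t0=$(date +%s.%N)
    cat "$1" > /dev/null
    t1=$(date +%s.%N)
    awk -v t0="$t0" -v t1="$t1" 'BEGIN { printf "read=%.3f\n", t1 - t0 }'
}

# Usage:
#   timed_read "$SOURCE"
#   timed_put "$SOURCE" "${THIS_RURL}${BUCKETNAME}/${USAFE}" -H "x-auth-user: ${RUSER}"
```

A large `tls` value would point at the SSL handshake, while a large gap between `pretransfer` and `total` would point at the upload itself or the service.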

If you love and use cURL, you may want to dig into the logs and check your performance. It might be worth using a different tool for the heavy lifting.

Augmenting the Elasticsearch Docker Container

We are running lots of things in docker. Elasticsearch is one of those things. It’s a very nice way to go, especially since there is an official elasticsearch docker image available.

Since we are running in AWS, we need the elasticsearch-cloud-aws plugin to allow the nodes in the cluster to find each other.

To pull things together, we are building a custom docker image based on the official one and simply installing the needed plugin. This gives us everything we need to run.

However, to make it all happen there are some caveats.

The official image uses the /data directory for logs, data and plugins. The image also exposes /data as a VOLUME. This makes it possible to point the container at a location on the host, which keeps the heavy write operations for logging and, of course, the data itself out of the container. It also allows for upgrades, etc., by simply pointing a new container at the data location.

There is a downside to this. The image also places the plugins under /data/plugins, so when the container starts and sets the volume, the plugins “vanish”. It’s also worth noting that our custom Dockerfile, which extends the official one, seemed to work just fine with this command:

RUN /elasticsearch/bin/plugin install elasticsearch/elasticsearch-cloud-aws/2.4.1

This generates no errors; however, the plugin does NOT persist into /data/plugins! This seems a bit odd, but in the end the /data location would end up being replaced by the VOLUME regardless.

To work around this, our custom Dockerfile creates /elasticsearch/plugins, modifies the config for elasticsearch and then installs the plugin:

FROM dockerfile/elasticsearch
MAINTAINER Matthias Johnson <>

# move the ES plugins away from the /data volume where it won't survive ...
RUN mkdir /elasticsearch/plugins
RUN sed -i 's@plugins:\s*/data/plugins@plugins: /elasticsearch/plugins@' /elasticsearch/config/elasticsearch.yml
# install the AWS plugin
RUN /elasticsearch/bin/plugin install elasticsearch/elasticsearch-cloud-aws/2.4.1

# Expose ports.
#   - 9200: HTTP
#   - 9300: transport

# start the service
ENTRYPOINT [ "/elasticsearch/bin/elasticsearch" ]
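The sed expression in the Dockerfile above can be checked outside the image build. The sample config below is an assumption about the shape of the base image's elasticsearch.yml, not a copy of it, and the helper names are hypothetical. The sketch writes `[[:space:]]` where the Dockerfile uses `\s`; the two match the same thing in GNU sed, and the former is the portable spelling.

```shell
# Sample config fragment. ASSUMPTION: the real elasticsearch.yml in the base
# image contains a "plugins: /data/plugins" line of this shape.
sample_config() {
    printf 'path:\n  data: /data/data\n  plugins: /data/plugins\n'
}

# The Dockerfile's substitution, reading stdin instead of editing in place.
relocate_plugins() {
    sed 's@plugins:[[:space:]]*/data/plugins@plugins: /elasticsearch/plugins@'
}

sample_config | relocate_plugins
# prints:
# path:
#   data: /data/data
#   plugins: /elasticsearch/plugins
```

Because sed exits successfully even when nothing matches, it may be worth following the sed step in a Dockerfile with a `grep -q 'plugins: /elasticsearch/plugins'` on the config file so the build fails if the config layout ever changes.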

Now, we can use the resulting image to spin up the container to run elasticsearch without having to perform the plugin install to the /data location before starting the container.
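Building and starting the image might look like the sketch below. The image tag and host path are hypothetical placeholders, the ports are the ones named in the Dockerfile comments, and the `DOCKER` variable is only there so the commands can be printed instead of executed.

```shell
DOCKER=${DOCKER:-docker}                     # set DOCKER=echo for a dry run
IMAGE=${IMAGE:-local/elasticsearch-aws}      # hypothetical image tag
HOST_DATA=${HOST_DATA:-/srv/elasticsearch}   # hypothetical host directory for /data

# Build the custom image from the directory holding the Dockerfile (default: .).
build_image() {
    "$DOCKER" build -t "$IMAGE" "${1:-.}"
}

# Start a container with /data on the host. The plugin lives in
# /elasticsearch/plugins inside the image, so the volume does not hide it.
run_container() {
    "$DOCKER" run -d -p 9200:9200 -p 9300:9300 -v "$HOST_DATA:/data" "$IMAGE"
}

# Usage:
#   build_image . && run_container
```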

This approach should also work nicely for other plugins we may need in the future.