Beware cURL Plus Files

Sometimes a queue worker needs to be no more glamorous than a shell script. If your queues are HTTP and so are the other services, it’s easy to reach for the shell and the venerable cURL. cURL is the UNIX operator’s default user agent. If it doesn’t work in cURL, there’s a good chance it won’t work in other situations.

We have a queue worker that interacts with several web services. It follows this rough outline…

1. Check for work
2. Get configs
3. Coordinate, communicate
4. Do work

Repeat forever, which could be “a mighty long time,” as Prince once told us.

The last step is the most interesting, but a little more background…

It is not hard to envision some generic functions in the shell. A generic logger function handles logging. Here are some near real world snippets of code.

1. RSTOPWATCHBEGIN=$(date +"%s.%N")
2. curl -ski -H "x-auth-user: ${RUSER}" -H "x-auth-expiry: ${AEXPIRY}" "${THIS_RURL}${BUCKETNAME}/${USAFE}" -XPUT --upload-file "${SOURCE}" > "$RRESULTS" 2> "$SPECIAL_ERR_LOG"
3. RSTOPWATCHEND=$(date +"%s.%N")

You can see from this example that the time to interact with this service is the difference between RSTOPWATCHEND (line 3) and RSTOPWATCHBEGIN (line 1). Because these timestamps are finer than one second, the subtraction needs floating point math, commonly done in awk or bc (or hope your shell supports it; most, including bash, do not). Passing the result to the logger function records it for evaluation later.
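The subtraction described above can be sketched with awk. This is a minimal sketch, not the worker's actual code: the function name elapsed_seconds is hypothetical, the sleep stands in for the curl call, and %N assumes GNU date. awk works in double precision, which is enough for millisecond resolution on epoch timestamps but not for the full nanosecond field.

```shell
#!/bin/sh
# elapsed_seconds is a hypothetical helper, not part of the original worker.
# It subtracts two "seconds.nanoseconds" timestamps with awk, because
# POSIX sh arithmetic is integer only.
elapsed_seconds() {
    awk -v begin="$1" -v end="$2" 'BEGIN { printf "%.3f\n", end - begin }'
}

# %N (nanoseconds) is a GNU date extension and is assumed here.
RSTOPWATCHBEGIN=$(date +"%s.%N")
sleep 1   # stand-in for the curl call being timed
RSTOPWATCHEND=$(date +"%s.%N")

echo "elapsed: $(elapsed_seconds "$RSTOPWATCHBEGIN" "$RSTOPWATCHEND") seconds"
```

The result is a plain string, so it can be handed to whatever logger function the script already has.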

cURL is a rock star. In this worker script, fetching configs and communicating over HTTP routinely completes in hundredths of a second. The way the script takes its timestamps, that figure includes the time to invoke cURL.

Here is the output of some of those results…

5320103 GETs
0.016 seconds per transaction

But when that interaction involves grabbing a file that is available locally through an NFS mount, the results go south quickly.

Here are those results…

961375 total seconds
507016 PUTs
1.896 seconds per transaction

What can it be? Surely not cURL: too many other services are being reached over HTTP with the expected results. It must be the web service, which is simply slower and more complicated than the other HTTP services.

Here is where the story could have ended.

For a number of reasons, we had other code running against this same service. One piece was using a Mojolicious library. Its average interaction time with the same service, doing the same work, was 0.5 seconds. That difference is not insignificant when you do an operation 3 million times a day. But this worker was PUTing files already in memory, so it is not quite the same comparison.

A different worker was built using Python and the Requests library for HTTP. This code had a much smaller transaction time with the web service too.

Here are those results…

21180 total seconds
127479 PUTs
0.166 seconds per transaction
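The per-transaction figures in both PUT result sets are simply total seconds divided by the number of PUTs. A minimal sketch of that arithmetic, using the totals reported above and a hypothetical helper named per_transaction:

```shell
#!/bin/sh
# per_transaction is a hypothetical helper: total seconds / number of PUTs.
per_transaction() {
    awk -v total="$1" -v count="$2" 'BEGIN { printf "%.3f\n", total / count }'
}

# Totals taken from the two result sets in this post.
CURL_AVG=$(per_transaction 961375 507016)     # 1.896
PYTHON_AVG=$(per_transaction 21180 127479)    # 0.166

echo "cURL worker:   ${CURL_AVG} seconds per PUT"
echo "Python worker: ${PYTHON_AVG} seconds per PUT"

# Gap between the two averages, about 1.73 seconds per PUT.
awk -v a="$CURL_AVG" -v b="$PYTHON_AVG" \
    'BEGIN { printf "gap:           %.3f seconds per PUT\n", a - b }'
```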

The timing calls are isolated to the same transaction. The files are still fetched over NFS. The service is still authenticated. The service is still using SSL. Finally, the most important thing is that the Python code was running on the same machines as the scripts using cURL. We can comfortably use the phrase, “holding all other variables equal…”

What can account for the roughly 1.7 second difference?

Now it is hard to ignore cURL. We suspect that there is more overhead than we anticipated for the shell to spawn cURL as a child process and for cURL to pull that file into the PUT. Other factors may include slower authentication responses or less efficient SSL libraries.
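One way to test these suspicions is to let cURL report its own timings through the --write-out option, which exposes cumulative timestamps such as time_namelookup, time_connect, time_appconnect, time_starttransfer and time_total. The sketch below is an assumption about how one might use them, not something the worker did: the helper name phase_breakdown and the phase labels (dns, tcp, tls, wait, transfer) are made up for illustration, and the breakdown assumes an HTTPS request, since time_appconnect is zero without SSL. For uploads, how the time splits between "wait" and "transfer" depends on when the server first responds.

```shell
#!/bin/sh
# Format string for curl's --write-out (-w) option. Each variable is a
# cumulative timestamp in seconds, measured from the start of the request.
CURL_TIMING_FORMAT='%{time_namelookup} %{time_connect} %{time_appconnect} %{time_starttransfer} %{time_total}\n'

# phase_breakdown is a hypothetical helper. It reads one line of the five
# cumulative timestamps and prints the time spent in each phase.
phase_breakdown() {
    awk '{ printf "dns=%.3f tcp=%.3f tls=%.3f wait=%.3f transfer=%.3f\n",
                  $1, $2 - $1, $3 - $2, $4 - $3, $5 - $4 }'
}

# Intended use, with SOURCE and URL supplied by the caller (left commented
# out because it needs a live service):
#   curl -sk -o /dev/null -w "$CURL_TIMING_FORMAT" \
#        --upload-file "$SOURCE" "$URL" | phase_breakdown

# Demonstration with made-up numbers:
echo '0.010 0.030 0.110 0.400 0.450' | phase_breakdown
```

If the phases reported by cURL add up to far less than the wall-clock time measured around the call, the remainder is being spent outside the HTTP exchange, for example in process startup.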

If you love and use cURL, you may want to dig into the logs and check your performance. It might be worth using a different tool for the heavy lifting.
