why you should embrace a rabbitmq client library

Recently I had to re-run a lot of documents into one of our applications. The app lives on AWS and ingesting content involves the use of a RabbitMQ queue.

I’ve often used the amqp-tools that come with rabbitmq-c for quick ad-hoc work in the past, so I wrote a very terse bash script to feed the list of documents to the queue. That script worked just fine, but I was in a hurry and had added quite a few queue clients to get the work done more quickly.

The processing rate stalled, and when I looked a bit more closely I found that my bash script couldn’t keep the queue fed, so my clients were going idle.

I also have some Ruby code using the bunny library and decided to re-write my feed script using that.

The results were startling.

Pushing 100,000 messages to the queue using the bash approach took about 28 minutes.

The Ruby version, using a RabbitMQ library with a persistent connection, did the same work in 35 seconds!

During a later run I pushed 1 million messages to RabbitMQ from a single client using the Ruby code. That run took 6.5 minutes, for an effective rate of roughly 2,500 messages per second. The server is running on an r3.large, and with that push plus all the clients reading from it the load rose to only around 1.5. That is a stark contrast to the bash version of the script, during which I would see the load rise to 4+.
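To put those numbers side by side, here is a quick back-of-the-envelope calculation in Ruby. It uses only the figures quoted above; the helper name is just for illustration.

```ruby
# Messages per second for a run, given the message count and elapsed seconds.
def rate_per_second(messages, seconds)
  messages.to_f / seconds
end

bash_rate = rate_per_second(100_000, 28 * 60)     # bash script, about 28 minutes
ruby_rate = rate_per_second(100_000, 35)          # Ruby with a persistent connection
big_run   = rate_per_second(1_000_000, 6.5 * 60)  # the later 1 million message run

puts format("bash:    %.0f msg/s", bash_rate)
puts format("ruby:    %.0f msg/s", ruby_rate)
puts format("1M run:  %.0f msg/s", big_run)
puts format("speedup: %.0fx", ruby_rate / bash_rate)
```

That works out to about 60 messages per second for the bash script against roughly 2,857 for the Ruby version, a factor of 48.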

I didn’t take the time to dig into whether this was due to process spawning in the bash script or to connection setup/teardown overhead with RabbitMQ. Given the load the bash script put on the RabbitMQ server (the script itself ran on a different system), I’m confident that it’s not process spawning but rather the extra burden on RabbitMQ of handling all those connection requests.

In the end it just speaks to the practicality of using a client library the right way if things are going too slowly when interacting with RabbitMQ.
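For illustration, here is a minimal sketch of what a feed script with a persistent connection can look like using bunny. This is not my original script: the queue name, broker URL, and the one-document-per-line input file are assumptions, and the method names are made up for the example.

```ruby
# Publish every non-empty line through `exchange` and return the count.
# `exchange` is anything with a Bunny-style #publish(payload, opts) method.
def publish_lines(exchange, queue_name, lines)
  count = 0
  lines.each do |line|
    payload = line.strip
    next if payload.empty?
    exchange.publish(payload, routing_key: queue_name, persistent: true)
    count += 1
  end
  count
end

# Open ONE connection and channel, push the whole file, then close.
# The queue name and URL defaults are placeholders, not real settings.
def feed_queue(path, queue_name: "documents", url: "amqp://guest:guest@localhost:5672")
  require "bunny" # loaded lazily so publish_lines works without the gem
  connection = Bunny.new(url)
  connection.start
  begin
    channel = connection.create_channel
    # Declaration options must match the queue if it already exists.
    channel.queue(queue_name, durable: true)
    File.open(path) do |file|
      publish_lines(channel.default_exchange, queue_name, file.each_line)
    end
  ensure
    connection.close
  end
end
```

Calling something like feed_queue("documents.txt") would then send the whole list over a single connection, instead of paying for connection setup and teardown on every message.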



elasticsearch, redis, rabbitmq … so many choices

The other day I had a conversation with a developer here at Catalyst. We are using Elasticsearch, Redis, and RabbitMQ for various things and he was wondering “which one should I choose?”. These tools offer different features and play to different strengths, and it’s not always obvious when to use each one. After I responded to my colleague’s email, I thought it might be worth writing up here.

To begin, both Elasticsearch and Redis are tools that are loosely lumped together as NoSQL systems. RabbitMQ, on the other hand, is a queuing system. The key uses for each are:

  • Elasticsearch is great for storing “documents”, which might just be logs. It offers a powerful search API to find things
  • Redis is a key/value cache or store. It’s very good at storing things that feel very much like data structures you’d find in programming languages. It is very much focused on speed and performance
  • RabbitMQ allows you to queue things and process items in a very efficient manner. It offers a nice, consistent alternative to other ways of working through things, such as looping over log content, traversing the file system, or building yet another system in a traditional RDBMS

Next I’ll offer some observations of these tools and possible use cases.


As I mentioned, Elasticsearch is a document store with excellent searchability. A very common use is aggregating logs via Logstash. In that case you can think of each log event or line as a “document”. While that is a very common use of Elasticsearch these days, its use can go much further. Many companies are using it to index various content structures to make them searchable. For example, we are using it to search the content we extract from files.

Elasticsearch stores its content as JSON. This makes it possible to leverage the structure. For example, fields can be stored, searched, and retrieved. This feels a little like a “select column from table;” statement, though the comparison loses value quickly.
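As a rough illustration of that “select column from table” feeling, here is a small Ruby sketch that builds a search request body: a match query on one field, with _source limiting which fields come back. The field names and search text are hypothetical, not taken from our index.

```ruby
require "json"

# Build a search body that matches on one field and returns only some fields.
def search_body(field, text, return_fields)
  {
    "_source" => return_fields,                   # fields to return, like the column list
    "query" => { "match" => { field => text } }   # full text match on a single field
  }
end

body = search_body("content", "quarterly report", ["filename", "extracted_at"])
puts JSON.pretty_generate(body)
```

The resulting JSON would be sent to the _search endpoint of an index. Unlike a SQL where clause, the match query analyzes the text and ranks hits by relevance, which is about where the comparison stops being useful.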

In general I think of it as a place to persist data for the long term. Elasticsearch also makes operational tasks such as replication pretty easy, which reinforces that impression of persistence.

If you need full text search and/or want to store things for a long time, Elasticsearch is a good choice.


I think of Redis as being very much focused on speed. It’s a place to store data that is needed in a lot of places and needed fast. Session data, which has to be available to every service behind a load balancer, is a good example.
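A minimal sketch of the session idea, assuming a client that behaves like the one from the redis gem (setex and get). The key prefix, the 30 minute lifetime, and the method names are my own choices for the example.

```ruby
require "json"

SESSION_TTL_SECONDS = 1800 # assumed 30 minute session lifetime

# `redis` is any client with redis-rb style #setex and #get methods.
def save_session(redis, session_id, data)
  # SETEX stores the value and sets its expiry in one step.
  redis.setex("session:#{session_id}", SESSION_TTL_SECONDS, JSON.generate(data))
end

# Returns the stored hash, or nil if the session is missing or expired.
def load_session(redis, session_id)
  raw = redis.get("session:#{session_id}")
  raw && JSON.parse(raw)
end
```

With the redis gem installed, the client could be created with something like Redis.new(url: "redis://localhost:6379/0"). Every service behind the load balancer that points at the same Redis instance can then read the same session.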

Another example might be to aggregate and update performance data quickly. The Sensu monitoring framework does just that.

In general Redis is a great choice where you need to get specific values or datasets as you would with variables in a programming language. While there are persistence options, I tend to think of Redis primarily as a tool to speed things up.

In a nutshell, I would use Redis for fast access to specific data in a cache sort of way.


RabbitMQ is a queuing service where you put things to be handed to other systems. It allows different systems to communicate with each other without having to build that communication layer.

In our case we frequently need to do things with files. So a message pointing to that file is placed in the queue. Another system subscribes to the queue, and when a file shows up it takes the appropriate action. The message could also be a log event or anything else that warrants an action being taken somewhere else.

While I’m generally a big fan of RESTful architectures, I’m willing to compromise when it comes to queuing. With a proper RabbitMQ client we get nice things such as the assignment of an item in the queue to a specific client, and if that client fails, RabbitMQ makes the item available to another client. This avoids having to code that logic into the clients. Even in cases where a log is parsed and triggers events to the queuing system, there is less work in dealing with a failure, since for the most part no events have to be replayed.
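Here is a minimal sketch of how that hand-off can look on the consuming side with bunny, using manual acknowledgements. The queue name, URL, and method names are assumptions for the example, and the actual file handling is left as a placeholder.

```ruby
# Process one delivery: ack on success, hand it back to the queue on failure.
# `channel` is anything with Bunny-style #ack and #nack methods.
def handle_delivery(channel, delivery_tag, payload)
  yield payload
  channel.ack(delivery_tag)
  true
rescue StandardError
  # multiple = false, requeue = true: let another client pick the item up
  channel.nack(delivery_tag, false, true)
  false
end

# Subscribe and block, working through the queue one item at a time.
# The queue name and URL defaults are placeholders, not real settings.
def consume(queue_name: "documents", url: "amqp://guest:guest@localhost:5672")
  require "bunny" # loaded lazily so handle_delivery works without the gem
  connection = Bunny.new(url)
  connection.start
  channel = connection.create_channel
  channel.prefetch(1) # at most one unacknowledged item per client
  queue = channel.queue(queue_name, durable: true)
  queue.subscribe(manual_ack: true, block: true) do |delivery_info, _properties, payload|
    handle_delivery(channel, delivery_info.delivery_tag, payload) do |file_path|
      puts "processing #{file_path}" # the real work on the file would go here
    end
  end
ensure
  connection&.close
end
```

If a client dies before acknowledging, RabbitMQ returns the unacknowledged item to the queue once the connection drops, so nothing extra is needed for that case. One caveat with this sketch: an item that always fails would be requeued over and over, so a real consumer probably wants a retry limit or a dead-letter setup.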

RabbitMQ is great if you have a workflow that is distributed.

General thoughts

The great thing about these tools is that they abstract common things we need. That avoids having to build them into different parts of the stack over and over (we have built many versions of queuing, for example). These tools are also intended to scale horizontally, which allows for growth as utilization increases. With many homegrown tools the limit will always be the biggest box you can buy. On the flip side, it’s also possible to run them in a VM or a container to minimize the footprint and isolate the service.

From an operations perspective I also like the fact that all three are easy to set up and maintain. I don’t mean to say that running any service is a trivial task, but having performed installs of Oracle I certainly appreciate the much more streamlined management of these tools. The defaults that come out of the box with Elasticsearch, Redis, and RabbitMQ are solid, but there are many adjustments that can be made to meet the specific use case.

That brings me back to “which one should I use?”

Really, it depends on the use case. It’s likely possible to bend each system for most use cases. In the end I hope that some of these musings will help make the choices that make the most sense.