elasticsearch, redis, rabbitmq … so many choices

The other day I had a conversation with a developer here at Catalyst. We are using Elasticsearch, Redis, and RabbitMQ for various things and he was wondering “which one should I choose?”. These tools offer different features and play to different strengths and it’s not always very obvious when to use each one. After I responded to my colleagues email, I thought it might be worth writing up here.

To begin, both Elasticsearch and Redis are tools that loosely are lumped together as NoSQL systems. RabbitMQ on the other hand is a queuing system. The key uses for each are:

  • Elasticsearch is great for storing “documents”, which might just be logs. It offers a powerful search API to find things
  • Redis is a key/value cache or store. It’s very good at storing things that feel very much like data structures you’d find in programming languages. It is very much focused on speed and performance
  • RabbitMQ allows you to queue things and process items in a very efficient manner. It offers a nice and consistent abstraction to other ways of working through things such as looping over log content, traversing the file system or creating another system in a traditional RDBMS

Next I’ll offer some observations of these tools and possible use cases.

Elasticsearch

As I mentioned, Elasticsearch is a document store with excellent searchability. A very common use is for aggregating logs via logstash. In that case you can think of each log event or line as a “document”. While that is a very common use of Elasticsearch these days, the use can go much further. Many companies are using it to index various content structures to make them searchable. For example we are using it to search for content we extract from files.

Elasticsearch stores it’s content as JSON. This makes it possible to leverage the structure. For example fields can be stored, searched and retrieved. This feels a little like a select column from table; statement, thought he comparison looses value quickly.

In general I think of it as a place to persist data for the long term. Elasticsearch also makes operational tasks pretty easy, which includes replication and that reinforces the persistence impression.

If you need full text search and/or want to store things for a long time, Elasticsearch is a good choice.

Redis

I think of Redis as being very much focused on speed. It’s a place to store data that is needed in a lot of places and needed fast. For example storing session data, which will be useful by every service behind a load balancer is a good example.

Another example might be to aggregate and update performance data quickly. The Sensu monitoring framework does just that.

In general Redis is a great choice where you need to get specific values or datasets as you would with variables in a programming language. While there are persistence options, I tend to think of Redis primarily as a tool to speed things up.

In a nutshell, I would use Redis for fast access to specific data in a cache sort of way.

RabbitMQ

RabbitMQ is a queuing service where you put things to be handed to other systems. It allows different systems to communicate with each other without having to build that communication layer.

In our case we frequently need to do things with files. So a message is place in the queue pointing to that file. Another system then subscribes to the queue and when a file shows up, it takes the appropriate action. This could also be a log event or anything else that would warrant an action to be taken somewhere else.

While I’m generally a big fan of RESTful architectures, I’m willing to compromise when it comes to queuing. With a proper RabbitMQ client we get nice things such as the assignment of an item in the queue to a specific client and if the client fails, RabbitMQ will make that item available to another client. This avoids having to code this logic into the clients. Even in cases where a log is parsed and triggers events to the queuing system there is less work to deal with a failure since mostly there is no re-playing of events that have to happen.

RabbitMQ is great if you have a workflow that is distributed.

General thoughts

The great thing about these tools is that they abstract common things we need. That avoids having to build them into different parts of the stack over and over (we have built many versions of queuing, for example). These tools are also intended to scale horizontally, which allows for growth as utilization increases. With many homegrown tools there will always be a limit of the biggest box you can buy. On the flip side it’s also possible to run in a VM or a container to minimize the foot print and isolate the service.

From an operations perspective I also like the fact that all three are easy to set up and maintain. I don’t mean to say that running any service is a trivial task, but having performed installs of Oracle I certainly appreciate the much more streamlined management of these tools. The defaults that come out of the box with Elasticsearch, Redis, and RabbitMQ are solid, but there are many adjustments that can be made to meet the specific use case.

That brings me back to “which one should I use?”

Really, it depends on the use case. It’s likely possible to bend each system for most use cases. In the end I hope that some of these musings will help make the choices that make the most sense.

Cheers,

\@matthias