Recently I had to re-ingest a lot of documents into one of our applications. The app lives on AWS, and ingesting content goes through a RabbitMQ queue.
I’ve often used the amqp-tools/rabbitmq-c command-line tools for quick ad-hoc work in the past, so I wrote a very terse bash script to feed the list of documents to the queue. That script worked just fine, but I was in a hurry, and I added quite a few queue clients to get the work done more quickly.
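The script itself is long gone, but it amounted to something like this sketch (the URL, flags, and queue name are illustrative). The key detail is that every message spawns a fresh amqp-publish process, and each process opens and tears down its own AMQP connection:

```shell
#!/usr/bin/env bash
# Sketch of the per-message approach: publish each line of FILE as one
# message. Every amqp-publish invocation is a new process and a brand-new
# connection to the broker.
feed_queue() {
  local file="$1" queue="$2"
  while IFS= read -r doc; do
    amqp-publish --url=amqp://localhost --routing-key="$queue" --body="$doc"
  done < "$file"
}

# Usage: feed_queue documents.txt ingest
```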
Throughput stalled out, and when I looked a bit more closely I found that my bash script wasn’t able to keep the queue fed and my clients were going idle.
I also had some Ruby code using the bunny library, so I decided to rewrite my feed script with that.
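The rewrite boiled down to holding one connection open and reusing it for every publish. A minimal sketch of that shape with bunny (the method name, queue name, and durability options here are illustrative, not the actual script):

```ruby
begin
  require "bunny" # real runs need the bunny gem; the sketch below is duck-typed
rescue LoadError
end

# Publish every document over one shared connection instead of spawning a
# process (and a new connection) per message.
# `docs` is an array of strings; `connection` is a started Bunny session.
# Returns the number of messages published.
def feed_queue(docs, queue_name, connection)
  channel = connection.create_channel
  queue = channel.queue(queue_name, durable: true)
  docs.each { |doc| queue.publish(doc, persistent: true) }
  docs.size
ensure
  channel&.close
end

# Usage against a real broker (URL and filenames are examples):
#   conn = Bunny.new("amqp://localhost")
#   conn.start
#   feed_queue(File.readlines("documents.txt", chomp: true), "ingest", conn)
#   conn.close
```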
The results were startling.
Pushing 100,000 messages to the queue using the bash approach took about 28 minutes.
The Ruby version, which held a single persistent connection to RabbitMQ, did the same work in 35 seconds!
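The back-of-the-envelope rates for those two 100,000-message runs:

```ruby
# Throughput for the same 100,000-message workload.
bash_rate = 100_000 / (28 * 60.0)  # ~60 messages/sec (28 minutes)
ruby_rate = 100_000 / 35.0         # ~2,857 messages/sec (35 seconds)
speedup   = ruby_rate / bash_rate  # ~48x
```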
During a later run I pushed 1 million messages to RabbitMQ from a single client using the Ruby code. That run took 6.5 minutes, an effective rate of about 2,500 messages per second. The RabbitMQ server runs on an r3.large, and even with that push plus all the clients reading from the queue, the load rose to only around 1.5. That's a stark contrast to the bash version of the script, during which I would see the load rise to 4+.
I didn’t take the time to dig into whether this was due to process spawning in the bash script or to connection setup/teardown overhead with RabbitMQ. But given the load the bash script put on the RabbitMQ server (the script ran on a different system), I’m confident it’s not process spawning, but instead the extra burden on RabbitMQ of handling all those connection requests.
In the end it just speaks to the practicality of using a client library the right way, with one long-lived connection, if things are going too slowly when interacting with RabbitMQ.