Reducing network overhead in Redis calls with Lua

For one of our applications, we cache a variety of data in Redis for use in later calculations. When profiling the application, I noticed that one of the caching steps was taking much longer than any of the others. A look at the profiler output quickly showed why:

         3863231 function calls in 266.866 CPU seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    50992  259.846    0.005  259.846    0.005 {method 'recv' of '_socket.socket' objects}
    50992    1.293    0.000    2.491    0.000 connection.py:593(pack_command)
    50994    0.802    0.000    0.802    0.000 {method 'sendall' of '_socket.socket' objects}
    50992    0.716    0.000  261.088    0.005 connection.py:315(read_response)
   836706    0.516    0.000    0.516    0.000 {isinstance}
    50992    0.450    0.000  266.530    0.005 client.py:558(execute_command)
   203968    0.392    0.000    0.717    0.000 connection.py:577(encode)
    50992    0.249    0.000  261.709    0.005 client.py:575(parse_response)
    50990    0.210    0.000    0.210    0.000 client.py:231(float_or_none)
    50992    0.199    0.000    0.199    0.000 {method 'feed' of 'hiredis.Reader' objects}
    50992    0.191    0.000    1.142    0.000 connection.py:529(send_packed_command)
    50992    0.171    0.000    3.804    0.000 connection.py:554(send_command)
    50992    0.164    0.000    0.311    0.000 connection.py:885(release)
   254960    0.163    0.000    0.163    0.000 {method 'join' of 'str' objects}
    50992    0.147    0.000    0.257    0.000 connection.py:868(get_connection)
   713889    0.142    0.000    0.142    0.000 {len}
    50992    0.138    0.000  261.250    0.005 connection.py:566(read_response)
   101984    0.124    0.000    0.151    0.000 connection.py:858(_checkpid)

Look at all that time recv’ing!

The problematic piece of application code looked something like this:

for k, v in some_things.iteritems():
    increment_by = 1
    redis_db.zincrby(ZSET, k, increment_by)

It just loops over a dictionary and, for each key k, increments its score by one within the sorted set it’s stored in. Each call to ZINCRBY is a full network round-trip, so for dictionaries with many keys the repeated calls really began to add up, as the profiler output indicates. Obviously, a quick way to reduce the time spent on all that needless back-and-forth would be to push the work to the server side, and that’s where Lua comes in.

Since version 2.6.0, Redis has shipped with a built-in Lua interpreter. From what I can tell, it was introduced partly as the maintainer’s way of deflecting new feature requests for obscure use cases (“Write your own Lua script to do that, I won’t add it to Redis.”). For more information, see the official documentation and a short introduction to writing Lua scripts for Redis.
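
Scripts receive key names through the KEYS table and everything else through ARGV, and are run with the EVAL command. Through redis-py that looks roughly like this (a throwaway example, not from the original post; the key name is made up):

# EVAL takes the script source, the number of keys, then the keys and args.
redis_db.eval("return redis.call('GET', KEYS[1])", 1, "some:key")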

Replacing the loop above with a Lua script was pretty straightforward. You can simply define the script as a string, register it with the server, and then get to using it. As Lua script execution blocks the server, it was necessary to break the items to be incremented into reasonable chunks, to avoid asking the server to increment hundreds of thousands of scores at once. I also tested replacing the original loop with Redis’ pipeline functionality, but the Lua version won out in the end.
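
For comparison, the pipelined attempt looked roughly like this (a reconstruction, not the original code; it assumes redis-py’s pipeline API and flushes a batch of CHUNK_SIZE commands per round-trip):

pipe = redis_db.pipeline(transaction=False)
for i, k in enumerate(some_things.iterkeys()):
    pipe.zincrby(ZSET, k, 1)
    if (i + 1) % CHUNK_SIZE == 0:
        pipe.execute()  # one round-trip sends the whole batch
pipe.execute()  # flush any remaining commands

And the Lua version that replaced the original loop: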

lua = """
for i, k in pairs(ARGV) do
  redis.call("ZINCRBY", KEYS[1], 1, k)
end"""
increment = redis_db.register_script(lua)
for chunk in generate_chunk_from_iterable(some_things.iterkeys(), CHUNK_SIZE):
    increment(keys=[ZSET], args=chunk)
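
The generate_chunk_from_iterable helper isn’t shown here; a minimal version, assuming it only needs to yield lists of at most CHUNK_SIZE items, could be:

from itertools import islice

def generate_chunk_from_iterable(iterable, chunk_size):
    # Yield successive lists of at most chunk_size items until exhausted.
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            break
        yield chunk

A nice detail of register_script is that the callable it returns invokes the script by its SHA1 digest with EVALSHA, falling back to loading the script body only when the server doesn’t already have it cached, so the script itself isn’t resent with every chunk.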

The results from the profiler were pretty convincing compared to the original loop:

         1048325 function calls in 1.851 CPU seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        5    0.643    0.129    0.643    0.129 {method 'recv' of '_socket.socket' objects}
      615    0.295    0.000    0.295    0.000 {method 'sendall' of '_socket.socket' objects}
        5    0.252    0.050    0.634    0.127 connection.py:593(pack_command)
   101999    0.131    0.000    0.272    0.000 connection.py:577(encode)
        1    0.125    0.125    0.125    0.125 {zlib.compress}
        1    0.112    0.112    0.112    0.112 {cPickle.dumps}
   347687    0.100    0.000    0.100    0.000 {isinstance}
   102004    0.064    0.000    0.064    0.000 {method 'join' of 'str' objects}
    35914    0.041    0.000    0.041    0.000 {method 'encode' of 'unicode' objects}
   305707    0.032    0.000    0.032    0.000 {len}
   102013    0.014    0.000    0.014    0.000 _compat.py:21()
    50990    0.011    0.000    0.011    0.000 {round}
        1    0.007    0.007    1.504    1.504 client.py:2636(__call__)
        5    0.005    0.001    1.591    0.318 client.py:558(execute_command)

So remember, when inserting or updating many values, avoid making a round-trip per value, because the network is always slower than RAM. Here, pushing the loop into a Lua script cut the caching step from about 267 CPU seconds to under 2, roughly a 140x speedup.