Query consolidation in Elasticsearch

In my last post on a simple way to improve Elasticsearch queries, I promised a follow-up on another way to optimize queries.

This approach doesn’t deliver the order-of-magnitude improvement from the previous post, but it still offers some benefits.

Once again, I was working on improving my rough first shot of working code. In this case the app I was working on was displaying the search results I mentioned last time, but it also was pulling various facets for display as well.

By the time everything was rendered, I had issued somewhere between 12 and 15 calls or queries. Some of these were necessary during authentication or to capture data needed for the actual query. However, there was a clear opportunity for improvement.

My focus was on a couple of sets of queries in particular. The first was a call to capture statistics for a field, which would then be used to set up the actual facet calls. (Side note: Yep, facets are going away and are being replaced by aggregations. I’ll likely share some notes on this when I’m done making that change.)

{
    "facets": {
       "date": {
          "statistical": {
             "field": "date"
          }
       }
    }
}

My code has a few of those calls for various numeric fields such as date, size, etc.

The other set of queries to focus on was the retrieval of the actual facets.

{
    "facets": {
       "tags": {
          "terms": {
             "field": "tags",
             "size": 10
          }
       }
    }
}

Now, the first set of stats-related facets is actually used to dynamically create the buckets for some of the actual facet calls, so the two groups can’t be merged into a single request. That still lets me combine the first group into one call and the second group into another.

So, I basically end up with two calls to elasticsearch. The first to grab the statistics facets and the second for the facets that are actually used in the application for display.
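To make the dependency between the two calls concrete, here is a minimal sketch in Python of how the min and max from a statistical facet could be turned into buckets for a range facet. This is an assumption about one possible bucketing scheme (equal-width buckets), not necessarily what my application does; the function name `range_facet_from_stats`, the `bucket_count` parameter, and the sample numbers are made up for illustration.

```python
def range_facet_from_stats(field, stats, bucket_count=4):
    """Build a range facet with equal-width buckets between stats min and max.

    `stats` is assumed to be the statistical facet entry for `field` taken
    from the first response, i.e. a dict containing "min" and "max".
    """
    low, high = stats["min"], stats["max"]
    width = (high - low) / bucket_count
    # Bucket edges: low, low + width, ..., high
    edges = [low + i * width for i in range(bucket_count + 1)]
    ranges = [{"from": edges[i], "to": edges[i + 1]} for i in range(bucket_count)]
    # Leave the last bucket open-ended so the maximum value always lands in a bucket.
    del ranges[-1]["to"]
    return {field: {"range": {"field": field, "ranges": ranges}}}


# Hypothetical statistics, as they might come back from the first call.
sample_stats = {"min": 0, "max": 100}
size_facet = range_facet_from_stats("size", sample_stats)
```

The resulting dictionary can then be dropped into the `facets` section of the second call, which is why the statistics have to be fetched first.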

Nonetheless, rather than issuing a call for each one independently, we can combine them. Like this:

{
    "facets": {
       "date": {
          "statistical": {
             "field": "date"
          }
       },
       "size": {
          "statistical": {
             "field": "size"
          }
       }
    }
}
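If the list of numeric fields changes over time, the combined body above does not need to be written by hand. A rough sketch in Python, where `build_stats_request` is a hypothetical helper name and sending the body to Elasticsearch is left to whatever client you already use:

```python
import json


def build_stats_request(fields):
    """Return one request body holding a statistical facet per field."""
    return {
        "facets": {
            field: {"statistical": {"field": field}}
            for field in fields
        }
    }


stats_body = build_stats_request(["date", "size"])
print(json.dumps(stats_body, indent=4))
```

Each facet is keyed by its field name, so the statistics for every field can be read from the single response under the same key.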

and then one more call which also includes the actual query:

{
   "query": {
      "query_string": {
         "default_field": "body",
         "query": "test"
      }
   },
   "fields": [
      "title"
   ],
   "facets": {
      "tags": {
         "terms": {
            "field": "tags",
            "size": 10
         }
      },
      "folder": {
         "terms": {
            "field": "folder",
            "size": 10
         }
      }
   }
}

You’ll notice that I’m also just returning the field I need for display as described in the last post.
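The same idea applies to the second call. Below is a small Python sketch that assembles the query, the returned fields, and the terms facets into one body; the helper name `build_search_request` and its parameters are assumptions made for this example rather than part of any Elasticsearch client.

```python
def build_search_request(query_text, default_field, display_fields,
                         facet_fields, facet_size=10):
    """Combine the query, the fields to return, and all terms facets."""
    return {
        "query": {
            "query_string": {
                "default_field": default_field,
                "query": query_text,
            }
        },
        # Only return what is needed for display.
        "fields": list(display_fields),
        # One terms facet per field, all sent in the same request.
        "facets": {
            field: {"terms": {"field": field, "size": facet_size}}
            for field in facet_fields
        },
    }


search_body = build_search_request("test", "body", ["title"], ["tags", "folder"])
```

With these arguments the helper produces the same body as the hand-written request above, so the search results and both facets come back in a single round trip.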

While this approach doesn’t really reduce the amount of work Elasticsearch has to perform, it reduces the number of individual calls that need to be made. That means most of the improvement comes from fewer calls and fewer network round trips. The latter will likely have a bigger impact if the calls are made sequentially rather than asynchronously. Regardless, it does offer some improvement in my experience so far.

@matthias

 
