A simple way to improve Elasticsearch queries

We use Elasticsearch for a few things, and I have personally been enjoying working with it as part of a new tool we are building. Along the way I’ve learned a couple of things about querying it.

First, I could say a lot about how impressed I am with Elasticsearch from an operations perspective. Out of the box it runs extremely well, but I’ll save that for another post. Here I’ll talk about some rather simple ideas for improving how you query Elasticsearch.

When developing, I often start with something very basic; it could even be described as simplistic. The first attempt is generally not very efficient, but it helps to quickly determine whether an idea is workable. That is what I recently did with some code that queries Elasticsearch.

The first simple performance improvement concerned generating a display of the search results. To get things going quickly, I issued the query and grabbed the results. By default Elasticsearch returns the entire document in the _source field. The simple query might look like this:

{
  "query": {
    "match_all": {}
  }
}
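To issue the query above from code, something along these lines would work. This is a minimal Python sketch using only the standard library; the node address http://localhost:9200 and the index name test are assumptions, and build_search_request is a hypothetical helper name. The request is only built here, not sent.

```python
import json
import urllib.request

# Assumed endpoint: a local node and the "test" index; adjust for your cluster.
ES_URL = "http://localhost:9200/test/_search"


def build_search_request(body, url=ES_URL):
    """Wrap a query DSL dict in a POST request to the _search endpoint."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


match_all = {"query": {"match_all": {}}}
req = build_search_request(match_all)
# To actually send it against a running node:
#   response = json.load(urllib.request.urlopen(req))
```

Keeping the request construction separate from sending it makes it easy to inspect exactly what body goes over the wire.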

Each hit in the returned results then includes the _source field and might look like this:

{
  "_index": "test",
  "_type": "doc",
  "_id": "20140806",
  "_score": 1,
  "_source": {
    "title": "some title",
    "body": "the quick brown fox jumps over the lazy dog"
  }
}

My code would then go through the hits array and grab the title field from the _source for display in the result list. That worked OK, but seemed slow. (Full disclosure: my documents were quite a bit bigger than the simple example above.)
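The first, simplistic approach can be sketched in Python roughly as follows, assuming the search response has already been parsed into a dict. The hits live under hits.hits in a search response, and each one carries its whole _source even though only the title is used. The function name titles_from_source is hypothetical.

```python
def titles_from_source(response):
    """Pull the title out of each hit's full _source document."""
    return [hit["_source"]["title"] for hit in response["hits"]["hits"]]


# A trimmed response shaped like the sample hit above.
response = {
    "hits": {
        "hits": [
            {
                "_index": "test",
                "_type": "doc",
                "_id": "20140806",
                "_score": 1,
                "_source": {
                    "title": "some title",
                    "body": "the quick brown fox jumps over the lazy dog",
                },
            }
        ]
    }
}

print(titles_from_source(response))  # ['some title']
```

Note that the body field is transferred and parsed for every hit only to be thrown away, which is where large documents hurt.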

Now, since I didn’t really need the entire document just to display the title, the obvious choice is to fetch only the necessary data. Elasticsearch makes this easy via the fields parameter:

{
  "query": {
    "match_all": {}
  },
  "fields": [
    "title"
  ]
}
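A small helper that builds the trimmed request might look like this. It is a sketch, not the author's code: search_body is a hypothetical name, and the fields key mirrors the request above as written for the Elasticsearch version of the time. Elasticsearch 5.0 and later renamed this parameter to stored_fields, which only returns fields that are stored in the mapping, so check which variant your cluster expects.

```python
import json


def search_body(query=None, fields=None):
    """Build a search body; optionally restrict the returned fields."""
    body = {"query": query if query is not None else {"match_all": {}}}
    if fields:
        # "fields" matches the request above (Elasticsearch 1.x era);
        # 5.0+ renamed it to "stored_fields", which only returns stored fields.
        body["fields"] = list(fields)
    return body


print(json.dumps(search_body(fields=["title"]), indent=2))
```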

That will return something like the following in the hits array:

{
  "_index": "test",
  "_type": "doc",
  "_id": "20140806",
  "_score": 1,
  "fields": {
    "title": [ "some title" ]
  }
}
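On the reading side, the main difference is that each entry under fields is an array, as the ["some title"] value shows. A minimal Python sketch (hypothetical function name, response assumed already parsed) takes the first element and tolerates hits that lack the field:

```python
def titles_from_fields(response):
    """Read the title from each hit's "fields" map.

    Values under "fields" come back as arrays, so take the first element;
    hits without a title yield None.
    """
    titles = []
    for hit in response["hits"]["hits"]:
        values = hit.get("fields", {}).get("title", [])
        titles.append(values[0] if values else None)
    return titles


response = {
    "hits": {
        "hits": [
            {
                "_index": "test",
                "_type": "doc",
                "_id": "20140806",
                "_score": 1,
                "fields": {"title": ["some title"]},
            }
        ]
    }
}

print(titles_from_fields(response))  # ['some title']
```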

That lets me skip retrieving potentially large chunks of data. The results were quite impressive in my use case: the combined time to run the queries and display the results dropped by an order of magnitude. Again, this is likely due to the much larger documents I was actually working with. Nonetheless, it is a good example of retrieving only the necessary data rather than issuing what amounts to a SELECT * in SQL terms.
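The same idea of fetching only what gets displayed also works through source filtering, where the _source key in the request lists the fields to keep. This is a different mechanism from the fields parameter used above and is the usual route on newer Elasticsearch versions for fields that are not explicitly stored. A minimal sketch with hypothetical helper names and a hand-written response:

```python
def source_filtered_body(include):
    """Ask Elasticsearch to return only the listed fields inside _source."""
    return {"query": {"match_all": {}}, "_source": list(include)}


def titles_from_filtered_source(response):
    """Each hit's _source now holds only the requested fields."""
    return [hit["_source"].get("title") for hit in response["hits"]["hits"]]


body = source_filtered_body(["title"])
print(body)  # {'query': {'match_all': {}}, '_source': ['title']}

# Hand-written response shaped like a source-filtered hit.
response = {"hits": {"hits": [{"_id": "20140806", "_source": {"title": "some title"}}]}}
print(titles_from_filtered_source(response))  # ['some title']
```

Unlike the fields variant, the title comes back as a plain string rather than a one-element array.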

The other performance improvement was around consolidating queries, but I’ll save that for a future post.

@matthias
