The count is wrong – distributed facets on multiple shards

While implementing a faceted search, we ran into the following issue:

The count shown for a facet was 14, but when you added that facet value to your search, the number of results was 15. WTF?!

The index is quite simple: it contains just some non-indexed fields and one filter field holding all the filters. The facet was added on that filter field.
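A terms facet request on such a field might look roughly like the sketch below. It assumes the legacy Elasticsearch facets API (the one that existed before aggregations replaced it); the field name `filters` and the facet name `filters` are made up for illustration and are not taken from our actual index.

```python
import json

# Assumption: FACET_LIMIT is the "size" of the terms facet, i.e. how many
# facet values are requested (per shard and in the final result).
FACET_LIMIT = 10

# Hypothetical request body for the legacy terms facet.
# "filters" is an illustrative field name, not the real one.
request_body = {
    "query": {"match_all": {}},
    "facets": {
        "filters": {
            "terms": {
                "field": "filters",
                "size": FACET_LIMIT,
            }
        }
    },
}

print(json.dumps(request_body, indent=2))
```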

FACET_LIMIT was 10 at the beginning, and that is the issue:

What happens is that Elasticsearch fetches the top 10 facet values from each shard and merges them. For example, the value "A" has a count of 14 on shard 01 and is in that shard's top 10, but on shard 02 it has a count of 1 and is not in the top 10. So the total facet count for "A" is 14, but if you run a real search for it, all shards are searched and a count of 15 is returned.
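The effect is easy to reproduce outside Elasticsearch. The following is a minimal simulation of the per-shard top-N merge, not Elasticsearch's actual implementation; the shard contents are invented, and the limit is 2 instead of 10 to keep the data small.

```python
from collections import Counter

def top_n(shard_counts, n):
    """Return only the n most frequent facet values of one shard."""
    return dict(Counter(shard_counts).most_common(n))

def merged_facet_counts(shards, n):
    """Sum the per-shard top-n lists, as a coordinating node would."""
    total = Counter()
    for shard in shards:
        total.update(top_n(shard, n))
    return total

# Hypothetical per-shard counts for the filter field.
shard_01 = {"A": 14, "B": 9, "C": 8}
shard_02 = {"B": 20, "C": 12, "D": 5, "A": 1}
shards = [shard_01, shard_02]

# "A" is in the top 2 of shard 01 but not of shard 02,
# so the single hit on shard 02 never reaches the merge.
facet_count = merged_facet_counts(shards, n=2)["A"]

# A real search for "A" hits every shard and counts every document.
real_count = sum(shard.get("A", 0) for shard in shards)

print(facet_count, real_count)  # 14 15
```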

There is also an issue about that.

How to overcome this:

  1. Just use one shard
    => Bad choice because of scalability, and therefore only possible for small indices
  2. Raise the FACET_LIMIT
    => We did that because the index is small, but it could become a performance issue. We raised it to 3000 without any performance decrease.
  3. Change the index format to avoid faceting on fields that have too many different values
    => Probably the best choice
  4. Live with it 😉
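To get a feeling for option 2, here is a small sketch that searches for the smallest limit at which every reported facet count matches the exact count. It uses the same simplified merge model as described above (per-shard top-N lists summed up); the data and function names are hypothetical.

```python
from collections import Counter

def merged_counts(shards, limit):
    """Sum the top-`limit` facet values of every shard."""
    total = Counter()
    for shard in shards:
        total.update(dict(Counter(shard).most_common(limit)))
    return total

def exact_counts(shards):
    """Sum all facet values of every shard (what a real search sees)."""
    total = Counter()
    for shard in shards:
        total.update(shard)
    return total

def smallest_exact_limit(shards):
    """Smallest limit for which all reported counts are exact."""
    exact = exact_counts(shards)
    max_limit = max(len(shard) for shard in shards)
    for limit in range(1, max_limit + 1):
        merged = merged_counts(shards, limit)
        if all(merged[value] == exact[value] for value in merged):
            return limit
    return max_limit

# Hypothetical per-shard counts.
shards = [
    {"A": 14, "B": 9, "C": 8},
    {"B": 20, "C": 12, "D": 5, "A": 1},
]

print(smallest_exact_limit(shards))  # 4
```

In this toy data the counts only become exact once the limit covers every distinct value on the largest shard, which is why raising the limit works well for small indices but does not scale to fields with very many distinct values.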

 
