Benchmarking Search Performance: Elasticsearch vs competitors

There are some exciting new players in the open-source search engine field. We decided to look at some of them closely to find out how they stack up against Elasticsearch, both in feature set and in performance.

Candidates: Elasticsearch, RediSearch, PostgreSQL, TypeSense, MeiliSearch.

Features

| Feature | Elasticsearch | RediSearch | PostgreSQL | TypeSense | MeiliSearch |
|---|---|---|---|---|---|
| Storage | Disk | RAM + snapshots | Disk | RAM + snapshots | RAM + snapshots |
| Distributed | Primary/replica | RAFT-based | Primary/replica | RAFT-based | NO |
| Replicated | + | NO | + | NO | NO |
| Languages | latin + cjk + cyrillic + arabic, armenian, basque, bengali, brazilian, greek, hindi, indonesian, persian, sorani, thai | latin + arabic, russian, chinese | latin + arabic | all whitespace-separated | all whitespace-separated + kanji |
| Typo Tolerance | yes, might get slow | + | NO | + | + |
| Boosting | + | + | + | + | NO |
| Exact Search | + | + | + | NO | NO |
| Synonyms | + | + | + | + | + |

Known limitations

Elasticsearch

  • Becomes unstable above ~1000 indices (or 20k shards) per cluster

TypeSense

  • Storage size limited by available RAM

Source: https://typesense.org/typesense-vs-algolia-vs-elasticsearch-vs-meilisearch/

Meilisearch

  • The maximum number of terms taken into account for each search query is 10
  • Maximum database size is 100GiB (can be changed per instance)
  • Up to 200 indexes
  • Maximum of 1000 words per field

Source: https://docs.meilisearch.com/reference/features/known_limitations.html#design-limitations

Benchmark

Dataset

Name: enwiki-20210720-abstract.xml
Description and Source: Date: July 20, 2021
Docs: 6.3M
XML size: 6.0 GB

Query words are chosen randomly from the 1000 most popular English words dataset.
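As a rough illustration of how such queries could be produced, here is a minimal sketch. The short `WORDS` list is a stand-in for the 1000-word dataset, and the `make_queries` helper and its query syntax (for example `OR` and double quotes) are assumptions for illustration; each engine has its own syntax for these query types.

```python
import random

# Stand-in for the "1000 most popular English words" list; a real run
# would load the full list from a file.
WORDS = ["time", "people", "water", "world", "house", "school", "music", "river"]

def make_queries(rng, words=WORDS):
    """Build one query string per benchmark query type from random words."""
    one = rng.choice(words)
    three = rng.sample(words, 3)
    # Corrupt the last letter to simulate a typo.
    typo = one[:-1] + ("x" if one[-1] != "x" else "y")
    return {
        "one_word": one,
        "three_word": " ".join(three),
        "or": " OR ".join(three),                     # illustrative syntax only
        "exact_phrase": '"' + " ".join(three[:2]) + '"',
        "prefix": one[:3],
        "typo": typo,
    }

queries = make_queries(random.Random(42))
```

Seeding the generator makes a run repeatable, so every engine can be given the same sequence of queries.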

Environment

2x General Purpose / 32 GB / 8 vCPUs DigitalOcean droplets (one for load generation + one for storage).

Results

Indexing time

For indexing, we only counted the time our indexer spent in requests to the search backend. Elasticsearch, PostgreSQL and Typesense show very similar performance here, while RediSearch is ~2x slower. This result contradicts the RedisLabs benchmark results, so our setup might be suboptimal. On the other hand, Meilisearch really shines here, being 6 to 7 times faster than Elasticsearch, PostgreSQL and Typesense.
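Counting only the time spent in backend requests can be sketched as follows. This is not our actual indexer: `index_in_batches`, `send_batch` and the batch size are hypothetical names and values chosen for illustration.

```python
import time

def index_in_batches(docs, send_batch, batch_size=1000):
    """Send docs in batches and time only the backend calls.

    Returns (seconds_spent_in_backend, docs_per_second).
    """
    spent = 0.0
    total = 0
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]   # batch preparation is not timed
        t0 = time.perf_counter()
        send_batch(batch)                        # only this call is timed
        spent += time.perf_counter() - t0
        total += len(batch)
    throughput = total / spent if spent > 0 else float("inf")
    return spent, throughput

# Usage with a fake backend that just collects the batches.
received = []
elapsed, rate = index_in_batches([{"id": i} for i in range(2500)], received.append)
```

Note that with an asynchronous backend, `send_batch` returns as soon as the request is accepted, so this kind of timer measures enqueueing rather than indexing.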

Query latency

Again, RediSearch is the slower outlier for most queries, and again RedisLabs got different results. Another surprising outlier is the "three-word" query on Typesense, which takes an enormous amount of time on average for some reason. Meilisearch displayed pretty solid performance, especially for prefix and typo queries.

We used zeroes for unsupported query types, but note that RediSearch's sub-1 ms (!) timings for the "exact phrase" and "three-word AND" queries are real measurements.
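Because a zero for an unsupported query type can be confused with a genuinely fast result, one option is to report a missing value instead. The sketch below is an assumption about how average latency could be collected, not our benchmark code; `average_latency_ms` and the `search` callable are hypothetical.

```python
import time
from statistics import mean

def average_latency_ms(search, queries):
    """Return the mean latency in milliseconds over all queries.

    Returns None when `search` is None, meaning the engine does not
    support this query type.
    """
    if search is None:
        return None
    samples = []
    for q in queries:
        t0 = time.perf_counter()
        search(q)
        samples.append((time.perf_counter() - t0) * 1000.0)
    return mean(samples)

# Usage with a fake search function and an unsupported query type.
supported = average_latency_ms(lambda q: q.upper(), ["time", "water"])
unsupported = average_latency_ms(None, ["time", "water"])
```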

Raw numbers

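| Benchmark | Elasticsearch | RediSearch | PostgreSQL | TypeSense | MeiliSearch |
|---|---|---|---|---|---|
| Indexing | | | | | |
| - time (s) | 268 | 516 | 290 | 272 | 42 (async) |
| - throughput (docs/s) | 23644 | 12267 | 21827 | 23258 | 150284 |
| 1 Word Query (ms) | 16.14 | 16.81 | 69.89 | 16.04 | 6.73 |
| 3 Word Query (ms) | 4.07 | 0.95 | 2.61 | 224.36 | 11.57 |
| OR Query (ms) | 20.69 | 45.86 | 2.48 | N/A | N/A |
| Exact Phrase Query (ms) | 3.16 | 0.64 | 9.85 | N/A | N/A |
| 1 Word Prefix Query (ms) | 7.76 | 36.98 | 9.22 | 6.75 | 6.18 |
| Typo Query (ms) | 19.81 | 58.17 | N/A | 14.61 | 5.84 |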
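As a quick sanity check on the indexing rows, and assuming the time row is in seconds and the throughput row is in documents per second, multiplying the two should give roughly the 6.3M documents in the dataset for every engine:

```python
# Indexing numbers copied from the table above.
index_time = {"Elasticsearch": 268, "RediSearch": 516, "PostgreSQL": 290,
              "TypeSense": 272, "MeiliSearch": 42}
throughput = {"Elasticsearch": 23644, "RediSearch": 12267, "PostgreSQL": 21827,
              "TypeSense": 23258, "MeiliSearch": 150284}

# time * throughput is the implied number of indexed documents.
implied_docs = {name: index_time[name] * throughput[name] for name in index_time}

# Ratios behind the "~2x slower" and "almost 7 times faster" remarks.
redisearch_vs_es = index_time["RediSearch"] / index_time["Elasticsearch"]
es_vs_meili = index_time["Elasticsearch"] / index_time["MeiliSearch"]

for name, docs in implied_docs.items():
    print(f"{name}: {docs / 1e6:.2f}M documents")
```

All five products land between 6.31M and 6.34M, which is consistent with the dataset size.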

Takeaways

  • Elasticsearch is still the king, offering solid performance for indexing and all types of queries.
  • RediSearch has so-so indexing performance, and its documentation is subpar (RedisLabs tries hard to upsell its cloud solution), but it can deliver sub-millisecond latency for some types of queries.
  • PostgreSQL shows an odd latency spike on simple one-word queries and its interface is quite complex, though it might be a decent solution if you already have a Postgres database.
  • TypeSense has a good feature set and generally good performance, but with a strange spike on multi-word queries.
  • MeiliSearch's seemingly great indexing performance was caused by a test error, and we weren't able to complete the test with a proper setup.

Update: Meilisearch and Typesense results

Jason Bosco from Typesense reached out to us about the unusually slow 3-word query results and recommended re-running that test with the parameter drop_tokens_threshold=1, but the results were similar (200+ ms). We also tried drop_tokens_threshold=0, effectively turning it into an OR search, with much better performance.

So the slowdown is probably caused by the fact that we pick 3 random English words for each query and there are no documents containing all three, so Typesense keeps dropping words until it gets some results, and this process is not very fast.
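To make the cost of this behaviour concrete, here is a toy model of token dropping. It is not Typesense's implementation: the `search_with_token_dropping` function, the in-memory `index` and the rule of dropping the last token first are all simplifying assumptions. The point is only that each dropped token means another search pass.

```python
def search_with_token_dropping(index, tokens, threshold=1):
    """Toy model: AND all tokens, dropping one token at a time until
    at least `threshold` documents match.

    `index` maps a token to the set of document ids containing it.
    Returns (matching_ids, number_of_search_passes).
    """
    tokens = list(tokens)
    passes = 0
    while tokens:
        passes += 1
        hits = set.intersection(*(index.get(t, set()) for t in tokens))
        if len(hits) >= threshold or len(tokens) == 1:
            return hits, passes
        tokens.pop()  # not enough results: drop a token and search again
    return set(), passes

# Three random words that never occur together in one document.
toy_index = {"river": {1, 2}, "music": {2, 3}, "school": {4}}
hits, passes = search_with_token_dropping(toy_index, ["river", "music", "school"])
```

In this toy run the first pass finds nothing, so a second pass over the remaining two words is needed before document 2 is returned.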

Jason also noted that the seemingly fast Meilisearch indexing was actually caused by the index requests being asynchronous. We've updated the test to wait for all indexing tasks to complete, but they take an extremely long time, so we'll need to look closer into how Meilisearch works under the hood.
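Waiting for asynchronous indexing to finish can be sketched as a polling loop. Everything here is an assumption for illustration: `wait_for_tasks` and `get_status` are hypothetical, and the status names differ between Meilisearch versions, so they are passed in rather than hard-coded as fact.

```python
import time

def wait_for_tasks(task_ids, get_status, done_states=("succeeded", "failed"),
                   poll_interval=0.5, timeout=3600.0):
    """Block until every task reaches a finished state.

    `get_status(task_id)` is a hypothetical callable that asks the engine
    for a task's state and returns it as a string. `done_states` holds
    assumed status names; check them against your engine version.
    Returns the total number of seconds waited.
    """
    pending = set(task_ids)
    start = time.monotonic()
    while pending:
        pending = {t for t in pending if get_status(t) not in done_states}
        if not pending:
            break
        if time.monotonic() - start > timeout:
            raise TimeoutError(f"{len(pending)} tasks still pending")
        time.sleep(poll_interval)
    return time.monotonic() - start

# Usage with a fake status function: each task finishes on the second poll.
calls = {}
def fake_status(task_id):
    calls[task_id] = calls.get(task_id, 0) + 1
    return "succeeded" if calls[task_id] >= 2 else "processing"

waited = wait_for_tasks([1, 2], fake_status, poll_interval=0.01)
```

For a fair comparison, the time returned here would be added to the time spent sending the indexing requests.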

Gigasearch is a team of Elasticsearch consultants and engineers with experience deploying and tuning petabyte-scale clusters. Contact us today!