There are some exciting new players in the open-source search engine field. We decided to take a close look at several of them to find out how they stack up against Elasticsearch, both in feature set and in performance.
- Elasticsearch - mature text search engine, based on Lucene
- RediSearch - full-text solution on top of Redis, built by RedisLabs
- Postgres FTS - full-text indices for Postgres
- Typesense - open-source Algolia alternative
- Meilisearch - open-source Algolia alternative
- Becomes unstable above ~1000 indices (or 20k shards) per cluster
- Storage size limited by available RAM
- The maximum number of terms taken into account for each search query is 10
- Maximum database size is 100GiB (can be changed per instance)
- Up to 200 indexes
- Maximum of 1000 words per field
Description and Source: Date: July 20, 2021
XML size: 6.0 GB
Query words are chosen randomly from the 1000 most popular English words dataset.
2x General Purpose / 32 GB / 8 vCPUs DigitalOcean droplets (one for load generation + one for storage).
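As a rough illustration of how queries can be drawn at random from a common-words list, here is a minimal Python sketch. The short `COMMON_WORDS` list, the `make_queries` helper, the query string formats, and the way the typo is produced are all assumptions made for this example; they are not the benchmark's actual code, and each backend has its own query syntax.

```python
import random

# Hypothetical stand-in for the "1000 most popular English words" dataset;
# a real run would load the full list from a file.
COMMON_WORDS = ["time", "people", "water", "world", "house", "music", "river", "light"]


def make_queries(words, rng):
    """Build one query string per query type from randomly chosen words."""
    three = rng.sample(words, 3)  # three distinct words
    typo_src = rng.choice(words)
    # Introduce a single typo by swapping the first two characters (assumption).
    typo = typo_src[1] + typo_src[0] + typo_src[2:]
    return {
        "one_word": rng.choice(words),
        "three_word_and": " ".join(three),
        "exact_phrase": '"' + " ".join(three) + '"',
        "prefix": rng.choice(words)[:3],
        "typo": typo,
    }


rng = random.Random(42)  # fixed seed so runs are repeatable
queries = make_queries(COMMON_WORDS, rng)
```

Seeding the generator keeps the query set identical across backends, so timing differences are not caused by different random words.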
For indexing, we only counted the time our indexer spent in requests to the search backend. Elasticsearch, PostgreSQL and Typesense show very similar performance here, while RediSearch is ~2x slower. This result strangely contradicts the RedisLabs benchmark results, so our setup might be suboptimal. Meilisearch, on the other hand, really shines here, being almost 7 times faster than the others.
Again, RediSearch is the slow outlier for all queries, and again RedisLabs got different results. Another surprising outlier is the "three-word" query on Typesense, which takes an enormous amount of time on average for some reason. Meilisearch displayed pretty solid performance, especially for prefix and typo queries.
Unsupported query types are recorded as zeroes. The near-zero RediSearch timings for the "exact phrase" and "three word AND" queries are real measurements, though: RediSearch got them into the under 1 ms (!) zone.
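Counting only the time spent in backend requests can be sketched as below. This is a minimal illustration, not the indexer used for the benchmark: the `RequestTimer` class and the `fake_index_batch` function are hypothetical, and the fake call simply sleeps in place of a real HTTP or driver request.

```python
import time


class RequestTimer:
    """Accumulates wall-clock time spent inside backend requests only."""

    def __init__(self):
        self.total_seconds = 0.0
        self.calls = 0

    def timed(self, send, *args, **kwargs):
        # Time only the call to the backend, not the work around it.
        started = time.perf_counter()
        try:
            return send(*args, **kwargs)
        finally:
            self.total_seconds += time.perf_counter() - started
            self.calls += 1


def fake_index_batch(batch):
    """Hypothetical stand-in for a request to a search backend."""
    time.sleep(0.01)
    return len(batch)


timer = RequestTimer()
documents = [{"id": i, "text": f"doc {i}"} for i in range(6)]
for offset in range(0, len(documents), 2):
    batch = documents[offset:offset + 2]  # parsing and batching are not timed
    timer.timed(fake_index_batch, batch)
```

Note that this approach measures only how long the request itself takes. If a backend acknowledges a request before the work is finished, the accumulated time will understate the real indexing cost.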
- Elasticsearch is still the king, offering solid performance for indexing and all types of queries.
- RediSearch has so-so indexing performance, and RedisLabs tries hard to upsell its cloud solution, so the documentation is subpar too, but it can deliver sub-millisecond latency for some types of queries.
- PostgreSQL has a weird spike for simple one-word queries and its interface is quite complex, though it might be a decent solution if you already have a Postgres database.
- Typesense has a good feature set and generally good performance, but with a strange spike on multi-word queries.
- Meilisearch's seemingly great performance was caused by a test error, and we weren't able to complete the test with a proper setup.
Update: Meilisearch and Typesense results
Jason Bosco from Typesense reached out to us regarding the weird slow outlier results with 3-word queries and recommended re-running that test with the parameter drop_tokens_threshold=1, but the results were similar (200+ ms). We also tried drop_tokens_threshold=0, effectively turning it into an OR search, with way better performance.
So the slowdown is probably caused by the fact that we're picking 3 random English words for the query and there are no documents containing all three, so Typesense starts dropping words until it gets something, and this process is not very fast.
Jason also noted that the seemingly fast Meilisearch indexing was actually caused by the index requests being asynchronous. We've updated the test to wait for all indexing tasks to complete, but they take an extremely long time, so we'll need to look closer into how Meilisearch works under the hood.
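To make the suspected cause concrete, here is a toy Python model of a search that drops words and retries when too few documents match. It is only an illustration of the behaviour described above, not Typesense's actual algorithm: the in-memory `index`, both function names, and the choice to drop the last word first are assumptions made for this sketch.

```python
def search_and(index, tokens):
    """Return ids of documents that contain every token (strict AND)."""
    return [doc_id for doc_id, words in index.items() if all(t in words for t in tokens)]


def search_with_token_dropping(index, tokens, threshold):
    """Toy model: retry with fewer tokens until at least `threshold` hits.

    Dropping the last token first is an arbitrary choice made for this sketch.
    Returns the hits and the number of search passes performed.
    """
    remaining = list(tokens)
    passes = 0
    while True:
        hits = search_and(index, remaining)
        passes += 1
        if len(hits) >= threshold or len(remaining) == 1:
            return hits, passes
        remaining.pop()  # drop a token and search again


# Tiny hypothetical corpus: document id -> set of words.
index = {
    1: {"river", "light", "stone"},
    2: {"river", "music"},
    3: {"house", "light"},
}

# No document contains all three words, so an extra pass is needed.
hits, passes = search_with_token_dropping(index, ["river", "music", "house"], threshold=1)
```

In this model every failed attempt costs another full search pass, which is one plausible way random multi-word queries could end up slower than queries whose words actually co-occur.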
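Waiting for asynchronous indexing to finish before stopping the clock can be sketched as a generic polling loop. This is a hedged illustration rather than the updated test itself: `wait_for_tasks`, the `get_status` callback, and the status names `"enqueued"`, `"done"` and `"failed"` are assumptions for this example, and a real run would map them to whatever the backend's task API reports.

```python
import time


def wait_for_tasks(task_ids, get_status, poll_interval=0.05, timeout=60.0):
    """Block until every task reports a final status; return elapsed seconds.

    `get_status` is a caller-supplied function mapping a task id to a status
    string. The status names below are assumptions for this sketch.
    """
    final_statuses = {"done", "failed"}
    pending = set(task_ids)
    started = time.monotonic()
    while pending:
        pending = {t for t in pending if get_status(t) not in final_statuses}
        if not pending:
            break
        if time.monotonic() - started > timeout:
            raise TimeoutError(f"{len(pending)} indexing tasks still pending")
        time.sleep(poll_interval)
    return time.monotonic() - started


# Fake backend for demonstration: each task finishes after three status checks.
checks = {}


def fake_status(task_id):
    checks[task_id] = checks.get(task_id, 0) + 1
    return "done" if checks[task_id] >= 3 else "enqueued"


elapsed = wait_for_tasks([101, 102], fake_status, poll_interval=0.01)
```

Adding the returned `elapsed` value to the time spent in the indexing requests gives a figure that is comparable with backends that index synchronously.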
Gigasearch is a team of Elasticsearch consultants and engineers with experience deploying and tuning petabyte-scale clusters.