Hiero7 is a cybersecurity company that protects clients from DDoS attacks in the hundreds of gigabytes, among other threats. They also offer a CDN with PoPs in the Chinese market, increasing content delivery speeds in the region. A key product is access log monitoring for their CDN customers, which ingests on the order of billions of logs a day. Their custom monitoring dashboards, powered in part by Elasticsearch, offer customers great visibility into all aspects of traffic flow.
"From this engagement with Gigasearch, we learned best practices that would allow us to scale without uncertainty. Our dashboards now load very quickly, greatly improving our product. We're now confident that we are using our resources efficiently." — Tommy Lin, Tech Lead at Hiero7
A Starting Point
Elasticsearch met Hiero7’s analytical requirements right away. Elasticsearch can ingest thousands of documents a second, perform queries and aggregations in near-real-time, and scale quickly with volume. Its query language is friendlier than that of more complex systems like Hadoop, and it comes with batteries included for local development and testing.
Hiero7’s needs centered on access log monitoring and insights. UI dashboards were expected to quickly analyze and present insights over billions of logs. The initial Elasticsearch cluster needed to sustain an indexing rate beyond 40,000 documents per second. Everything had to run in near-real-time and keep pace with new features, insights, and customers as Hiero7 grew.
As Hiero7 grew, Elasticsearch became a bottleneck. The customer base expanded and the volume of incoming data exploded. Customer-facing dashboards started timing out, CPU load was high, and ingestion was falling behind. Although Hiero7 understood the need to scale up infrastructure, they did not want to naively throw compute resources at the problem. These performance and reliability issues stemmed from the many nuances of Elasticsearch. Hiero7 was in desperate need of experience, something they did not have the time to develop as customers joined and volume grew.
Scaling up with Gigasearch
Gigasearch was one of three consulting teams Hiero7 was considering. They chose Gigasearch because, unlike the others, Gigasearch was able to provide value even in the initial free discovery phase by identifying a quick improvement. Hiero7 was using a VPS instance that was sub-optimal for their use case, and the fix was immediate: upgrade to an EC2 I3.
Gigasearch conducted an in-depth analysis of Hiero7's cluster setup, index settings, mappings and queries. Some of the recommendations for Hiero7 included changing instance types, increasing the number of data nodes, reducing redundant queries, and optimizing mappings.
"I had not considered Kafka for supporting our multi-tenancy use case until Gigasearch recommended it. We were considering Redis, but Kafka is working very well. I wish I had heard about Gigasearch sooner." says Tommy.
In Depth - Focusing on the Reads
Elasticsearch is not a simple system. Tuning a cluster requires intimate knowledge not only of its internals and functionality, but also of the data model, ingestion requirements, and read patterns. Gigasearch worked together with Hiero7 to determine how Elasticsearch was being used and where Hiero7’s priorities lay. What do customers want to see? What are their expectations? These questions are vital to understanding what tradeoffs must be made when tuning and scaling Elasticsearch.
Working with Hiero7, Gigasearch quickly identified how various front-end features were served by Elasticsearch and created a comprehensive plan to improve dashboard load time.
To perform this analysis, Gigasearch looked to the UI. The worst-performing elements were selected and their underlying Elasticsearch queries extracted. These queries were then benchmarked against Hiero7’s running Elasticsearch cluster using real-world time ranges and values. The results were aggregated and a regression was performed to determine which queries required modification. This process yielded valuable insight and enabled Gigasearch to narrow its focus and offer a tailor-made solution immediately.
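The benchmarking step can be illustrated with a minimal Python sketch. It assumes that "regression" here means a simple least-squares fit of query latency against the queried time range, which is only one possible reading of the process described. The names time_query, fit_line, and run_query are hypothetical, and run_query stands in for whatever function actually sends a dashboard query to the cluster.

```python
import time
from typing import Callable, Sequence, Tuple


def time_query(run_query: Callable[[int], object], hours: int, repeats: int = 3) -> float:
    """Return the median wall-clock seconds for one query over `hours` of data.

    `run_query` is a hypothetical callable that executes the query
    for the given time range against the cluster.
    """
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query(hours)
        samples.append(time.perf_counter() - start)
    samples.sort()
    return samples[len(samples) // 2]


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def latency_trend(run_query: Callable[[int], object], ranges: Sequence[int]) -> Tuple[float, float]:
    """Benchmark one query over several time ranges and fit the trend."""
    latencies = [time_query(run_query, hours) for hours in ranges]
    return fit_line(list(ranges), latencies)
```

Under this reading, a query whose fitted slope is steep becomes slow quickly as users widen the dashboard time range, which makes it a natural candidate for modification.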
The approach was two-fold. First, to improve query performance and load times, various queries and aggregations were combined. The frontend would now be responsible for splitting the data and performing additional work. This, in effect, moved computation off of Elasticsearch and onto the browser, freeing constrained resources. The second key improvement was in the ingestion pipeline. By optimizing the data model and pre-computing certain fields with Logstash, the amount of work performed for large aggregations was reduced.
For example, to ask, “How many unique IP addresses have we seen?”, a cardinality aggregation is performed. Internally, Elasticsearch uses the HyperLogLog algorithm, hashing each value first in order to compute the count. By performing the IP address hash during ingestion, work is done during writes instead of reads. This tradeoff was perfect for Hiero7 - a small delay in data ingestion was unimportant, compared to faster load time on the UI.
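The two-fold approach above can be sketched in Python. This is an illustration under stated assumptions, not Hiero7's actual pipeline: the field names (client_ip, client_ip_hash, status, @timestamp) are hypothetical, the hash is computed in Python here although the case study used Logstash, and the splitting step ran in the browser rather than in Python. The request body uses the standard Elasticsearch range query and the cardinality and terms aggregations.

```python
import hashlib

# Hypothetical field names, not taken from Hiero7's actual mapping.
IP_FIELD = "client_ip"
IP_HASH_FIELD = "client_ip_hash"


def add_ip_hash(doc: dict) -> dict:
    """Ingestion side: store a 64-bit hash of the IP next to the raw value.

    Stands in for the Logstash step described in the text.
    """
    digest = hashlib.blake2b(doc[IP_FIELD].encode("utf-8"), digest_size=8).digest()
    enriched = dict(doc)
    # Signed, so the value fits a 64-bit `long` field.
    enriched[IP_HASH_FIELD] = int.from_bytes(digest, "big", signed=True)
    return enriched


def dashboard_request(start: str, end: str) -> dict:
    """Read side: one search body that serves several widgets at once."""
    return {
        "size": 0,  # aggregations only, no individual hits
        "query": {"range": {"@timestamp": {"gte": start, "lt": end}}},
        "aggs": {
            # "How many unique IP addresses have we seen?" on the pre-hashed field.
            "unique_ips": {"cardinality": {"field": IP_HASH_FIELD}},
            "by_status": {"terms": {"field": "status"}},
        },
    }


def split_for_widgets(response: dict) -> dict:
    """Client side: split the combined response into per-widget data."""
    aggs = response["aggregations"]
    return {
        "unique_ips": aggs["unique_ips"]["value"],
        "status_counts": {b["key"]: b["doc_count"] for b in aggs["by_status"]["buckets"]},
    }
```

The design choice is the one described above: a single round trip replaces several separate queries, and the hashing cost is paid once per document at write time instead of on every dashboard load. How much the pre-computed hash saves depends on the Elasticsearch version and mapping, so it is worth measuring before adopting.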
"Rather than giving us generic feedback on how to address our issues, Gigasearch provided tailor-made feedback specific to our use case. They really listened to us and the engagement was a conversation, not a lecture." - Tommy Lin, Tech Lead at Hiero7
The Path Forward
After Hiero7 implemented the Gigasearch recommendations, their monitoring dashboard was able to load all data without timeouts. Their Logstash pipeline stopped dropping logs, and data was ingested in real time. Hiero7 now has a clear path forward and will scale with confidence.
Gigasearch is a team of Elasticsearch consultants and engineers with experience deploying petabyte-scale clusters. Contact us today!