Vendasta: Upgrading Elasticsearch from 2 to 7 on ECK

Vendasta provides an end-to-end ecommerce platform to experts who deliver digital products and services to local businesses worldwide. They provide a suite of tools that enable marketing and advertising companies to work with local small and medium-sized businesses. Vendasta’s tools provide SMB clients with a delightful digital experience.

Market, sell, bill, and fulfill the services you provide to local businesses—all from the Vendasta platform

Prior to working with Gigasearch, Vendasta was using Elasticsearch 2.4. The staff that had originally set up the cluster had since left. The cluster was unstable, causing frequent outages. They wanted to upgrade to the latest version of Elasticsearch using the ECK Kubernetes operator, but did not have the expertise in-house to confidently move forward with the project. After the project with Gigasearch, they were able to deploy a stable Elasticsearch 7 cluster in ECK, and address key concerns like parent-child mappings and noisy neighbor issues.

The following is an interview with Ben Stolz, SRE team lead at Vendasta.

What is your Elasticsearch use case?

Elasticsearch is a secondary index in our platform. There is a primary index that all of Vendasta uses, and we have tools that allow us to export to other secondary indices like SQL and Elasticsearch. Data in Elasticsearch can be anything from email campaigns to customer information and customer reviews. We use full-text search as well as filtering and aggregations. The SRE team is responsible for deploying and maintaining our Elasticsearch infrastructure.

What problems were you facing with your Elasticsearch deployment?
Vendasta has grown a lot over the last six years. The existing Elasticsearch cluster was running on old infrastructure. How and why it was set up was never documented. Because of this, our development teams were using Elasticsearch in suboptimal ways.

Kubernetes is a well-known technology for our team. When ECK came along, we saw it as an opportunity to manage Elasticsearch using Kubernetes and GitOps principles. We also wanted to upgrade to the latest version of Elasticsearch, as well as fix some of the bad practices that were leading to stability issues.

The problem was that we weren’t Elasticsearch experts. We could spend the time to become experts, but we wanted to move the needle quickly and accelerate the upgrade. At that point, we started looking for Elasticsearch consultants. We didn’t pursue anything further after the first call with Gigasearch. It was plainly obvious that the expertise and experience were there.

How did these problems impact your business?
Several of our products rely on Elasticsearch, so removing it wasn’t an option. Our Elasticsearch cluster would go down a lot because CPU was overloaded on the data nodes, and as a result, a good portion of our platform would stop working correctly. For example, managing accounts was not possible when the Elasticsearch cluster was down. The blast radius was extremely large when Elasticsearch went down, so it was important to us that our new Elasticsearch cluster was highly available.

What were you looking for in a solution?
We wanted our Elasticsearch deployment to be highly available. The ability to scale it with the company and upgrade it as needed was also important. With our old cluster, we did not have a great maintenance and upgrade plan, leading to increasing headaches as we scaled. We didn’t want to repeat that mistake with our new cluster.

What was most important to you when looking for an Elasticsearch consultant?
Flexibility. We weren’t exactly sure about all of our needs, and wanted to retain some control over the implementation so we could own it moving forward. The package we were able to put together with Gigasearch suited our needs.

We also spoke with Elastic, but couldn’t get a straightforward answer on pricing. They didn’t seem as open to helping with our specific issues or desires. The call with Elastic had a project manager and success specialist, who weren’t the right audience. We needed a contract in place before a specialist joined. We felt they were dealing with a big company, as opposed to a boutique firm.

The deciding factor for Gigasearch was the first impression. I think we even had a couple of answers before we signed any agreement. The pricing model for Gigasearch was not confusing; it couldn’t have been clearer. With Gigasearch, we felt like we were talking with friends down the street versus this big business with a checklist.

How was your experience with Gigasearch?
Gigasearch set up a system that included shared documentation, a shared Slack channel for ad-hoc questions, and weekly calls. Weekly calls were a really great way to touch base on how everything was going. Before working with Gigasearch, our lack of experience with Elasticsearch resulted in inertia that stopped projects dead in their tracks. It’s hard to make decisions confidently with so many unknowns. The meetings with Gigasearch motivated us to make progress between meetings. Lots of experts joined our recurring meetings. Usually when many people join a call, only one person does the talking, but we found that everyone who joined contributed in a meaningful way.

The shared Slack channel was my favorite part. To my surprise, the interactions in Slack were always quick and responsive, and always very helpful. It was great to have someone we could tap on the shoulder and get their input. We were also able to arrange a secondary phase for additional support. We didn’t set out to have that, but the high-quality help made it a good move. It was a reasonably priced insurance policy.

Everyone we worked with from Gigasearch was a professional. They all had in-depth knowledge about Elasticsearch backed up by experience using it at scale. Observing conversations between some of our more technical team members and Gigasearch was interesting. The interactions were short and sweet. Ultimately it was a question, a sentence or two back and forth, and we had what we needed to continue. That holds true for our weekly discussions as well. The feedback and guidance we got were high quality and always professional, in a way that made us feel comfortable.

What were the key results?
I gave a presentation to all of Vendasta last week, and so I was specifically looking for metrics of success. The biggest thing for me was that compared to the old cluster, we were paged zero times on the new cluster. We have never had downtime with the new cluster. That’s an immediate success metric for my team, because it means my team is less burnt out and more engaged at work. We were also seeing latencies of tens of seconds on the old cluster, compared to sub-second latencies on the new cluster.

The new cluster also costs $5k per month, compared to $20k per month for the old cluster. Though we anticipate some growth, it’s still saving a significant amount of money. It’s a really good move in the right direction.