Search at TheTaxBook: From Pain Point to Selling Point

Learn how Gigasearch helped TheTaxBook improve relevance, configure Elasticsearch for production, and turn search from a pain point into a selling point.

Tax Materials, Inc., publisher of TheTaxBook, is a family-owned company based in Minneapolis that created the industry-renowned publication known as "TheTaxBook", the most authoritative and up-to-date tax guide available. The company’s products are relied upon by professional tax accountants across the country. Along with the printed materials, the company offers an online version of the books at thetaxbook.net, in a product called the WebLibrary.

I now feel very confident in our search implementation. Our customers are going to get the value they were looking for with our product, and the search drives all of that. - Sam Meyer, VP of Operations at TheTaxBook

"Think about search," read a sign above Sam's desk. Sam Meyer, VP of Operations at TheTaxBook, knew that he needed to address the poor quality of TheTaxBook WebLibrary's search engine. The most common refrain among his customers was that they wanted an improved search engine. Search on the site was powered by Sphinx and at times seemed to perform not much better than issuing an SQL query. Features that users had come to expect from search engines like Google, such as autocomplete, synonym matching, and relevant results, were almost entirely missing.

"Search was somewhat of a black eye on our overall product offering."

Sam and his team began migrating to Elasticsearch in early 2019, with the hope of completing the migration before the next tax season. Though they made decent progress, there were several issues that they were not able to overcome, despite attending Elasticsearch training. Autocomplete was so slow that it was unusable, taking 5-6 seconds to load after each character typed. Irrelevant documents were still being served, and synonyms were still not supported. They also did not know how to configure the Elasticsearch infrastructure to support and scale with their use case.

"We knew enough to be dangerous, but not enough to get the full job done."

Though Sam preferred to work with local consultants to help them across the finish line, it became obvious that the depth of experience required could not be found locally. He also knew that Elastic consulting would be biased in favor of their service offerings, and he preferred a neutral perspective. Sam and his team were immediately impressed with the depth of knowledge Gigasearch demonstrated even on the first call.

"We were sensitive to cost, but Gigasearch gave us a clear scope of work and estimate that gave us a clear picture of what to expect. They hit the nail on the head with the estimate."

Targeted Improvements

Gigasearch performed a comprehensive review of TheTaxBook's code base, focusing on the queries generated by the PHP application. One key issue was the slow autocomplete on their search bar, which took several seconds to return suggestions. TheTaxBook was using edge_ngrams for autocomplete, but was unsure whether it was causing the slow performance. Gigasearch confirmed that edge_ngrams was the correct approach and that the issue lay elsewhere. The application was not waiting for the user to pause typing before submitting the request to the backend, so it sent a request on every letter typed. The autocomplete query also had a size parameter of 500, returning that many documents when only 21 results were shown. Reducing the size parameter to match what was displayed would cut the work done per request and improve search speeds.
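To make the two autocomplete fixes concrete, here is a minimal Python sketch. It is not TheTaxBook's code: the real fix would live in their browser and PHP code, and the field name `title.autocomplete`, the 150 ms typing interval, and the 300 ms pause are assumptions chosen for illustration. The first function simulates how many backend requests a burst of keystrokes would generate with and without waiting for a pause in typing, and the second builds a request body whose size matches the 21 suggestions actually shown.

```python
def requests_sent(keystroke_times_ms, pause_ms):
    """Count backend requests for a list of keystroke timestamps (simulation).

    With pause_ms <= 0 every keystroke fires a request. Otherwise a keystroke
    fires a request only if no further keystroke arrives within pause_ms.
    """
    if pause_ms <= 0:
        return len(keystroke_times_ms)
    count = 0
    last_index = len(keystroke_times_ms) - 1
    for i, t in enumerate(keystroke_times_ms):
        if i == last_index or keystroke_times_ms[i + 1] - t >= pause_ms:
            count += 1
    return count


def autocomplete_body(prefix, displayed=21):
    """Build an autocomplete request body that asks only for what is shown."""
    return {
        "size": displayed,  # was 500 in the original query
        # "title.autocomplete" is a hypothetical edge_ngram-analyzed field
        "query": {"match": {"title.autocomplete": prefix}},
    }


# Typing "depreciation" at one character every 150 ms (assumed typing speed)
times = [i * 150 for i in range(len("depreciation"))]
print(requests_sent(times, 0))    # one request per letter
print(requests_sent(times, 300))  # one request after the user pauses
print(autocomplete_body("depr"))
```

Under these assumptions, the twelve-letter query goes from twelve requests to one, and each remaining request returns 21 documents instead of 500.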

Another key issue was relevance. TheTaxBook WebLibrary contains documents from their proprietary books, as well as up-to-date government tax documents scraped from various government websites. Content from some of the government documents at times would display ahead of their own content in search results. While those documents should be searchable, content from TheTaxBook needed to be within the top results every time. The simplest solution for this was using a boosting query, deboosting government tax documents as needed.

{
    "query": {
        "boosting": {
            "positive": {their current query},
            "negative": {
                "term": {
                    "weblibrary_books_type": "govdoc"
                }
            },
            "negative_boost": 0.99
        }
    }
}
Boosting query with a negative boost on govdoc

With the above query, the first government document result for the term "depreciation" went from the 10th result to the 22nd. Addressing these issues, as well as generally providing feedback on their queries, dramatically improved the search experience. Gigasearch provided ongoing support after the initial recommendations to ensure that Elasticsearch was working properly for TheTaxBook WebLibrary product.
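A negative_boost of 0.99 looks tiny, but in a boosting query the score of every document matching the negative clause is multiplied by that factor, so a 1% reduction is enough to push a document below neighbors whose scores are tightly clustered. The Python sketch below wraps an existing query in the boosting query shown earlier and mimics the score adjustment locally. The field `weblibrary_books_type` and value `govdoc` come from the query above; the `content` field, document IDs, and scores are made up for illustration and are not TheTaxBook's data.

```python
def deboost(query, field, value, negative_boost=0.99):
    """Wrap an existing query clause in a boosting query."""
    if not 0.0 <= negative_boost <= 1.0:
        raise ValueError("negative_boost must be between 0 and 1")
    return {
        "query": {
            "boosting": {
                "positive": query,
                "negative": {"term": {field: value}},
                "negative_boost": negative_boost,
            }
        }
    }


def apply_negative_boost(hits, negative_value, negative_boost):
    """Re-rank (doc_id, doc_type, score) hits the way the boosting query would.

    Hits whose type equals negative_value have their score multiplied by
    negative_boost; all other scores are left unchanged.
    """
    rescored = [
        (doc_id, score * negative_boost if doc_type == negative_value else score)
        for doc_id, doc_type, score in hits
    ]
    rescored.sort(key=lambda hit: hit[1], reverse=True)
    return [doc_id for doc_id, _ in rescored]


# Hypothetical positive query; "content" is an assumed field name
body = deboost({"match": {"content": "depreciation"}},
               "weblibrary_books_type", "govdoc")

# Hypothetical, tightly clustered scores
hits = [
    ("ttb-1", "book", 10.00),
    ("gov-1", "govdoc", 9.98),
    ("ttb-2", "book", 9.95),
    ("ttb-3", "book", 9.90),
]
print(apply_negative_boost(hits, "govdoc", 0.99))
```

In this made-up example the government document drops from second place to last, because 9.98 × 0.99 falls below the scores of the three book results.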

"After the first round of feedback, my main developer was pleased and felt confident that we would be able to finish the implementation before the next tax season. He's very difficult to please."

Painless Infrastructure

Given the small size of the team and relatively small data volume, Gigasearch recommended Elastic Cloud to TheTaxBook. A managed offering is often the most resource-effective path for a small team, because the service handles most of the undifferentiated work, such as backups, monitoring, and provisioning. Elastic Cloud was selected because Kibana monitoring works out of the box, and the cost difference with AWS Elasticsearch is negligible at their scale. Gigasearch helped select the right instance type, data node count, monitoring, and HA setup for their use case.

"We haven't had a single load related issue for the last year."

Search With Confidence

With Gigasearch's help, TheTaxBook was able to revamp its search engine in time for the 2020 tax season. Search is no longer mentioned as an issue in customer surveys. Metrics on search performance showed that customers clicked on one of the top 5 results 87% of the time, and one of the top 10 results 95% of the time, a marked improvement in relevance. Sam says, "I now feel very confident in our search implementation. Our customers are going to get the value they were looking for with our product, and the search drives all of that."

TheTaxBook was also able to form a major partnership with a large tax preparation software company in the industry. The partnership includes a direct integration from the partner's tax preparation product into the WebLibrary, with the search engine as a key factor in the setup.

"Our improved search engine was a driving force in our ability to integrate with one of the biggest tax prep software companies in the industry. Without it, we wouldn't have even been able to explore that opportunity."

Gigasearch is a team of Elasticsearch consultants and engineers with experience tuning search relevance at scale. Contact us today!