Reranking Opensearch Results with Cohere
Imagine boosting your lexical (BM25) search results without relying on LLMs or managing complex vector embedding pipelines. What if these improved results could even outperform semantic search using state-of-the-art SentenceTransformers vector embeddings*? Intrigued? Read on!
TL;DR
You can improve the relevancy of your search results by integrating Cohere’s powerful Rerank model directly into your Opensearch or Elasticsearch search pipeline – no machine learning experience required. What’s more, you can do it in as little as 10 minutes. This guide walks you through setting this up with Opensearch and Cohere, including a Colab notebook and retrieval performance benchmarks.
Prerequisites
You’ll need:
- A Cohere API key
- Some familiarity with Elasticsearch or Opensearch
- Python familiarity (helpful, but not required)
- Remote access to your Opensearch cluster (or just set up a new one with test data).
Feel free to copy the code to your own IDE or notebook environment, but it’ll definitely be quicker and easier to run through the full example in our Google Colab notebook. From there, it should be pretty straightforward to port over to your own environment.
Basic Steps
- Register a Cohere Rerank model
- Configure a reranking (search) pipeline
- Use the search pipeline
You can find these steps in the official Opensearch documentation here, but we’ve taken their examples a step further by laying them all out in our example notebook.
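The code below assumes a cluster reachable at localhost:9200 and, for the search examples later on, an opensearch-py client named client pointed at that cluster. A minimal sketch of that setup (adjust the host, port, authentication, and SSL settings to match your own environment):

from opensearchpy import OpenSearch

# Minimal opensearch-py client used by the search examples below.
# Adjust host, port, auth, and SSL options for your own cluster.
client = OpenSearch(
    hosts=[{"host": "localhost", "port": 9200}],
    use_ssl=False,
)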
Register a Cohere Rerank Model
First, we create the ML connector. We’ll set up a basic HTTP request in Python, specifying the ML connector /_create endpoint and a JSON body with our Cohere Rerank model information, such as the name, API key, and the request endpoint and body to send to the Rerank API:
import requests

url = "http://localhost:9200/_plugins/_ml/connectors/_create"
headers = {
    'Content-Type': 'application/json'
}
data = {
    "name": "cohere-rerank",
    "description": "The connector to Cohere reranker model",
    "version": "1",
    "protocol": "http",
    "credential": {
        "cohere_key": cohere_api_key
    },
    "parameters": {
        "model": "rerank-english-v2.0"
    },
    "actions": [
        {
            "action_type": "predict",
            "method": "POST",
            "url": "https://api.cohere.ai/v1/rerank",
            "headers": {
                "Authorization": "Bearer ${credential.cohere_key}"
            },
            "request_body": "{ \"documents\": ${parameters.documents}, \"query\": \"${parameters.query}\", \"model\": \"${parameters.model}\", \"top_n\": ${parameters.top_n} }",
            "pre_process_function": "connector.pre_process.cohere.rerank",
            "post_process_function": "connector.post_process.cohere.rerank"
        }
    ]
}
response = requests.post(url, headers=headers, json=data)
connector_id = response.json()['connector_id']
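Before moving on, it’s worth confirming the connector was actually created; a quick sanity check keeps downstream errors from being confusing:

# Fail fast if the connector request was rejected (e.g. bad key or plugin disabled).
response.raise_for_status()
print("Created connector:", connector_id)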
We use an HTTP request here because registering an external ML model doesn’t appear to be supported in the Python ML client at this time.
Next, we register and deploy our model:
url = "http://localhost:9200/_plugins/_ml/models/_register?deploy=true"
headers = {
'Content-Type': 'application/json'
}
data = {
"name": "cohere rerank model",
"function_name": "remote",
"description": "test rerank model",
"connector_id": connector_id
}
response = requests.post(url, headers=headers, json=data)
task_id = response.json()['task_id']
model_id = response.json()['model_id']
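Model registration runs asynchronously, so the returned task_id can be used to confirm that registration and deployment actually completed. A small polling sketch against the ML tasks API, assuming the same local cluster:

import time

# Poll the ML tasks API until the register/deploy task finishes.
task_url = "http://localhost:9200/_plugins/_ml/tasks/" + task_id
for _ in range(30):
    task = requests.get(task_url, headers=headers).json()
    if task.get("state") in ("COMPLETED", "FAILED"):
        break
    time.sleep(1)
print(task.get("state"), model_id)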
Finally, we can test our model with the /_predict endpoint:
import json

url = "http://localhost:9200/_plugins/_ml/models/" + model_id + "/_predict"
headers = {
    'Content-Type': 'application/json'
}
data = {
    "parameters": {
        "query": "Who is the main character of Star Wars?",
        "documents": [
            "Jar-Jar Binks is a comical, possibly secret sith character in Star Wars.",
            "Darth Vader, aka Anakin Skywalker is the main antagonist of the original Star Wars trilogy.",
            "Luke Skywalker is the main protagonist of the original Star Wars trilogy.",
            "Emperor Palpatine is arguably the main antagonist as he is the main sith lord."
        ],
        "top_n": 4
    }
}
response = requests.post(url, headers=headers, json=data)
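The prediction response contains the model output with a relevance score per document (the exact shape depends on the post-process function configured on the connector); pretty-printing it is the quickest way to confirm the connector is round-tripping to Cohere:

# Inspect the prediction response and the per-document relevance scores.
print(json.dumps(response.json(), indent=2))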
Configure a Reranking (Search) Pipeline
We need to tell Opensearch that search responses should be post-processed by our reranking model. To do that, we create a search_pipeline that references our Cohere model_id and the document fields to rerank on (title and txt):
url = "http://localhost:9200/_search/pipeline/rerank_pipeline_cohere"
headers = {
'Content-Type': 'application/json'
}
data = {
"description": "Pipeline for reranking with Cohere Rerank model",
"response_processors": [
{
"rerank": {
"ml_opensearch": {
"model_id": model_id
},
"context": {
"document_fields": ["title", "txt"],
}
}
}
]
}
response = requests.put(url, headers=headers, json=data)
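As with the connector, it’s worth confirming the pipeline was stored as expected. Opensearch exposes a GET endpoint for search pipelines that we can hit the same way:

# Confirm the search pipeline exists and inspect its stored configuration.
check = requests.get("http://localhost:9200/_search/pipeline/rerank_pipeline_cohere")
print(check.status_code, check.json())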
Use the Search Pipeline
Now, when we submit a search request that we want reranked by Cohere, we’ll need to specify the search_pipeline we made for Cohere reranking. This can be done on the individual search request, or a default search_pipeline can be set for an index.
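For example, making the rerank pipeline the default for the scifact index might look like this (a sketch using the opensearch-py client; only do this if you want every search on the index reranked):

# Optional: make the rerank pipeline the default for all searches on this index.
client.indices.put_settings(
    index="scifact",
    body={"index.search.default_pipeline": "rerank_pipeline_cohere"},
)

In this guide, we pass the pipeline explicitly on the search request instead: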
query_text = 'A total of 1,000 people in the UK are asymptomatic carriers of vCJD infection.'
res = client.search(
    index="scifact",
    search_pipeline="rerank_pipeline_cohere",
    body={
        "query": {
            "multi_match": {
                "query": query_text,
                "type": "best_fields",
                "fields": [
                    "title",
                    "txt"
                ],
                "tie_breaker": 0.5
            }
        },
        "size": 10,
        "ext": {
            "rerank": {
                "query_context": {
                    "query_text": query_text
                }
            }
        }
    }
)
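The reranked documents come back under hits.hits in the response; a small loop like this (assuming, as in our index, that the body text lives in the txt field) produces the summaries shown below:

# Print id, rerank score, title, and a short preview of the body for each hit.
for hit in res["hits"]["hits"]:
    src = hit["_source"]
    print(hit["_id"], hit["_score"], src["title"], src["txt"][:80], sep=" | ")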
Opensearch with Cohere Rerank Results
{
    "id": "13734012",
    "score": 0.9920002,
    "title": "Prevalent abnormal prion protein in human appendixes after bovine spongiform encephalopathy epizootic: large scale survey",
    "text": "OBJECTIVES To carry out a further survey of archived appendix samples to understand..."
},
{
    "id": "18617259",
    "score": 0.9896318,
    "title": "Research Letters",
    "text": "We report a case of preclinical variant Creutzfeldt-Jakob disease..."
},
{
    "id": "11349166",
    "score": 0.95447797,
    "title": "Lack of evidence of transfusion transmission of Creutzfeldt-Jakob disease in a US surveillance study.",
    "text": "BACKGROUND Since 2004, several reported transfusion transmissions of variant Creutzfeldt-Jakob disease..."
}
And finally, we can compare with a non-reranked search:
res = client.search(
    index="scifact",
    body={
        "query": {
            "multi_match": {
                "query": query_text,
                "type": "best_fields",
                "fields": [
                    "title",
                    "txt"
                ],
                "tie_breaker": 0.5
            }
        },
        "size": 10
    }
)
Regular Opensearch BM25 Results
{
    "id": "18617259",
    "score": 22.933357,
    "title": "Research Letters",
    "text": "We report a case of preclinical variant Creutzfeldt-Jakob disease..."
},
{
    "id": "13734012",
    "score": 18.917074,
    "title": "Prevalent abnormal prion protein in human appendixes after bovine spongiform encephalopathy epizootic: large scale survey",
    "text": "OBJECTIVES To carry out a further survey of archived appendix samples..."
},
{
    "id": "15648443",
    "score": 16.957115,
    "title": "Long-term effect of aspirin on cancer risk in carriers of hereditary colorectal cancer: an analysis from the CAPP2 randomised controlled trial",
    "text": "BACKGROUND Observational studies report reduced colorectal cancer in regular aspirin..."
}
Our very scientific example query comes directly from the SciFact dataset: "A total of 1,000 people in the UK are asymptomatic carriers of vCJD infection." The expected result is the document with id 13734012 and title "Prevalent abnormal prion protein in human appendixes after bovine spongiform encephalopathy epizootic: large scale survey".
This example shows how the expected document ranked second in the standard search but first in the reranked results, showcasing an improvement in relevance. While this single instance is anecdotal, we conducted a more comprehensive analysis using Benchmarking-IR (BEIR), an industry standard for benchmarking information retrieval. We calculated several relevance metrics across all queries in the SciFact dataset, revealing consistent improvements across all scores, as illustrated below.
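We won’t reproduce the full benchmark harness here, but the scoring step relies on BEIR’s EvaluateRetrieval. A rough sketch, assuming the SciFact test split has been downloaded to datasets/scifact and that results is a dict mapping each query id to a {doc_id: score} dict built from the Opensearch responses (both assumptions, not shown here):

from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval.evaluation import EvaluateRetrieval

# Load SciFact queries and relevance judgments (qrels) from a local copy.
corpus, queries, qrels = GenericDataLoader("datasets/scifact").load(split="test")

# results: {query_id: {doc_id: score}} collected earlier from the search responses.
ndcg, _map, recall, precision = EvaluateRetrieval.evaluate(
    qrels, results, k_values=[1, 3, 5, 10, 100]
)
print(ndcg)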
Performance Metrics
Relevance (Normalized Discounted Cumulative Gain)
| Relevance | SBERT | TAS-B | BM25 + CE | BM25 | BM25 + Cohere |
|---|---|---|---|---|---|
| NDCG@1 | 0.42333 | 0.44667 | 0.5733 | 0.57667 | 0.62667 |
| NDCG@3 | 0.48416 | 0.50432 | 0.6314 | 0.63658 | 0.69593 |
| NDCG@5 | 0.48416 | 0.52853 | 0.652 | 0.66524 | 0.7141 |
| NDCG@10 | 0.53789 | 0.55485 | 0.672 | 0.69064 | 0.73495 |
| NDCG@100 | 0.57592 | 0.58717 | 0.678 | 0.71337 | 0.75241 |
- SBERT refers to an exact k-NN match using the sbert msmarco-distilbert-base-v3 model.
- TAS-B is an exact k-NN match with the sbert msmarco-distilbert-base-tas-b model.
- BM25 + CE is the base BM25 results reranked with the ms-marco-electra-base SBERT cross-encoder.
- BM25 is the base performance of a multi_match query in Opensearch.
- Finally, BM25 + Cohere is the performance of Opensearch when using the Cohere rerank pipeline.
Latency
| Latency | BM25 | BM25 + Cohere | BM25 + CE (CPU) |
|---|---|---|---|
| Average | 14.26ms | 214.03ms | 8745.06ms |
| P50 | 13.06ms | 150.35ms | 8527.61ms |
| P90 | 19.73ms | 406.88ms | 12713.91ms |
| P99 | 31.51ms | 870.71ms | 15520.40ms |
To test latency, we looped over the 300 queries in the SciFact dataset and measured how long each request took with the corresponding retrieval method (BM25, BM25 + Cohere, BM25 + CE).
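The timing harness itself is simple. A sketch of the loop, assuming queries holds the 300 SciFact query strings and search_once(q) issues a single request using the retrieval method under test (both hypothetical names):

import time
import numpy as np

# Time each query and summarize the latency distribution in milliseconds.
latencies = []
for q in queries:
    start = time.perf_counter()
    search_once(q)  # one request with the retrieval method being benchmarked
    latencies.append((time.perf_counter() - start) * 1000)

print("avg:", np.mean(latencies), "p50/p90/p99:", np.percentile(latencies, [50, 90, 99]))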
As you can see, relevance (as measured by nDCG) is improved at the cost of some added latency. This is common when using cross-encoder models for reranking. However, the Cohere reranking pipeline performed significantly better than our test with a self-hosted cross-encoder model, where latency increased substantially without adequate processing power.
In our self-hosting tests, we ran the sbert cross-encoder on Google Colab’s CPU, and as you can see, latency reached unacceptable levels. To improve performance, a production environment would likely need many high-performance CPUs or, better yet, GPUs.
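For reference, the self-hosted baseline boils down to scoring each query-document pair with a SentenceTransformers cross-encoder, roughly like this (a sketch; bm25_documents stands in for the text of the initial BM25 hits):

from sentence_transformers import CrossEncoder

# The cross-encoder used in our BM25 + CE baseline (running on CPU here).
ce = CrossEncoder("cross-encoder/ms-marco-electra-base")

# Score and re-order the BM25 hits for a single query.
pairs = [(query_text, doc) for doc in bm25_documents]
scores = ce.predict(pairs)
reranked = sorted(zip(bm25_documents, scores), key=lambda p: p[1], reverse=True)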
It's also worth noting that, at the time of this blog's release, Cohere has released its Rerank 3 Nimble model, which claims to be 3x faster than Rerank 3. However, it is currently only available on AWS SageMaker and for on-premises deployments.
Conclusion
There you have it. If you followed along, you were able to set up an Opensearch integration with the Cohere Rerank API in a very short amount of time, increasing the relevance of your search results with little effort! As always, if you’d like to talk more about search relevance, reranking, and other AI search solutions, reach out to Gigasearch!
*According to the BEIR heterogeneous benchmark for information retrieval, cross-encoders outperform bi-encoder semantic search approaches for relevance.