Parent-child vs Nested Joins in Elasticsearch

You have two main options for modeling relationships between documents in Elasticsearch: parent-child joins and nested fields. Both are useful for representing data with a hierarchical structure, such as comments on a post or answers to a question. They let you query and filter related documents together in a single request, which can be more efficient than querying and filtering them individually.

Parent-child Joins

A parent-child join is useful when documents are logically related but stored as separate documents in the same index. Because the parent and child are indexed independently, you can add, update, or delete one without reindexing the other, while still being able to search them together. For example, if you are using Elasticsearch to store articles from the New York Times and their associated comments, a parent-child relationship lets you find articles by the content of their comments, or comments by the content of their article, while each comment remains its own document. This can be easier to maintain than nested documents, particularly when the child documents change often.

Here is an example of using a parent-child join in Elasticsearch to relate a parent user to a child user, each with their own weapon. First, we would create a single index whose mapping includes a join field (here called relation_type) that declares the parent and child relation names, like this (note these examples assume you are using Elasticsearch 7 or higher):

PUT /my_index
{
  "mappings": {
    "properties": {
      "user": {
        "type": "text"
      },
      "weapon": {
        "type": "text"
      },
      "relation_type": {
        "type": "join",
        "eager_global_ordinals": true,
        "relations": {
          "parent": "child"
        }
      }
    }
  }
}

Next, insert a parent and their child, like this:

PUT /my_index/_doc/1?routing=Kratos
{
  "user": "Kratos",
  "weapon": "leviathan ax",
  "relation_type": {
    "name": "parent"
  }
}

PUT /my_index/_doc/2?routing=Kratos
{
  "user": "Atreus",
  "weapon": "bow",
  "relation_type": {
    "name": "child",
    "parent": 1
  }
}

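The routing parameter ensures that the parent and child documents are stored on the same Elasticsearch shard, which the join requires. A child must always be indexed with the same routing value as its parent.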
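
Because routing decides which shard a document lives on, the same value is also needed when reading the child by ID. The request below is a minimal sketch that assumes the index and documents from the example above. On an index with more than one primary shard, leaving out the routing value could send the request to a shard that does not hold the document.

```console
# Fetch the child document by ID, passing the same routing value used at index time
GET /my_index/_doc/2?routing=Kratos
```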

Finally, we can use a has_parent query to find child documents whose parent's weapon is "leviathan ax":

GET /my_index/_search
{
  "query": {
    "has_parent": {
      "parent_type": "parent",
      "query": {
        "match": {
          "weapon": "leviathan ax"
        }
      }
    }
  }
}

You can also use the has_child query to find parent documents that have at least one child matching a query, such as a child with a given weapon. Multiple levels of relations (grandchild documents) are also possible, but not recommended for performance reasons.
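
As a sketch of the has_child query, the request below should return parent documents that have at least one child whose weapon matches "bow". It assumes the my_index mapping and the two documents indexed earlier. The relation name "child" comes from the relations setting of the join field.

```console
GET /my_index/_search
{
  "query": {
    "has_child": {
      "type": "child",
      "query": {
        "match": {
          "weapon": "bow"
        }
      }
    }
  }
}
```

With the sample data, the expected hit is the Kratos document, since its child (Atreus) carries the bow.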

Disadvantages of Parent-Child Joins

  • Parent-child uses global ordinals to speed up joins, and they must be rebuilt after any change to a shard. For join fields, eager_global_ordinals defaults to true, so the rebuild happens at refresh time and can slow down refreshes when there are many parent documents. Setting it to false defers that cost to the first join query or aggregation instead.
  • Child documents can't have more than one parent.
  • By default, a “has_child” query returns only the matching parents. It does not tell you which child documents matched unless you also request inner_hits.
  • There is extra memory overhead, since ES has to keep the join field's global ordinals in memory.
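
If you do need to see which children matched, one option is the inner_hits option of the has_child query. The request below is a minimal sketch that assumes the same my_index mapping and sample documents as above.

```console
GET /my_index/_search
{
  "query": {
    "has_child": {
      "type": "child",
      "query": {
        "match": { "weapon": "bow" }
      },
      "inner_hits": {}
    }
  }
}
```

Each returned parent should then carry an inner_hits section, keyed by the relation name ("child" here), that lists the child documents that matched.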

Nested Joins

The nested data type in Elasticsearch is used to store arrays of objects in a single field, where each object is considered a nested document. This can be useful for storing data that has a complex structure, such as the comments on an article or the details of a product and its associated components. Instead of storing the nested objects as separate documents, they can be stored as nested objects within the parent document, allowing you to query and filter the nested documents along with the parent document itself. This can make it easier to work with complex data structures, and can improve the performance of your Elasticsearch queries.

Here is an example of using a nested field in Elasticsearch to store users together with their weapons. First, we would create an index with a nested weapons field, like this (if my_index still exists from the parent-child example, delete it first or use a different index name):

PUT my_index
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "weapons": {
        "type": "nested"
      }
    }
  }
}

Next, we would index some users and their weapons, like this:

PUT my_index/_doc/1
{
  "name": "kratos",
  "weapons": [
    {
      "ax": "leviathan ax"
    },
    {
      "sword": "blades of chaos"
    }
  ]
}

PUT my_index/_doc/2
{
  "name": "atreus",
  "weapons": [
    {
      "bow": "talon bow"
    }
  ]
}

Finally, we can use a nested query to search for users by their weapons, like this:

GET my_index/_search
{
  "query": {
    "nested": {
      "path": "weapons",
      "query": {
        "bool": {
          "must": [
            { "match": { "weapons.ax": "leviathan ax" }}
          ]
        }
      }
    }
  }
}

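This query will return the document for the user "kratos", because one of its nested weapons matches "leviathan ax". The hit contains the whole document, including all of the user's weapons, not only the one that matched.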

It is possible to have multiple levels of nested documents.
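
To illustrate multiple levels of nesting, the sketch below declares a nested field inside another nested field and then queries the inner level. The index name my_nested_index and the fields weapons.name, weapons.upgrades, and weapons.upgrades.rune are hypothetical, made up for this illustration.

```console
# Hypothetical index: each weapon is a nested object with its own nested upgrades
PUT /my_nested_index
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "weapons": {
        "type": "nested",
        "properties": {
          "name": { "type": "text" },
          "upgrades": {
            "type": "nested",
            "properties": {
              "rune": { "type": "text" }
            }
          }
        }
      }
    }
  }
}

# Query the inner level by wrapping one nested query inside another
GET /my_nested_index/_search
{
  "query": {
    "nested": {
      "path": "weapons",
      "query": {
        "nested": {
          "path": "weapons.upgrades",
          "query": {
            "match": { "weapons.upgrades.rune": "frost" }
          }
        }
      }
    }
  }
}
```

Each level of nesting needs its own nested query, with the path written out in full from the root of the document.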

Disadvantages of Nested

  • Nesting can increase the complexity of your Elasticsearch queries, making them more difficult to write and maintain.
  • Each time you add, change, or remove a nested document, the entire document needs to be reindexed.
  • Searches against nested documents return the whole parent document by default, not just the matching nested object. You can request inner_hits to see which nested objects matched.
  • There is a default limit of 10000 on the number of nested JSON objects that a single document can contain across all nested types, set by index.mapping.nested_objects.limit.
  • There is also a default limit of 50 on the number of distinct nested mappings an index can have, set by index.mapping.nested_fields.limit.
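
If you need to know which nested objects matched rather than only which parent document, one option is to add inner_hits to the nested query. The request below is a minimal sketch that assumes the nested my_index mapping and sample documents from this section.

```console
GET /my_index/_search
{
  "query": {
    "nested": {
      "path": "weapons",
      "query": {
        "match": { "weapons.ax": "leviathan ax" }
      },
      "inner_hits": {}
    }
  }
}
```

Each hit should then include an inner_hits section, keyed by the nested path ("weapons" here), that contains only the nested objects that matched.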

Conclusions

Whether you go with parent-child joins or nested fields ultimately depends on your use case. If the number of child documents per parent is very large, parent-child would probably work best. Parent-child is also a better fit when the parent or child documents are modified frequently, since each can be updated without reindexing the other. In most other cases, nested fields will be more performant.