How to filter the top hits aggregation result

I am running some queries against the products index in Elasticsearch. The index contains the following documents:

{ "product_name": "prod-1", "meta": [ { "key": "key1", "value": "value1" }, { "key": "key2", "value": "value2" } ] }
{ "product_name": "prod-2", "meta": [ { "key": "key1", "value": "value1" } ] }
{ "product_name": "prod-2", "meta": [ { "key": "key2", "value": "value2" } ] }
{ "product_name": "prod-3", "meta": [ { "key": "key2", "value": "value2" } ] }

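What I want is to get every product_name that has both key1/value1 and key2/value2 in its meta array, not necessarily in the same document. For example, in the data above, prod-1 has both key1/value1 and key2/value2 in a single document, so I want prod-1 in the result. prod-2 also has both key1/value1 and key2/value2, but in different documents, so I want prod-2 in the result as well. prod-3 only has key2/value2, even across documents, so I do not want prod-3 in the result.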
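
To make the requirement concrete, here is a minimal Python sketch of the intended semantics, run over the four sample documents rather than against Elasticsearch. The names `DOCS`, `REQUIRED`, and `products_with_all_pairs` are made up for illustration; the idea is simply to union the key/value pairs per product and keep the products whose union contains every required pair.

```python
from collections import defaultdict

# Sample documents copied from the question.
DOCS = [
    {"product_name": "prod-1", "meta": [{"key": "key1", "value": "value1"},
                                        {"key": "key2", "value": "value2"}]},
    {"product_name": "prod-2", "meta": [{"key": "key1", "value": "value1"}]},
    {"product_name": "prod-2", "meta": [{"key": "key2", "value": "value2"}]},
    {"product_name": "prod-3", "meta": [{"key": "key2", "value": "value2"}]},
]

# The pairs every returned product must have, possibly spread across documents.
REQUIRED = {("key1", "value1"), ("key2", "value2")}


def products_with_all_pairs(docs, required):
    """Return the sorted product names whose documents together cover `required`."""
    pairs_by_product = defaultdict(set)
    for doc in docs:
        for entry in doc["meta"]:
            pairs_by_product[doc["product_name"]].add((entry["key"], entry["value"]))
    # `required <= pairs` is a subset test: every required pair was seen.
    return sorted(name for name, pairs in pairs_by_product.items() if required <= pairs)


print(products_with_all_pairs(DOCS, REQUIRED))  # ['prod-1', 'prod-2']
```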

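I am trying the following approach:

  1. Group by product name
  2. Then filter the aggregated result to check that each product has both key1/value1 and key2/value2

I group the documents by product_name and combine the meta fields in each bucket as follows: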

{
  "size": 0,
  "aggs": {
    "by_product": {
      "terms": {
        "field": "product_name"
      },
      "aggs": {
        "all_meta": {
          "top_hits": {
            "_source": {
              "includes": [
                "meta.key",
                "meta.value"
              ]
            }
          }
        }
      }
    }
  }
}

The result of the aggregation above looks like this:

  "aggregations" : {
    "by_product" : {
      ...
      "buckets" : [
        {
          ...
          "key" : "prod-2",
          "all_meta" : {
            "hits" : {
              ...
              "hits" : [
                {
                  ....
                  "_source" : {
                    "meta" : [
                      {
                        "value" : "value1",
                        "key" : "key1"
                      }
                    ]
                  }
                },
                {
                  ....
                  "_source" : {
                    "meta" : [
                      {
                        "value" : "value2",
                        "key" : "key2"
                      }
                    ]
                  }
                }
              ]
            }
          }
        },
        {
          ....
          "key" : "prod-1",
          "all_meta" : {
            "hits" : {
              ....
              "hits" : [
                {
                  ....
                  "_source" : {
                    "meta" : [
                      {
                        "value" : "value1",
                        "key" : "key1"
                      },
                      {
                        "value" : "value2",
                        "key" : "key2"
                      }
                    ]
                  }
                }
              ]
            }
          }
        },
        {
          ....
          "key" : "prod-3",
          "all_meta" : {
            "hits" : {
              ....
              "hits" : [
                {
                  ....
                  "_source" : {
                    "meta" : [
                      {
                        "value" : "value2",
                        "key" : "key2"
                      }
                    ]
                  }
                }
              ]
            }
          }
        }
      ]
    }
  }

Now I want to filter the values from the aggregation above and keep a bucket only if its meta contains both { "key": "key1", "value": "value1" } and { "key": "key2", "value": "value2" }. Something like this:

{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "buckets.all_meta.hits.hits._source.meta",
            "query": {
              "bool": {
                "must": [
                  {
                    "match": {
                      "buckets.all_meta.hits.hits._source.meta.key": "key1"
                    }
                  },
                  {
                    "match": {
                      "buckets.all_meta.hits.hits._source.meta.value": "value1"
                    }
                  }
                ]
              }
            }
          }
        },
        {
          "nested": {
            "path": "buckets.all_meta.hits.hits._source.meta",
            "query": {
              "bool": {
                "must": [
                  {
                    "match": {
                      "buckets.all_meta.hits.hits._source.meta.key": "key2"
                    }
                  },
                  {
                    "match": {
                      "buckets.all_meta.hits.hits._source.meta.value": "value2"
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}

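But I am not sure how to perform the step above. Is it possible to do this? This Stack Overflow question is similar but has no answers. Is there any other way to get the result I want? Any help would be appreciated. Thanks.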
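
One possible fallback, if the filtering does not have to happen inside Elasticsearch, is to post-process the top_hits response on the client. The sketch below assumes the response has the shape shown earlier (aggregations named by_product and all_meta); the function name `matching_buckets` and the trimmed `SAMPLE_RESPONSE` are illustrative only. Keep in mind that top_hits returns 3 hits per bucket by default and terms returns 10 buckets by default, so this only sees all the data if both sizes are raised far enough.

```python
def matching_buckets(response, required, terms_agg="by_product", hits_agg="all_meta"):
    """Return the keys of terms buckets whose top hits together contain every pair in `required`."""
    kept = []
    for bucket in response["aggregations"][terms_agg]["buckets"]:
        seen = set()
        for hit in bucket[hits_agg]["hits"]["hits"]:
            for entry in hit["_source"].get("meta", []):
                seen.add((entry["key"], entry["value"]))
        if required <= seen:  # subset test: all required pairs were found
            kept.append(bucket["key"])
    return kept


# Trimmed-down response in the shape shown in the question (illustrative only).
SAMPLE_RESPONSE = {
    "aggregations": {"by_product": {"buckets": [
        {"key": "prod-2", "all_meta": {"hits": {"hits": [
            {"_source": {"meta": [{"key": "key1", "value": "value1"}]}},
            {"_source": {"meta": [{"key": "key2", "value": "value2"}]}},
        ]}}},
        {"key": "prod-3", "all_meta": {"hits": {"hits": [
            {"_source": {"meta": [{"key": "key2", "value": "value2"}]}},
        ]}}},
    ]}}
}

print(matching_buckets(SAMPLE_RESPONSE, {("key1", "value1"), ("key2", "value2")}))
```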

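Here is a solution. The idea is that, within each product bucket, we aggregate all key/value pairs (using a scripted terms aggregation inside a nested aggregation), and then, using a bucket_selector pipeline aggregation, we select only the product buckets that contain two distinct pairs. Note that the selector only counts distinct pairs: it keeps any product with exactly two of them, which matches the requirement when, as in the sample data, key1/value1 and key2/value2 are the only pairs present.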

POST products/_search
{
  "size": 0,
  "aggs": {
    "by_product": {
      "terms": {
        "field": "product_name.keyword"
      },
      "aggs": {
        "meta": {
          "nested": {
            "path": "meta"
          },
          "aggs": {
            "kv": {
              "terms": {
                "script": """
                [doc['meta.key.keyword'].value, doc['meta.value.keyword'].value].join('-')
                """,
                "size": 10
              }
            }
          }
        },
        "selector": {
          "bucket_selector": {
            "buckets_path": {
              "count": "meta>kv._bucket_count"
            },
            "script": "params.count == 2"
          }
        }
      }
    }
  }
}
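
If products can carry other key/value pairs, a possible refinement is to count only the wanted pairs. The Python sketch below builds a variant of the request body above that places a filter aggregation between the nested and the scripted terms aggregations, so that only the required pairs reach the bucket count. This is an untested assumption-laden sketch, not something verified against a cluster: the aggregation name `wanted` and the function `build_request` are made up here, the `.keyword` sub-fields are assumed as in the query above, and the '-' join would be ambiguous if keys or values themselves contain '-'.

```python
import json


def build_request(required_pairs, product_field="product_name.keyword"):
    """Build a search body that keeps products having every pair in `required_pairs`."""
    pairs = sorted(set(required_pairs))  # de-duplicate so the expected count is exact
    # One bool/filter clause per wanted (key, value) pair, matched on the same nested object.
    pair_filters = [
        {"bool": {"filter": [
            {"term": {"meta.key.keyword": key}},
            {"term": {"meta.value.keyword": value}},
        ]}}
        for key, value in pairs
    ]
    kv_script = "[doc['meta.key.keyword'].value, doc['meta.value.keyword'].value].join('-')"
    return {
        "size": 0,
        "aggs": {"by_product": {
            "terms": {"field": product_field},
            "aggs": {
                "meta": {
                    "nested": {"path": "meta"},
                    "aggs": {"wanted": {  # hypothetical name: keeps only the required pairs
                        "filter": {"bool": {"should": pair_filters,
                                            "minimum_should_match": 1}},
                        "aggs": {"kv": {"terms": {"script": kv_script,
                                                  "size": len(pairs)}}},
                    }},
                },
                "selector": {"bucket_selector": {
                    "buckets_path": {"count": "meta>wanted>kv._bucket_count"},
                    "script": f"params.count == {len(pairs)}",
                }},
            },
        }},
    }


print(json.dumps(build_request([("key1", "value1"), ("key2", "value2")]), indent=2))
```

The printed JSON can be pasted as the body of POST products/_search. The design choice is that the selector compares the number of distinct wanted pairs found with the number of pairs requested, instead of hard-coding 2.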

In the result, you can see that we only have prod-1 and prod-2:

  "buckets" : [
    {
      "key" : "prod-2",
      "doc_count" : 2,
      "meta" : {
        "doc_count" : 2,
        "kv" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : "key1-value1",
              "doc_count" : 1
            },
            {
              "key" : "key2-value2",
              "doc_count" : 1
            }
          ]
        }
      }
    },
    {
      "key" : "prod-1",
      "doc_count" : 1,
      "meta" : {
        "doc_count" : 2,
        "kv" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : "key1-value1",
              "doc_count" : 1
            },
            {
              "key" : "key2-value2",
              "doc_count" : 1
            }
          ]
        }
      }
    }
  ]
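
To consume this output in application code, a small helper can map each surviving product to the pairs found for it. This is only a sketch: the function name `pairs_by_product` and the `SAMPLE` dict are illustrative, and it assumes the buckets shown above sit under aggregations.by_product in the full response, as they would for the request in this answer.

```python
def pairs_by_product(response, terms_agg="by_product"):
    """Map each product bucket key to the list of 'key-value' strings in its kv buckets."""
    buckets = response["aggregations"][terms_agg]["buckets"]
    return {
        bucket["key"]: [kv["key"] for kv in bucket["meta"]["kv"]["buckets"]]
        for bucket in buckets
    }


# Response fragment in the shape shown above, with bookkeeping fields omitted.
SAMPLE = {"aggregations": {"by_product": {"buckets": [
    {"key": "prod-2", "doc_count": 2, "meta": {"doc_count": 2, "kv": {"buckets": [
        {"key": "key1-value1", "doc_count": 1},
        {"key": "key2-value2", "doc_count": 1},
    ]}}},
    {"key": "prod-1", "doc_count": 1, "meta": {"doc_count": 2, "kv": {"buckets": [
        {"key": "key1-value1", "doc_count": 1},
        {"key": "key2-value2", "doc_count": 1},
    ]}}},
]}}}

print(pairs_by_product(SAMPLE))
```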