Latest result for a unique combination of fields using Elasticsearch

I have documents in the following format:

{name: 'A', website: 'example.com', date: 1, + other fields}
{name: 'A', website: 'example.com', date: 2, + other fields}
{name: 'B', website: 'example.com', date: 2, + other fields}
{name: 'A', website: 'something.com', date: 1, + other fields}
{name: 'A', website: 'something.com', date: 2, + other fields}
{name: 'C', website: 'something.com', date: 1, + other fields}
{name: 'C', website: 'something.com', date: 2, + other fields}

I want to run a multi_match query on name and website while returning only the latest result for each combination. My query looks like this:

query: {
  bool: {
    ...optional filters...,
    must: {
      multi_match: {
        query: input,
        type: "most_fields",
        fields: ["name^3", ..., "website"],
      },
    },
  },
},

The output I want should look like this, sorted by _score:

{name: 'A', website: 'example.com', date: 2, + other fields}
{name: 'B', website: 'example.com', date: 2, + other fields}
{name: 'A', website: 'something.com', date: 2, + other fields}
{name: 'C', website: 'something.com', date: 2, + other fields}

I understand that I need an aggregation with top_hits to get the latest result, for example:

top_hits: {
  size: 1,
  sort: [{ date: "desc" }],
},

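However, when I aggregate by website and then by name, I lose the _score ordering, which matters for my query. I have tried a composite aggregation, but it cannot be sorted by the score of the resulting documents.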

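I am considering manually creating an extra field that is the concatenation of name and website. I could then use it for a single-level aggregation, which would let me order the keys by _score. For example: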

  aggs: {
    latest_results: {
      terms: {
        field: "website_name.keyword",
        order: {
          maximum_score: "desc",
        },
      },
      aggs: {
        maximum_score: {
          max: {
            script: {
              source: "_score",
            },
          },
        },
        hits: {
          top_hits: {
            size: 1,
            sort: [{ date: "desc" }],
          },
        },
      },
    },
  },
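The concatenated field has to be populated before indexing. A minimal sketch of that step follows, assuming the documents are plain dicts and that the field is called website_name, as in the aggregation above. The helper name and the "|" separator are illustrative assumptions, not part of the original setup.

```python
def with_combined_key(doc, sep="|"):
    """Return a copy of doc with a website_name field built from website and name.

    The separator is an assumption; pick one that cannot occur in either value.
    """
    combined = dict(doc)  # shallow copy so the caller's document is not mutated
    combined["website_name"] = f"{doc['website']}{sep}{doc['name']}"
    return combined


docs = [
    {"name": "A", "website": "example.com", "date": 1},
    {"name": "A", "website": "example.com", "date": 2},
    {"name": "B", "website": "example.com", "date": 2},
]

# Documents that share name and website end up with the same key,
# so a single terms aggregation on that key groups them together.
prepared = [with_combined_key(d) for d in docs]
```

Doing the concatenation at index time moves the cost out of the query, at the price of reindexing existing documents.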

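You should be able to do this with a top_hits aggregation nested inside a terms aggregation that uses a script. According to the top_hits documentation:

sort - How the top matching hits should be sorted. By default the hits are sorted by the score of the main query.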

{
  "size": 0, 
  "query": {
    "bool": {
      "must": [
        {"multi_match": {
          "query": "A",
          "type": "most_fields",
          "fields": ["name^3", "website"]
        }}
      ]
    }
  },
  "aggs": {
    "visitor": {
      "terms": {
        "script": "doc['name'].value +'-'+ doc['website'].value",
        "size": 10
      },
      "aggs": {
        "top_visitors": {
          "top_hits": {
            "size": 1
          }
        }
      }
    }
  }
}

Your result will look like this:

"visitor" : {
  "doc_count_error_upper_bound" : 0,
  "sum_other_doc_count" : 0,
  "buckets" : [
    {
      "key" : "A-example.com",
      "doc_count" : 2,
      "top_visitors" : {
        "hits" : {
          "total" : {
            "value" : 2,
            "relation" : "eq"
          },
          "max_score" : 1.7260926,
          "hits" : [
            {
              "_index" : "test-52",
              "_type" : "_doc",
              "_id" : "vu_xUnQB5HlCKIdlWRy8",
              "_score" : 1.7260926,
              "_source" : {
                "name" : "A",
                "website" : "example.com",
                "date" : 1
              }
            }
          ]
        }
      }
    },
    {
      "key" : "A-something.com",
      "doc_count" : 2,
      "top_visitors" : {
        "hits" : {
          "total" : {
            "value" : 2,
            "relation" : "eq"
          },
          "max_score" : 1.7260926,
          "hits" : [
            {
              "_index" : "test-52",
              "_type" : "_doc",
              "_id" : "VWDxUnQBx_BqvGcp8U8j",
              "_score" : 1.7260926,
              "_source" : {
                "name" : "A",
                "website" : "something.com",
                "date" : 1
              }
            }
          ]
        }
      }
    }
  ]
}
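Because the hits come back nested inside buckets, the client usually has to flatten them into a single list. A small sketch of that step, assuming the "visitor" aggregation has already been parsed into a Python dict shaped like the response above. The helper name latest_hits is hypothetical, and the sample is a trimmed copy of that response.

```python
def latest_hits(aggregation, hits_name="top_visitors"):
    """Flatten terms buckets into (key, score, source) tuples, best score first."""
    rows = []
    for bucket in aggregation["buckets"]:
        for hit in bucket[hits_name]["hits"]["hits"]:
            rows.append((bucket["key"], hit["_score"], hit["_source"]))
    # Stable sort: buckets with equal scores keep the order Elasticsearch returned.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Trimmed copy of the "visitor" aggregation shown above.
sample = {
    "buckets": [
        {
            "key": "A-example.com",
            "doc_count": 2,
            "top_visitors": {"hits": {"hits": [
                {"_score": 1.7260926,
                 "_source": {"name": "A", "website": "example.com", "date": 1}},
            ]}},
        },
        {
            "key": "A-something.com",
            "doc_count": 2,
            "top_visitors": {"hits": {"hits": [
                {"_score": 1.7260926,
                 "_source": {"name": "A", "website": "something.com", "date": 1}},
            ]}},
        },
    ]
}

rows = latest_hits(sample)
```

Note that this helper relies on each hit carrying a numeric _score, as in the response above.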

Be careful: aggregations that use scripts are resource-intensive and slow.
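The query above keeps the default top_hits sort, so each bucket returns its highest-scoring hit rather than its newest one (the sample results show date: 1). The sketch below is one possible variant that combines it with the pieces from the question: top_hits sorted by date, and buckets ordered by a max sub-aggregation over _score. It only builds the request body as a Python dict and has not been run against a cluster. The function name is hypothetical, and it assumes name and website are readable from doc values (for example, keyword fields).

```python
import json


def build_latest_per_pair_body(text, size=10):
    """Build a search body: one bucket per name/website pair, newest doc in each."""
    return {
        "size": 0,
        "query": {
            "bool": {
                "must": [
                    {"multi_match": {
                        "query": text,
                        "type": "most_fields",
                        "fields": ["name^3", "website"],
                    }}
                ]
            }
        },
        "aggs": {
            "visitor": {
                "terms": {
                    # Scripted key, as in the answer; a precomputed field is cheaper.
                    "script": "doc['name'].value +'-'+ doc['website'].value",
                    "size": size,
                    # Order buckets by the best score found inside each bucket.
                    "order": {"maximum_score": "desc"},
                },
                "aggs": {
                    "maximum_score": {"max": {"script": {"source": "_score"}}},
                    "top_visitors": {
                        # Newest document per bucket instead of the best-scoring one.
                        "top_hits": {"size": 1, "sort": [{"date": "desc"}]}
                    },
                },
            }
        },
    }


body = build_latest_per_pair_body("A")
print(json.dumps(body, indent=2))
```

One caveat with this design: maximum_score is the highest score of any document in the bucket, which is not necessarily the score of the newest document that top_hits returns.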