Elasticsearch

Question

I am using Elasticsearch 7.7 with the official PHP client to interact with the server.

My issue was somewhat solved here: https://discuss.elastic.co/t/need-to-return-part-of-a-doc-from-a-search-query-filter-is-parent-child-the-way-to-go/64514/2

However, "Types are deprecated in APIs in 7.0+": https://www.elastic.co/guide/en/elasticsearch/reference/7.x/removal-of-types.html

This is my document:

{
  "offering_id": "1190",
  "account_id": "362353",
  "service_id": "20087",
  "title": "Quick Brown Mammal",
  "slug": "Quick Brown Fox",
  "summary": "Quick Brown Fox",
  "header_thumb_path": "uploads/test/test.png",
  "duration": "30",
  "alter_ids": [
    "59151",
    "58796",
    "58613",
    "54286",
    "51812",
    "50052",
    "48387",
    "37927",
    "36685",
    "36554",
    "28807",
    "23154",
    "22356",
    "21480",
    "220",
    "1201",
    "1192"
  ],
  "premium": "f",
  "featured": "f",
  "events": [
    {
      "event_id": "9999",
      "start_date": "2020-07-01 14:00:00",
      "registration_count": "22",
      "description": "boo"
    },
    {
      "event_id": "9999",
      "start_date": "2020-07-01 14:00:00",
      "registration_count": "22",
      "description": "xyz"
    },
    {
      "event_id": "9999",
      "start_date": "2020-08-11 11:30:00",
      "registration_count": "41",
      "description": "test"
    }
  ]
}

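Notice that a document may have one or more "events".

Searching based on event data is the most common use case.

For example:

Find events that start before 12 noon
Find events with a description of "xyz"
List events with a start date within the next 10 days

I do not want to return any events that do not match the query!

So, for example, find events with a description of "xyz" for a given service: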

{
  "query": {
    "bool": {
      "must": {
        "match": {
          "events.description": "xyz"
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "service_id": 20087
              }
            }
          ]
        }
      }
    }
  }
}

I would like the result to look like this:

{
  "offering_id": "1190",
  "account_id": "362353",
  "service_id": "20087",
  "title": "Quick Brown Mammal",
  "slug": "Quick Brown Fox",
  "summary": "Quick Brown Fox",
  "header_thumb_path": "uploads/test/test.png",
  "duration": "30",
  "alter_ids": [
    "59151",
    "58796",
    "58613",
    "54286",
    "51812",
    "50052",
    "48387",
    "37927",
    "36685",
    "36554",
    "28807",
    "23154",
    "22356",
    "21480",
    "220",
    "1201",
    "1192"
  ],
  "premium": "f",
  "featured": "f",
  "events": [
    {
      "event_id": "9999",
      "start_date": "2020-07-01 14:00:00",
      "registration_count": "22",
      "description": "xyz"
    }
  ]
}

However, it just returns the entire document, including all of its events.

Is it possible to return only a subset of the data? Maybe with aggregations?

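Right now, we apply a set of "extra" filters to the result set in the application (PHP in this case) to strip out the event blocks that do not match the desired results.
It would be better to have Elasticsearch return exactly what is needed, rather than post-processing the results to extract the applicable events.
I considered restructuring the data so that it is "event"-based, but then I would be duplicating data, since each offering would also carry the parent data.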
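
For reference, the application-side filtering described above amounts to something like the following. This is only a minimal Python sketch of the idea, not the actual PHP code; the function name `keep_matching_events` and the predicate are hypothetical, and the document is a trimmed-down copy of the one shown earlier.

```python
def keep_matching_events(doc, predicate):
    """Return a shallow copy of doc whose "events" list holds only matching events."""
    matching = [event for event in doc.get("events", []) if predicate(event)]
    return {**doc, "events": matching}


# Trimmed-down version of the document from the question.
doc = {
    "offering_id": "1190",
    "service_id": "20087",
    "events": [
        {"event_id": "9999", "start_date": "2020-07-01 14:00:00", "description": "boo"},
        {"event_id": "9999", "start_date": "2020-07-01 14:00:00", "description": "xyz"},
        {"event_id": "9999", "start_date": "2020-08-11 11:30:00", "description": "test"},
    ],
}

# Hypothetical predicate: keep only events whose description is exactly "xyz".
result = keep_matching_events(doc, lambda event: event["description"] == "xyz")
print(result["events"])
```

This is the step I would like Elasticsearch to do for me.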

This data used to live in SQL, where it was modeled as a relationship rather than nested like this.

Answer 1

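You can use a nested aggregation together with a filter aggregation to return a subset of the nested data.

To learn more about these aggregations, refer to the official documentation: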

Filter Aggregation

Nested Aggregation

Index mapping:

{
  "mappings": {
    "properties": {
      "offering_id": {
        "type": "integer"
      },
      "account_id": {
        "type": "integer"
      },
      "service_id": {
        "type": "integer"
      },
      "title": {
        "type": "text"
      },
      "slug": {
        "type": "text"
      },
      "summary": {
        "type": "text"
      },
      "header_thumb_path": {
        "type": "keyword"
      },
      "duration": {
        "type": "integer"
      },
      "alter_ids": {
        "type": "integer"
      },
      "premium": {
        "type": "text"
      },
      "featured": {
        "type": "text"
      },
      "events": {
        "type": "nested",
        "properties": {
          "event_id": {
            "type": "integer"
          },
          "registration_count": {
            "type": "integer"
          },
          "description": {
            "type": "text"
          }
        }
      }
    }
  }
}

Search query:

{
  "size": 0,
  "aggs": {
    "nested": {
      "nested": {
        "path": "events"
      },
      "aggs": {
        "filter": {
          "filter": {
            "match": { "events.description": "xyz" }
          },
          "aggs": {
            "total": {
              "top_hits": {
                "size": 10
              }
            }
          }
        }
      }
    }
  }
}

Search result:

"hits": [
          {
            "_index": "foo21",
            "_type": "_doc",
            "_id": "1",
            "_nested": {
              "field": "events",
              "offset": 1
            },
            "_score": 1.0,
            "_source": {
              "event_id": "9999",
              "start_date": "2020-07-01 14:00:00",
              "registration_count": "22",
              "description": "xyz"
            }
          }
        ]
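
To consume this on the client side, the matching events can be read from the `top_hits` sub-aggregation. The sketch below is in Python rather than PHP and assumes the aggregation names used in the query above (`nested`, `filter`, `total`); the helper name `events_from_aggregation` is hypothetical, and the sample response is abbreviated with illustrative `doc_count` values.

```python
def events_from_aggregation(response):
    """Return (parent document id, event source) pairs from the top_hits sub-aggregation."""
    top_hits = response["aggregations"]["nested"]["filter"]["total"]
    return [(hit["_id"], hit["_source"]) for hit in top_hits["hits"]["hits"]]


# Abbreviated response; only the keys the helper reads are meaningful here.
response = {
    "aggregations": {
        "nested": {
            "doc_count": 3,  # illustrative value
            "filter": {
                "doc_count": 1,  # illustrative value
                "total": {
                    "hits": {
                        "hits": [
                            {
                                "_index": "foo21",
                                "_id": "1",
                                "_nested": {"field": "events", "offset": 1},
                                "_source": {
                                    "event_id": "9999",
                                    "start_date": "2020-07-01 14:00:00",
                                    "registration_count": "22",
                                    "description": "xyz",
                                },
                            }
                        ]
                    }
                },
            },
        }
    }
}

events = events_from_aggregation(response)
print(events)
```

Note that this approach returns the events on their own; the `_id` of each hit is the id of the parent document, so the parent fields have to be fetched separately if they are needed.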

Second approach (a nested query with inner_hits):

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "service_id": "20087"
          }
        },
        {
          "nested": {
            "path": "events",
            "query": {
              "bool": {
                "must": [
                  {
                    "match": {
                      "events.description": "xyz"
                    }
                  }
                ]
              }
            },
            "inner_hits": {}
          }
        }
      ]
    }
  }
}
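
With this approach each hit still carries the full `_source`, and the matching events arrive separately under `inner_hits.events`. The Python sketch below shows one way to build such a request body and to reassemble the shape asked for in the question. The helper names are hypothetical, the `term` filter on `service_id` is swapped in for the `match` clause above, and excluding `events` from `_source` is an optional addition. Keep in mind that `inner_hits` returns only 3 hits per parent by default, so `size` is set explicitly.

```python
def nested_events_query(service_id, event_query, max_events=10):
    """Build a search body: filter parents by service_id, return matching events as inner hits."""
    return {
        # Optional: leave the full events array out of _source, since
        # the matching events come back under inner_hits anyway.
        "_source": {"excludes": ["events"]},
        "query": {
            "bool": {
                "filter": [{"term": {"service_id": service_id}}],
                "must": [
                    {
                        "nested": {
                            "path": "events",
                            "query": event_query,
                            "inner_hits": {"size": max_events},
                        }
                    }
                ],
            }
        },
    }


def with_matching_events(hit):
    """Rebuild the parent document with "events" replaced by its inner hits."""
    inner = hit["inner_hits"]["events"]["hits"]["hits"]
    return {**hit["_source"], "events": [h["_source"] for h in inner]}


body = nested_events_query(20087, {"match": {"events.description": "xyz"}})

# Abbreviated example of a single search hit, for illustration only.
sample_hit = {
    "_id": "1",
    "_source": {"offering_id": "1190", "service_id": "20087"},
    "inner_hits": {
        "events": {
            "hits": {
                "hits": [
                    {
                        "_nested": {"field": "events", "offset": 1},
                        "_source": {"event_id": "9999", "description": "xyz"},
                    }
                ]
            }
        }
    },
}

merged = with_matching_events(sample_hit)
print(merged)
```

The same builder should cover the date-based use cases by passing a different `event_query`, for example `{"range": {"events.start_date": {"gte": "now", "lte": "now+10d"}}}`. That assumes `events.start_date` is mapped as a `date` with the format `yyyy-MM-dd HH:mm:ss`, which the mapping above does not define.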

You can also go through this SO answer:

Return partial nested documents in ElasticSearch


Elasticsearch - Return a subset of nested results
