How to count fields with the same value in elastic.js (Elasticsearch)?

I have a list of communities. I need to create an aggregation query that counts all data entries with the same title.

[
  {
    "_id": "56161cb3cbdad2e3b437fdc3",
    "_type": "Comunity",
    "name": "public",
    "data": [
      {
        "title": "sonder",
        "creationDate": "2015-08-22T03:43:28 -03:00",
        "quantity": 0
      },
      {
        "title": "vule",
        "creationDate": "2014-05-17T12:35:01 -03:00",
        "quantity": 0
      },
      {
        "title": "omer",
        "creationDate": "2015-01-31T04:54:19 -02:00",
        "quantity": 3
      },
      {
        "title": "sonder",
        "creationDate": "2014-05-22T05:09:36 -03:00",
        "quantity": 3
      }
    ]
  },
  {
    "_id": "56161cb3dae30517fc133cd9",
    "_type": "Comunity",
    "name": "static",
    "data": [
      {
        "title": "vule",
        "creationDate": "2014-07-01T06:32:06 -03:00",
        "quantity": 5
      },
      {
        "title": "vule",
        "creationDate": "2014-01-10T12:40:28 -02:00",
        "quantity": 1
      },
      {
        "title": "vule",
        "creationDate": "2014-01-09T09:33:11 -02:00",
        "quantity": 3
      }
    ]
  },
  {
    "_id": "56161cb32f62b522355ca3c8",
    "_type": "Comunity",
    "name": "public",
    "data": [
      {
        "title": "vule",
        "creationDate": "2014-02-03T09:55:28 -02:00",
        "quantity": 2
      },
      {
        "title": "vule",
        "creationDate": "2015-01-23T09:14:22 -02:00",
        "quantity": 0
      }
    ]
  }
]

So the desired result would be:

[
  {
    title: vule,
    total: 6
  },
  {
    title: omer,
    total: 1
  },
  {
    title: sonder,
    total: 1
  }
]

I have written some aggregation queries, but none of them work. How can I get the desired result?

PS: I tried creating a nested aggregation:

ejs.Request().size(0).agg(
    ejs.NestedAggregation('comunities')
        .path('data')
        .agg(
            ejs.FilterAggregation('sonder')
                .filter(
                    ejs.TermsFilter('data.title', 'sonder')
                )
                .agg(
                    ejs.ValueCountAggregation('counts')
                        .field('data.title')
                )
        )
);

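You need to use a terms aggregation.

Depending on your mapping, there are two possible approaches:

1. Your data field is stored as a sub-document

You just need to run a simple terms aggregation, which in raw JSON looks like this: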

POST /test/test/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "Grouping": {
      "terms": {
        "field": "data.title",
        "size": 0
      }
    }
  }
}

2. Your data field is stored as a nested document

You have to wrap the terms aggregation in a nested aggregation:

POST /test/test/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "Nest": {
      "nested": {
        "path": "data"
      },
      "aggs": {
        "Grouping": {
          "terms": {
            "field": "data.title",
            "size": 0
          }
        }
      }
    }
  }
}

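Both return the same buckets. The output below comes from the nested query; with the first query, Grouping appears directly under aggregations, without the Nest wrapper: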

{
   "took": 125,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "Nest": {
         "doc_count": 9,
         "Grouping": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
               {
                  "key": "vule",
                  "doc_count": 6            -- The Total count you're looking for
               },
               {
                  "key": "sonder",
                  "doc_count": 2
               },
               {
                  "key": "omer",
                  "doc_count": 1
               }
            ]
         }
      }
   }
}
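To get from this response to the array shape asked for in the question, the buckets can be mapped on the client. The following is a minimal sketch in plain JavaScript; the function name toTitleTotals and the trimmed sample object are hypothetical, and the aggregation names Nest and Grouping are assumed to match the queries above.

```javascript
// Convert a terms-aggregation response into [{ title, total }].
// `response` is assumed to be the parsed JSON body returned by Elasticsearch.
function toTitleTotals(response) {
  const aggs = response.aggregations || {};
  // The nested query wraps "Grouping" inside "Nest"; the plain query does not.
  const grouping = (aggs.Nest && aggs.Nest.Grouping) || aggs.Grouping;
  if (!grouping) {
    return [];
  }
  return grouping.buckets.map((bucket) => ({
    title: bucket.key,
    total: bucket.doc_count,
  }));
}

// Trimmed copy of the response shown above, kept only for illustration.
const sample = {
  aggregations: {
    Nest: {
      doc_count: 9,
      Grouping: {
        buckets: [
          { key: 'vule', doc_count: 6 },
          { key: 'sonder', doc_count: 2 },
          { key: 'omer', doc_count: 1 },
        ],
      },
    },
  },
};

console.log(toTitleTotals(sample));
```

The fallback to aggs.Grouping lets the same helper handle the response of the non-nested query.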

Unfortunately this is only a raw query, but I expect it can easily be converted to elastic.js.
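A possible elastic.js translation of the nested query might look like the sketch below. It has not been run against a specific elastic.js version: it assumes the library exposes ejs.TermsAggregation with field() and size() methods, alongside the ejs.Request and ejs.NestedAggregation builders already used in the question. The wrapper function buildTitleCountRequest is hypothetical, and ejs is passed in as a parameter so the sketch makes no assumption about how the library is loaded.

```javascript
// Build the nested terms-aggregation request with elastic.js.
// `ejs` is the elastic.js namespace object (assumption: it provides
// Request, NestedAggregation and TermsAggregation builders).
function buildTitleCountRequest(ejs) {
  return ejs.Request()
    .size(0) // return no hits, only the aggregation results
    .agg(
      ejs.NestedAggregation('Nest')
        .path('data') // step into the nested "data" objects
        .agg(
          ejs.TermsAggregation('Grouping')
            .field('data.title') // one bucket per distinct title
            .size(0) // ask for all terms, as in the raw query
        )
    );
}
```

If data is mapped as a plain sub-document, the NestedAggregation wrapper would be dropped and the TermsAggregation passed to agg() directly.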

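Most importantly: if you are going to aggregate on a field, don't forget to map that field as not_analyzed. Otherwise the aggregation will count individual tokens rather than whole values, as described in the documentation.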
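The effect is easy to miss with single-word titles like the ones in the question, so here is a toy illustration in plain JavaScript. It does not call Elasticsearch: toyAnalyze is a hypothetical stand-in for an analyzer (lowercase, split on non-alphanumeric characters), and the multi-word titles are made up for the example.

```javascript
// Toy stand-in for an analyzer: lowercase the text and split it into tokens.
// This only approximates what a real analyzer does.
function toyAnalyze(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Count how many titles contain each term.
// analyzed = true  -> terms are the distinct tokens of each title
// analyzed = false -> the whole title is a single term (like not_analyzed)
function countTerms(titles, analyzed) {
  const counts = {};
  for (const title of titles) {
    const terms = analyzed ? new Set(toyAnalyze(title)) : [title];
    for (const term of terms) {
      counts[term] = (counts[term] || 0) + 1;
    }
  }
  return counts;
}

// Hypothetical titles, not taken from the question's data.
const titles = ['New York', 'New Jersey', 'New York'];

console.log(countTerms(titles, false)); // whole titles as buckets
console.log(countTerms(titles, true));  // buckets per token: new, york, jersey
```

With the analyzed variant, the bucket keys are tokens such as "new" rather than the titles themselves, which is why the grouping field should be not_analyzed.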

Personally, I would store these documents as nested documents.

Example:

Mapping:

PUT /test
{
  "mappings": {
    "test": {
      "properties": {
        "name": {
          "type": "string"
        },
        "data": {
          "type": "nested",
          "properties": {
            "title": {
              "type": "string",
              "index": "not_analyzed",
              "fields": {
                "stemmed": {
                  "type": "string",
                  "analyzer": "standard"
                }
              }
            },
            "creationDate": {
              "type": "date",
              "format": "dateOptionalTime"
            },
            "quantity": {
              "type": "integer"
            }
          }
        }
      }
    }
  }
}

Test data:

PUT /test/test/56161cb3cbdad2e3b437fdc3
{
  "name": "public",
  "data": [
    {
      "title": "sonder",
      "creationDate": "2015-08-22T03:43:28",
      "quantity": 0
    },
    {
      "title": "vule",
      "creationDate": "2014-05-17T12:35:01",
      "quantity": 0
    },
    {
      "title": "omer",
      "creationDate": "2015-01-31T04:54:19",
      "quantity": 3
    },
    {
      "title": "sonder",
      "creationDate": "2014-05-22T05:09:36",
      "quantity": 3
    }
  ]
}

PUT /test/test/56161cb3dae30517fc133cd9
{
  "name": "static",
  "data": [
    {
      "title": "vule",
      "creationDate": "2014-07-01T06:32:06",
      "quantity": 5
    },
    {
      "title": "vule",
      "creationDate": "2014-01-10T12:40:28",
      "quantity": 1
    },
    {
      "title": "vule",
      "creationDate": "2014-01-09T09:33:11",
      "quantity": 3
    }
  ]
}

PUT /test/test/56161cb32f62b522355ca3c8
{
  "name": "public",
  "data": [
    {
      "title": "vule",
      "creationDate": "2014-02-03T09:55:28",
      "quantity": 2
    },
    {
      "title": "vule",
      "creationDate": "2015-01-23T09:14:22",
      "quantity": 0
    }
  ]
}

The actual query:

POST /test/test/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "Nest": {
      "nested": {
        "path": "data"
      },
      "aggs": {
        "Grouping": {
          "terms": {
            "field": "data.title",
            "size": 0
          }
        }
      }
    }
  }
}

P.S. "size": 0 means that I am asking Elasticsearch to return all possible terms instead of limiting the output, as described in the documentation:

The size parameter can be set to define how many term buckets should be returned out of the overall terms list. By default, the node coordinating the search process will request each shard to provide its own top size term buckets and once all shards respond, it will reduce the results to the final list that will then be returned to the client. This means that if the number of unique terms is greater than size, the returned list is slightly off and not accurate (it could be that the term counts are slightly off and it could even be that a term that should have been in the top size buckets was not returned). If set to 0, the size will be set to Integer.MAX_VALUE.
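The inaccuracy described in that paragraph can be reproduced with a small simulation. The sketch below is plain JavaScript and does not talk to Elasticsearch: the two shards and their per-term counts are invented, and mergeShardTops follows the simplified description quoted above (each shard returns its own top size terms, then the results are reduced), treating size 0 as "no limit".

```javascript
// Return the n highest-count [term, count] pairs from a counts object.
function topN(counts, n) {
  return Object.entries(counts)
    .sort((x, y) => y[1] - x[1])
    .slice(0, n);
}

// Simulate the coordinating node: take each shard's top `size` terms,
// sum the counts per term, then keep the overall top `size`.
// size === 0 is treated as unlimited, mirroring the quoted behaviour.
function mergeShardTops(shards, size) {
  const limit = size === 0 ? Number.MAX_SAFE_INTEGER : size;
  const merged = {};
  for (const shard of shards) {
    for (const [term, count] of topN(shard, limit)) {
      merged[term] = (merged[term] || 0) + count;
    }
  }
  return topN(merged, limit);
}

// Hypothetical per-shard term counts.
const shards = [
  { a: 5, b: 4, c: 3 },
  { d: 5, c: 4, a: 1 },
];

console.log(mergeShardTops(shards, 1)); // "c" (7 in total) is missed
console.log(mergeShardTops(shards, 0)); // exact counts for every term
```

In this made-up data the term "c" has the highest total but is not the top term on either shard, so a limit of 1 never sees it; with no limit every shard reports all of its terms and the totals are exact.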