Query to return unique set of fields in Elasticsearch
I store alert records in an Elasticsearch 5.6 index. After running _search?q=*, the data I get back looks like this:
"hits": [
{
"_index": "alerts",
"_type": "alert-mapping",
"_id": "AWG0lW0jxQ7bOrwfOzFI",
"_score": 1,
"_source": {
"events": [
{
"name": "walking"
}
],
"categoryID": "easy",
"comments": "this is a comment",
"active": true
}
},
{
"_index": "alerts",
"_type": "alert-mapping",
"_id": "AWds3wd43980wfOzFI",
"_score": 1,
"_source": {
"events": [
{
"name": "running"
}
],
"categoryID": "difficult",
"comments": "this is another comment",
"active": false
}
}]
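According to the data specification, the events array will only ever hold a single value. That may change in the future, but for now I can work under this assumption. What I want is a query that returns every unique events.name value together with its corresponding categoryID.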
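I have a sample query that I thought would work, but it returns all unique events.name values and, separately, all unique categoryID values. My current query looks like this: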
GET alerts/_search
{
"size":0,
"aggs":{
"alerts":{
"terms":{
"field":"events.name",
"size":1
}
},
"categories":{
"terms":{
"field":"categoryID"
}
}
}
}
This returns something like the following:
"aggregations": {
"alerts": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "running",
"doc_count": 225
},
{
"key": "walking",
"doc_count": 219
}
]
},
"categories": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "easy",
"doc_count": 363
},
{
"key": "difficult",
"doc_count": 352
}
]
}
}
What I really want is for events.name and categoryID to be grouped together in the returned result, so that each events.name comes with its corresponding categoryID. Something like this:
"aggregations": {
"alerts": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "running",
"categories": "difficult",
"doc_count": 225
},
{
"key": "walking",
"categories": "easy",
"doc_count": 219
}
]
}
}
You can nest one aggregation inside the other, like this:
{
"size": 0,
"aggs": {
"alerts": {
"terms": {
"field": "events.name",
"size": 1
},
"aggs": {
"categories": {
"terms": {
"field": "categoryID"
}
}
}
}
}
}
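This won't give you exactly the structure you want, but it does return all unique category IDs nested under each event name. I can't think of a way to produce exactly the output you asked for.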
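If the exact shape matters, one option is to reshape the nested buckets on the client. The following is only a sketch: `flatten_event_categories` and `sample_response` are hypothetical names, and the sample response is hand-written to mirror the counts in the question, not captured from a real cluster. Note also that the outer terms "size" has to be large enough to cover every event name, since "size": 1 returns only the top bucket.

```python
def flatten_event_categories(response):
    """Turn nested terms buckets into one row per event name."""
    rows = []
    for bucket in response["aggregations"]["alerts"]["buckets"]:
        # Collect the category keys from the sub-aggregation of this event name.
        categories = [c["key"] for c in bucket["categories"]["buckets"]]
        rows.append({
            "key": bucket["key"],
            # A single category collapses to a plain string, as in the desired output;
            # several categories are kept as a list.
            "categories": categories[0] if len(categories) == 1 else categories,
            "doc_count": bucket["doc_count"],
        })
    return rows


# Hypothetical response, shaped like the output of the nested terms query above.
sample_response = {
    "aggregations": {
        "alerts": {
            "buckets": [
                {
                    "key": "running",
                    "doc_count": 225,
                    "categories": {"buckets": [{"key": "difficult", "doc_count": 225}]},
                },
                {
                    "key": "walking",
                    "doc_count": 219,
                    "categories": {"buckets": [{"key": "easy", "doc_count": 219}]},
                },
            ]
        }
    }
}

rows = flatten_event_categories(sample_response)
for row in rows:
    print(row)
```

Keeping a list when an event name maps to more than one category avoids silently dropping data if the one-category-per-event assumption stops holding.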
If you can change the mapping of the "events" field to type nested, you can use a reverse nested aggregation to get close to what you want.
POST /alerts/_search
{
"query":{
"match_all": {}
},
"aggs":{
"events_name": {
"nested": {
"path": "events"
},
"aggs":{
"events":{
"terms": {
"field": "events.name"
},
"aggs":{
"category_ids":{
"reverse_nested":{},
"aggs":{
"cat_ids_per_event":{
"terms": {
"field": "categoryID"
}
}
}
}
}
}
}
}
}
}
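The nested aggregation only works if "events" is actually mapped as nested. Below is a minimal sketch of what such an index body might look like for Elasticsearch 5.6, built as a Python dict. The keyword types for events.name and categoryID are assumptions (the question does not show the mapping), and `build_alerts_index_body` is a hypothetical helper. An existing field cannot be switched to nested in place, so this body would normally go into a new index that the old data is then reindexed into.

```python
import json


def build_alerts_index_body(doc_type="alert-mapping"):
    """Build an index-creation body that maps `events` as a nested field."""
    return {
        "mappings": {
            # Elasticsearch 5.x still keys mappings by document type.
            doc_type: {
                "properties": {
                    "events": {
                        "type": "nested",
                        # Assumption: event names are exact values, so keyword fits.
                        "properties": {"name": {"type": "keyword"}},
                    },
                    # Assumption: categoryID is also an exact-value keyword field.
                    "categoryID": {"type": "keyword"},
                }
            }
        }
    }


index_body = build_alerts_index_body()
# This JSON is what would be sent as the body of the index-creation request.
print(json.dumps(index_body, indent=2))
```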
Running that query against some dummy documents gives me this:
"aggregations": {
"events_name": {
"doc_count": 9,
"events": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "walking",
"doc_count": 5,
"category_ids": {
"doc_count": 5,
"cat_ids_per_event": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "easy",
"doc_count": 5
}
]
}
}
},
{
"key": "running",
"doc_count": 4,
"category_ids": {
"doc_count": 4,
"cat_ids_per_event": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "difficult",
"doc_count": 4
}
]
}
}
}
]
}
}
}
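To get from this response to the flat key/categories/doc_count rows asked for in the question, the buckets can be walked on the client. This is a sketch under the assumption that the aggregation names match the query above (events_name, events, category_ids, cat_ids_per_event); `flatten_reverse_nested` is a hypothetical helper, and the sample dict is a trimmed copy of the dummy response.

```python
def flatten_reverse_nested(response):
    """Emit one row per (event name, category ID) pair."""
    rows = []
    event_buckets = response["aggregations"]["events_name"]["events"]["buckets"]
    for event in event_buckets:
        category_buckets = event["category_ids"]["cat_ids_per_event"]["buckets"]
        for category in category_buckets:
            rows.append({
                "key": event["key"],
                "categories": category["key"],
                # Count of parent documents with this event name and category.
                "doc_count": category["doc_count"],
            })
    return rows


# Trimmed copy of the dummy response shown above.
sample_response = {
    "aggregations": {
        "events_name": {
            "doc_count": 9,
            "events": {
                "buckets": [
                    {
                        "key": "walking",
                        "doc_count": 5,
                        "category_ids": {
                            "doc_count": 5,
                            "cat_ids_per_event": {
                                "buckets": [{"key": "easy", "doc_count": 5}]
                            },
                        },
                    },
                    {
                        "key": "running",
                        "doc_count": 4,
                        "category_ids": {
                            "doc_count": 4,
                            "cat_ids_per_event": {
                                "buckets": [{"key": "difficult", "doc_count": 4}]
                            },
                        },
                    },
                ]
            },
        }
    }
}

rows = flatten_reverse_nested(sample_response)
for row in rows:
    print(row)
```

If an event name ever maps to several category IDs, this version yields one row per pair instead of overwriting earlier values.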