Elasticsearch - getting aggregated data based on unique values from a field
In my Elasticsearch (7.13) index, I have the following dataset:
maid  site_id  date        hour
m1    1300     2021-06-03  1
m1    1300     2021-06-03  2
m1    1300     2021-06-03  1
m2    1300     2021-06-03  1
I am trying to get the number of unique records for each date and site_id from the table above. The expected result is:
maid  site_id  date        count
m1    1300     2021-06-03  1
m2    1300     2021-06-03  1
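One way to read the expected table: drop the hour column, keep each distinct (maid, site_id, date) combination once, and the number of unique records per (site_id, date) is then the number of distinct maids. A plain-Python sketch of that reading over the four sample rows (not Elasticsearch code; the function names `unique_records` and `distinct_maids_per_site_date` are made up for illustration):

```python
from collections import defaultdict

# The four sample rows from the question: (maid, site_id, date, hour).
rows = [
    ("m1", 1300, "2021-06-03", 1),
    ("m1", 1300, "2021-06-03", 2),
    ("m1", 1300, "2021-06-03", 1),
    ("m2", 1300, "2021-06-03", 1),
]


def unique_records(rows):
    """Keep each distinct (maid, site_id, date) once, ignoring hour, with count 1."""
    distinct = sorted({(maid, site_id, date) for maid, site_id, date, _hour in rows})
    return [(maid, site_id, date, 1) for maid, site_id, date in distinct]


def distinct_maids_per_site_date(rows):
    """Number of distinct maids for each (site_id, date) pair."""
    maids = defaultdict(set)
    for maid, site_id, date, _hour in rows:
        maids[(site_id, date)].add(maid)
    return {key: len(values) for key, values in maids.items()}


# Prints the two rows of the expected table.
for record in unique_records(rows):
    print(*record)
print(distinct_maids_per_site_date(rows))  # {(1300, '2021-06-03'): 2}
```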
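I have millions of maids for each site_id, and the dates span two years. I am using the following query with a cardinality aggregation on maid, assuming it will return the unique maids: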
GET /r_2332/_search
{
  "size": 0,
  "aggs": {
    "site_id": {
      "terms": {
        "field": "site_id",
        "size": 100,
        "include": [1171, 1048]
      },
      "aggs": {
        "bydate": {
          "range": {
            "field": "date",
            "ranges": [
              {
                "from": "2021-04-08",
                "to": "2021-04-22"
              }
            ]
          },
          "aggs": {
            "rdate": {
              "terms": {
                "field": "date"
              },
              "aggs": {
                "maids": {
                  "cardinality": {
                    "field": "maid"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
This still returns the data with all the duplicate values. How can I include the maid field in my query so that the data is filtered on unique maid values?
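To see what the nested query above computes, here is a plain-Python sketch of the same bucketing over the four sample rows. It is an illustration only: the function name `nested_cardinality` is hypothetical, the site ID and date bounds passed to it are adapted to the sample data rather than the 1171/1048 and April values in the query, it uses an exact set where Elasticsearch's cardinality aggregation is approximate, and it treats the range `to` bound as exclusive, as the range aggregation does. The point it shows is that a cardinality sub-aggregation yields one distinct count per bucket; it does not deduplicate the underlying documents.

```python
from collections import defaultdict

# The four sample rows from the question, as documents.
docs = [
    {"maid": "m1", "site_id": 1300, "date": "2021-06-03", "hour": 1},
    {"maid": "m1", "site_id": 1300, "date": "2021-06-03", "hour": 2},
    {"maid": "m1", "site_id": 1300, "date": "2021-06-03", "hour": 1},
    {"maid": "m2", "site_id": 1300, "date": "2021-06-03", "hour": 1},
]


def nested_cardinality(docs, site_ids, date_from, date_to):
    """Hypothetical stand-in for terms(site_id) > range(date) > terms(date) > cardinality(maid).

    Returns {site_id: {date: number_of_distinct_maids}}. The range is
    [date_from, date_to): "from" inclusive, "to" exclusive. ISO-8601 date
    strings compare correctly as plain strings.
    """
    buckets = defaultdict(lambda: defaultdict(set))
    for doc in docs:
        if doc["site_id"] in site_ids and date_from <= doc["date"] < date_to:
            buckets[doc["site_id"]][doc["date"]].add(doc["maid"])
    # Each innermost bucket collapses to a single number, not a list of documents.
    return {
        site_id: {date: len(maids) for date, maids in by_date.items()}
        for site_id, by_date in buckets.items()
    }


print(nested_cardinality(docs, {1300}, "2021-06-01", "2021-06-04"))  # {1300: {'2021-06-03': 2}}
```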
If you want to get unique documents based on site_id and maid, you can use a multi terms aggregation along with a cardinality aggregation:
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "site_id": ["1300", "1301"]
          }
        },
        {
          "range": {
            "date": {
              "gte": "2021-06-02",
              "lte": "2021-06-03"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "group_by": {
      "multi_terms": {
        "terms": [
          { "field": "site_id" },
          { "field": "maid.keyword" }
        ]
      },
      "aggs": {
        "type_count": {
          "cardinality": {
            "field": "site_id"
          }
        }
      }
    }
  }
}
The search result will be:
"aggregations": {
  "group_by": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": [1300, "m1"],
        "key_as_string": "1300|m1",
        "doc_count": 3,
        "type_count": {
          "value": 1 // note this
        }
      },
      {
        "key": [1300, "m2"],
        "key_as_string": "1300|m2",
        "doc_count": 1,
        "type_count": {
          "value": 1 // note this
        }
      }
    ]
  }
}
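The bucketing that the query above performs can be sketched in plain Python over the same four sample rows. This is a hypothetical illustration, not Elasticsearch's implementation: the function name `multi_terms_with_cardinality` is made up, it applies the terms/range filter with both `gte` and `lte` inclusive, groups by (site_id, maid) the way `multi_terms` does, orders buckets by `doc_count` descending with the key as tie-breaker (an assumption about tie ordering), uses integer site IDs, and computes an exact distinct count of site_id per bucket.

```python
from collections import defaultdict

# The four sample rows from the question, as documents.
docs = [
    {"maid": "m1", "site_id": 1300, "date": "2021-06-03", "hour": 1},
    {"maid": "m1", "site_id": 1300, "date": "2021-06-03", "hour": 2},
    {"maid": "m1", "site_id": 1300, "date": "2021-06-03", "hour": 1},
    {"maid": "m2", "site_id": 1300, "date": "2021-06-03", "hour": 1},
]


def multi_terms_with_cardinality(docs, site_ids, gte, lte):
    """Hypothetical stand-in: filter, then multi_terms on (site_id, maid)
    with a cardinality(site_id) sub-aggregation."""
    groups = defaultdict(list)
    for doc in docs:
        # bool filter: terms on site_id plus an inclusive date range
        if doc["site_id"] in site_ids and gte <= doc["date"] <= lte:
            groups[(doc["site_id"], doc["maid"])].append(doc)

    buckets = []
    # Most documents first; ties broken by key (an assumption).
    for (site_id, maid), group in sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        buckets.append({
            "key": [site_id, maid],
            "key_as_string": f"{site_id}|{maid}",
            "doc_count": len(group),
            # exact distinct count; Elasticsearch's cardinality is approximate
            "type_count": {"value": len({d["site_id"] for d in group})},
        })
    return buckets


for bucket in multi_terms_with_cardinality(docs, {1300, 1301}, "2021-06-02", "2021-06-03"):
    print(bucket)
```

Because each bucket already fixes site_id, `type_count` comes out as 1 for every bucket in this sketch; the unique (site_id, maid) pairs correspond to the buckets themselves, and `doc_count` shows how many raw documents each bucket collapses.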