Elasticsearch aggregation for Group By then get Avg of field for Max date
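I'm trying to build a query in Elasticsearch that does the following:
a) Group by a field (i.e. department_name)
b) Get the documents with the max date (i.e. record_date)
c) Calculate the average of a field over those remaining documents (i.e. risk_index_value)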
In case my description isn't that helpful, here is the query I've managed to build so far:
{
  "size": 0,
  "query": {
    "match": {
      "record_date": "2021-04-08"
    }
  },
  "aggs": {
    "assets": {
      "terms": {
        "field": "department_name",
        "size": 10000
      },
      "aggs": {
        "risk_avg": {
          "avg": {
            "field": "risk_index_value"
          }
        }
      }
    }
  }
}
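In terms of business logic this query does exactly what I want. However, I need it to always pick the max date by itself instead of me supplying the value. Is there a way to do that? I need to do this with the Elasticsearch REST High Level Client, but even a raw query would be very helpful. Thanks in advance!
Edit: I'm adding some example documents so my request makes more sense.
Suppose we have 11 documents: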
department_name: A, risk_index_value: 10, record_date: 2021-04-28
department_name: A, risk_index_value: 30, record_date: 2021-04-28
department_name: A, risk_index_value: 20, record_date: 2021-04-28
department_name: A, risk_index_value: 100, record_date: 2021-04-20
department_name: A, risk_index_value: 80, record_date: 2021-04-20
department_name: B, risk_index_value: 240, record_date: 2021-04-28
department_name: B, risk_index_value: 220, record_date: 2021-04-28
department_name: B, risk_index_value: 200, record_date: 2021-04-28
department_name: B, risk_index_value: 100, record_date: 2021-04-20
department_name: B, risk_index_value: 90, record_date: 2021-04-20
department_name: C, risk_index_value: 45, record_date: 2021-04-28
So, for the data above, I need the query to return something like:
department: A, risk_index_avg: 20, record_date: 2021-04-28
department: B, risk_index_avg: 220, record_date: 2021-04-28
department: C, risk_index_avg: 45, record_date: 2021-04-28
Hope this helps.
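From your question, I understand that you want, for each department, the average risk index over the documents with that department's latest record date.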
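To pin down the expected numbers, here is a minimal plain-Java sketch that computes the same result in memory for the 11 sample documents. It does not call Elasticsearch. It groups by department, keeps only the documents on that department's latest record_date, then averages risk_index_value. The names `LatestDateAverage`, `Doc`, `averageAtLatestDate` and `sampleDocs` are hypothetical and chosen for illustration. The sketch assumes Java 16+ because it uses records.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class LatestDateAverage {
    // Hypothetical in-memory stand-in for one indexed document.
    record Doc(String departmentName, double riskIndexValue, LocalDate recordDate) {}

    // For each department: find its max record_date, then average risk_index_value over docs on that date.
    static Map<String, Double> averageAtLatestDate(List<Doc> docs) {
        // a) group by department_name (TreeMap keeps departments sorted for stable output)
        Map<String, List<Doc>> byDepartment = docs.stream()
                .collect(Collectors.groupingBy(Doc::departmentName, TreeMap::new, Collectors.toList()));

        Map<String, Double> result = new TreeMap<>();
        byDepartment.forEach((department, deptDocs) -> {
            // b) latest record_date within this department
            LocalDate latest = deptDocs.stream()
                    .map(Doc::recordDate)
                    .max(LocalDate::compareTo)
                    .orElseThrow();
            // c) average risk_index_value of the documents on that date only
            double avg = deptDocs.stream()
                    .filter(d -> d.recordDate().equals(latest))
                    .mapToDouble(Doc::riskIndexValue)
                    .average()
                    .orElseThrow();
            result.put(department, avg);
        });
        return result;
    }

    // The 11 sample documents from the question.
    static List<Doc> sampleDocs() {
        LocalDate apr28 = LocalDate.of(2021, 4, 28);
        LocalDate apr20 = LocalDate.of(2021, 4, 20);
        return List.of(
                new Doc("A", 10, apr28), new Doc("A", 30, apr28), new Doc("A", 20, apr28),
                new Doc("A", 100, apr20), new Doc("A", 80, apr20),
                new Doc("B", 240, apr28), new Doc("B", 220, apr28), new Doc("B", 200, apr28),
                new Doc("B", 100, apr20), new Doc("B", 90, apr20),
                new Doc("C", 45, apr28));
    }

    public static void main(String[] args) {
        System.out.println(averageAtLatestDate(sampleDocs())); // {A=20.0, B=220.0, C=45.0}
    }
}
```

For department A the three 2021-04-28 values are 10, 30 and 20, so the average is 20. The 100 and 80 from 2021-04-20 are excluded.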
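The max value can be found with a terms aggregation, i.e.:
- Use a terms aggregation on the required field.
- Sort the term keys in descending order: "order": { "_key": "desc" }
- Set size = 1 to get only the single highest value (which will be the max).
"aggs": {
  "maxKey": {
    "terms": {
      "field": "<field whose max is required>",
      "size": 1,
      "order": {
        "_key": "desc"
      }
    }
  }
}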
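This uses a terms aggregation rather than a max aggregation for a reason. A max aggregation is a metrics aggregation and cannot hold sub-aggregations. A terms bucket can, so the single key-descending bucket both identifies the max value and limits the nested average to the documents with that value.

The plain-Java sketch below imitates what such a terms aggregation returns for one field: the highest key, its doc_count, and the sum_other_doc_count of documents dropped by size = 1. It is only an in-memory illustration and does not call Elasticsearch. The names `MaxKeyTerms`, `TopBucket` and `maxKeyBucket` are hypothetical. The keys are epoch milliseconds, which is how the date keys appear in the response for this query.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MaxKeyTerms {
    // The single bucket returned for "size": 1 and "order": { "_key": "desc" }.
    record TopBucket(long key, long docCount, long sumOtherDocCount) {}

    // In-memory imitation of a terms aggregation ordered by key descending with size 1.
    static TopBucket maxKeyBucket(List<Long> fieldValues) {
        if (fieldValues.isEmpty()) {
            throw new IllegalArgumentException("no values to aggregate");
        }
        // doc_count per distinct key; TreeMap keeps keys sorted ascending
        TreeMap<Long, Long> counts = new TreeMap<>();
        for (long value : fieldValues) {
            counts.merge(value, 1L, Long::sum);
        }
        // Highest key = first bucket when ordered by "_key": "desc"
        Map.Entry<Long, Long> top = counts.lastEntry();
        // Documents that fell into the buckets cut off by size = 1
        long sumOther = fieldValues.size() - top.getValue();
        return new TopBucket(top.getKey(), top.getValue(), sumOther);
    }

    public static void main(String[] args) {
        long apr28 = 1619568000000L; // 2021-04-28T00:00:00Z in epoch millis
        long apr20 = 1618876800000L; // 2021-04-20T00:00:00Z in epoch millis
        // record_date values of department A's five sample documents
        System.out.println(maxKeyBucket(List.of(apr28, apr28, apr28, apr20, apr20)));
        // TopBucket[key=1619568000000, docCount=3, sumOtherDocCount=2]
    }
}
```

For department A this gives key 1619568000000, doc_count 3 and sum_other_doc_count 2. These match the MaxRecordDate bucket for A in the response.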
I think the query below is what you're looking for.
{
  "size": 0,
  "aggs": {
    "EachDepartment": {
      "terms": {
        "field": "department_name",
        "size": 1000
      },
      "aggs": {
        "MaxRecordDate": {
          "terms": {
            "field": "record_date",
            "size": 1,
            "order": {
              "_key": "desc"
            }
          },
          "aggs": {
            "AvgOfRiskIndex": {
              "avg": {
                "field": "risk_index_value"
              }
            }
          }
        }
      }
    }
  }
}
I ran this against the sample data you provided and got the following response.
{
  "aggregations" : {
    "EachDepartment" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "A",
          "doc_count" : 5,
          "MaxRecordDate" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 2,
            "buckets" : [
              {
                "key" : 1619568000000,
                "key_as_string" : "2021-04-28 00:00:00",
                "doc_count" : 3,
                "AvgOfRiskIndex" : {
                  "value" : 20.0
                }
              }
            ]
          }
        },
        {
          "key" : "B",
          "doc_count" : 5,
          "MaxRecordDate" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 2,
            "buckets" : [
              {
                "key" : 1619568000000,
                "key_as_string" : "2021-04-28 00:00:00",
                "doc_count" : 3,
                "AvgOfRiskIndex" : {
                  "value" : 220.0
                }
              }
            ]
          }
        },
        {
          "key" : "C",
          "doc_count" : 1,
          "MaxRecordDate" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : 1619568000000,
                "key_as_string" : "2021-04-28 00:00:00",
                "doc_count" : 1,
                "AvgOfRiskIndex" : {
                  "value" : 45.0
                }
              }
            ]
          }
        }
      ]
    }
  }
}
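In this response, the MaxRecordDate bucket keys are epoch milliseconds (UTC). key_as_string is the same instant rendered with the date format that applies to record_date. When client code reads the bucket key as a long, it can format the key itself.

The java.time sketch below shows how the two values relate. It assumes record_date is mapped with the format `yyyy-MM-dd HH:mm:ss` and interpreted as UTC. That format is inferred from key_as_string above; the question does not show the mapping. `BucketKeyFormat` and `keyAsString` are hypothetical names.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class BucketKeyFormat {
    // Assumption: record_date uses the format "yyyy-MM-dd HH:mm:ss" in UTC (inferred from key_as_string).
    private static final DateTimeFormatter RECORD_DATE_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneOffset.UTC);

    // Render a date terms-bucket key (epoch milliseconds) the way key_as_string shows it.
    static String keyAsString(long epochMillis) {
        return RECORD_DATE_FORMAT.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        System.out.println(keyAsString(1619568000000L)); // 2021-04-28 00:00:00
    }
}
```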
I hope this answers your question.
Edit: added RestHighLevelClient code to build the aggregation
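import org.elasticsearch.search.aggregations.AggregationBuilder;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.BucketOrder;

AggregationBuilder getAggsBuilder() {
    // Group by department: terms aggregation on department_name
    AggregationBuilder departmentAggs = AggregationBuilders.terms("eachDepartments")
            .field("department_name")
            .size(1000);
    // Latest record_date per department: order keys descending and keep only 1 bucket
    AggregationBuilder maxRecordDateAgg = AggregationBuilders.terms("maxRecordDate")
            .field("record_date")
            .size(1)
            .order(BucketOrder.key(false));
    // Average risk_index_value of the documents in that latest-date bucket
    AggregationBuilder avgRiskIndexAgg = AggregationBuilders.avg("avgRiskIndex")
            .field("risk_index_value");
    // add avgRiskIndexAgg to maxRecordDate
    maxRecordDateAgg.subAggregation(avgRiskIndexAgg);
    // add maxRecordDate to departmentAggs
    departmentAggs.subAggregation(maxRecordDateAgg);
    return departmentAggs;
}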