I would like to combine duplicate values in Elasticsearch into one and view the results with a different filter
I am collecting logs with Elasticsearch. The collected logs look like this, for example:
{
  "name" : "John",
  "team" : "IT",
  "startTime" : "21:00",
  "result" : "pass"
},
{
  "name" : "James",
  "team" : "HR",
  "startTime" : "21:04",
  "result" : "pass"
},
{
  "name" : "Paul",
  "team" : "IT",
  "startTime" : "21:05",
  "result" : "pass"
},
{
  "name" : "Jackson",
  "team" : "Marketing",
  "startTime" : "21:30",
  "result" : "fail"
},
{
  "name" : "John",
  "team" : "IT",
  "startTime" : "21:41",
  "result" : "pass"
},
... and so on
If I run the following query against these collected logs,
GET logData/_search
{
  "size": 0,
  "aggs": {
    "Documents_per_team": {
      "terms": {
        "field": "team"
      }
    }
  }
}
the following result is returned.
  "aggregations" : {
    "Documents_per_team" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "IT",
          "doc_count" : 70
        },
        {
          "key" : "Marketing",
          "doc_count" : 55
        },
        {
          "key" : "HR",
          "doc_count" : 11
        }
      ]
    }
  }
}
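When a name is repeated across the documents behind this result, I want the duplicates eliminated.
[As is]
- As shown above, the count for the IT team is 70
[Result I want]
- If John ran 50 times, Kate ran 10 times, and Paul ran 10 times, the IT team count should be 3 (because there are 3 IT team members).
Can I get the per-team results with the duplicates removed?
Thanks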
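You've got two options:
- a cardinality sub-aggregation (straightforward, but approximate and not very scalable, albeit only in very specific/advanced cases)
- or a scripted metric aggregation (slower and more verbose, but exact).
Both approaches assume that each name is unique within its team. If that's not the case, you'll need to adjust accordingly. They also assume that name is mapped as a keyword type, just like team. If not, you'll need to replace them with your_field.keyword.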
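As a reference for what both options are meant to compute, here is a minimal plain-Python sketch over the five sample documents from the question. The function name unique_names_per_team is made up for illustration, and nothing here talks to Elasticsearch; it only spells out "distinct names per team".

```python
from collections import defaultdict

# Sample documents copied from the question (only the fields that matter here).
docs = [
    {"name": "John", "team": "IT"},
    {"name": "James", "team": "HR"},
    {"name": "Paul", "team": "IT"},
    {"name": "Jackson", "team": "Marketing"},
    {"name": "John", "team": "IT"},
]


def unique_names_per_team(documents):
    """Return {team: number of distinct names seen for that team}."""
    names_by_team = defaultdict(set)
    for doc in documents:
        # A set ignores repeated names, so John is counted once for IT.
        names_by_team[doc["team"]].add(doc["name"])
    return {team: len(names) for team, names in names_by_team.items()}


counts = unique_names_per_team(docs)
print(counts)  # {'IT': 2, 'HR': 1, 'Marketing': 1}
```

IT has three documents in the sample but only two distinct names, which is the difference between doc_count and the number the question asks for.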
1. Cardinality
{
  "size": 0,
  "aggs": {
    "Documents_per_team": {
      "terms": {
        "field": "team"
      },
      "aggs": {
        "unique_names_per_team": {
          "cardinality": {
            "field": "name"
          }
        }
      }
    }
  }
}
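Each terms bucket in the response to this query should carry a unique_names_per_team object with a value field next to doc_count. A minimal sketch of reading it in Python follows, assuming the response body has already been parsed into a dict. The response below is hypothetical: the keys and doc_count values come from the question, the IT value of 3 matches the example in the question, and the Marketing and HR values are made up.

```python
# Hypothetical parsed response; only IT's unique count (3) comes from the
# question, the Marketing and HR unique counts are invented for illustration.
response = {
    "aggregations": {
        "Documents_per_team": {
            "buckets": [
                {"key": "IT", "doc_count": 70,
                 "unique_names_per_team": {"value": 3}},
                {"key": "Marketing", "doc_count": 55,
                 "unique_names_per_team": {"value": 4}},
                {"key": "HR", "doc_count": 11,
                 "unique_names_per_team": {"value": 2}},
            ]
        }
    }
}


def unique_counts(resp):
    """Map each team key to its (approximate) distinct-name count."""
    buckets = resp["aggregations"]["Documents_per_team"]["buckets"]
    return {b["key"]: b["unique_names_per_team"]["value"] for b in buckets}


per_team = unique_counts(response)
print(per_team)
```

The aggregation names Documents_per_team and unique_names_per_team are the ones chosen in the query above; if you rename them there, rename them here too.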
2. Scripted metric
{
  "size": 0,
  "aggs": {
    "Documents_per_team": {
      "scripted_metric": {
        "init_script": "state.by_department = [:]; state.dept_vs_name = [:];",
        "map_script": """
          def dept = doc['team'].value;
          def name = doc['name'].value;
          // skip names that were already counted for this team
          def name_already_considered = state.by_department.containsKey(dept) && state.dept_vs_name[dept].containsKey(name);
          if (name_already_considered) {
            return;
          }
          if (state.by_department.containsKey(dept)) {
            state.by_department[dept] += 1;
          } else {
            state.by_department[dept] = 1;
          }
          if (!state.dept_vs_name.containsKey(dept)) {
            // init a new map & set its first member
            state.dept_vs_name[dept] = [name:true];
          } else if (!state.dept_vs_name[dept].containsKey(name)) {
            state.dept_vs_name[dept][name] = true;
          }
        """,
        "combine_script": "return state.by_department",
        "reduce_script": "return states"
      }
    }
  }
}
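Note: if you also wish to see the underlying department vs. name breakdown, you can change the combine_script to return the whole state, i.e. return state.
Also note that the reduce_script returns states as-is, which means one result per shard. On an index with more than one shard, a name that appears on several shards is counted once on each of them, so the per-shard results still need to be merged to get a single exact count per team.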
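If the index has several shards and the combine_script is changed to return state, the per-shard results could be merged on the client. The following is only a sketch under those assumptions: merge_shard_states is a made-up helper, shard_states stands for the list found under the aggregation's value key in the response, and the two shard states below are invented sample data.

```python
def merge_shard_states(shard_states):
    """Union the name sets of every shard, then count names per team."""
    names_by_team = {}
    for state in shard_states:
        for team, names in state["dept_vs_name"].items():
            # names is a {name: True} map; iterating it yields the names.
            names_by_team.setdefault(team, set()).update(names)
    return {team: len(names) for team, names in names_by_team.items()}


# Invented example: John has documents on both shards.
shard_states = [
    {
        "by_department": {"IT": 2, "HR": 1},
        "dept_vs_name": {"IT": {"John": True, "Paul": True},
                         "HR": {"James": True}},
    },
    {
        "by_department": {"IT": 2, "Marketing": 1},
        "dept_vs_name": {"IT": {"John": True, "Kate": True},
                         "Marketing": {"Jackson": True}},
    },
]

merged = merge_shard_states(shard_states)
print(merged)  # {'IT': 3, 'HR': 1, 'Marketing': 1}
```

Simply adding the by_department numbers would give IT a count of 4 here, because John would be counted on both shards; taking the union of the name sets avoids that.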