Bucket terms aggregation in Elasticsearch
Elasticsearch version
{
"name" : "abc-Inspiron-5521",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "2vLvphpURJOtfAZSGDDX5w",
"version" : {
"number" : "7.10.2",
"build_flavor" : "default",
"build_type" : "deb",
"build_hash" : "747e1cc71def077253878a59143c1f785afa92b9",
"build_date" : "2021-01-13T00:42:12.435326Z",
"build_snapshot" : false,
"lucene_version" : "8.7.0",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}
Document mapping
"user_data" : {
"aliases" : { },
"mappings" : {
"properties" : {
"experience" : {
"properties" : {
"brand" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"brand_segment" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"company" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"duration" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"property_type" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"real_estate_type" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}
The document structure is correct; if the brackets don't match, please adjust them accordingly.
Sample document
{
"_index" : "user_data",
"_type" : "_doc",
"_id" : "dONuEXgBU9vYaZRqY8Jo",
"_score" : 1.0,
"_source" : {
"experience" : [
{
"brand" : "Hilton",
"company" : "Hilton LLC",
"brand_segment" : "Luxury",
"property_type" : "All-Inclusive",
"duration" : "2 years",
"real_estate_type" : "Institutional"
},
{
"brand" : "Mantis",
"company" : "Accor LLC",
"brand_segment" : "Upper-Upscale",
"property_type" : "Condo",
"duration" : "2 years",
"real_estate_type" : "Family Office"
},
{
"brand" : "Marriott",
"company" : "Marriott LLC",
"brand_segment" : "Independent",
"property_type" : "Convention",
"duration" : "2 years",
"real_estate_type" : "Family Office"
}
]
}
}
My terms aggregation query on brand_segment:
GET user_data/_search
{
"aggs": {
"experience": {
"terms": { "field": "experience.brand_segment" }
}
}
}
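I am running into two problems with this terms aggregation.
First, when running a terms aggregation on 'brand_segment', the value 'Upper-Upscale' should be treated as a single unit and counted as such, but currently it comes back split into 'upper' and 'upscale' (see the wrong result below).
Second, I want to count how many times brand_segment has the value 'Luxury' (or any other value), but the query above counts the number of documents in which Luxury appears, not the number of times Luxury appears across all documents. (Right now, multiple occurrences within one document are counted as one.)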
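To make the difference between these two counts concrete, here is a small offline sketch in plain Python (it does not call Elasticsearch). It uses the brand_segment values of the experience array from the sample document further down, and the function names are hypothetical. On a non-nested field, a terms bucket's doc_count counts each top-level document at most once per value, while the count asked for here tallies every array entry.

```python
from collections import Counter

# brand_segment values from the "experience" array of the sample document;
# each inner list stands for one top-level document.
documents = [
    ["Luxury", "Luxury", "Independent"],
]


def documents_per_value(docs):
    """Approximates doc_count on a non-nested field: one hit per document per value."""
    counts = Counter()
    for segments in docs:
        counts.update(set(segments))  # repeated values inside one document collapse
    return counts


def occurrences_per_value(docs):
    """The count the question asks for: every array entry is counted."""
    counts = Counter()
    for segments in docs:
        counts.update(segments)
    return counts


print(documents_per_value(documents))
print(occurrences_per_value(documents))
```

With this single document, Luxury gets 1 from the first function and 2 from the second.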
Wrong result
"aggregations" : {
"experience" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "independent",
"doc_count" : 15
},
{
"key" : "luxury",
"doc_count" : 15
},
{
"key" : "upper",
"doc_count" : 14
},
{
"key" : "upscale",
"doc_count" : 14
}
]
}
}
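The expected output should have Upper-Upscale as a single value. I indexed several sample documents, which is how I got this result.
Feel free to use this sample document to create the index: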
{
"id": 1,
"name": "abcs",
"source": "csv_status",
"profile_complition": "70%",
"creation_date": "2020-04-02",
"current_position": [
{
"position": "Financial Reporting",
"position_category": "Finance",
"position_level": 2
}
],
"seeking_position": [
{
"position": "Financial Planning and Analysis",
"position_category": "Finance",
"position_level": 3
}
],
"last_updation_date": "2021-02-02",
"experience": [
{
"brand": "Hilton",
"company": "Hilton LLC",
"brand_segment": "Luxury",
"property_type": "All-Inclusive",
"duration": "2 years",
"real_estate_type": "Institutional"
},
{
"brand": "Accor",
"company": "Accor LLC",
"brand_segment": "Luxury",
"property_type": "Condo",
"duration": "2 years",
"real_estate_type": "Family Office"
},
{
"brand": "Marriott",
"company": "Marriott LLC",
"brand_segment": "Independent",
"property_type": "Convention",
"duration": "2 years",
"real_estate_type": "Family Office"
}
]
}
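Other values occurring in brand_segment = ['Economy', 'Upscale', 'Midscale', 'Upper-Upscale', 'Luxury', 'Independent', 'Extended Stay']
PS: every brand_segment value should be treated as a single entity ('Upper-Upscale', not 'Upper' and 'Upscale'; the same applies to 'Extended Stay').
Let me know if further clarification is needed.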
For the first issue, you need to aggregate on the keyword sub-field:
GET user_data/_search
{
"aggs": {
"experience": {
"terms": { "field": "experience.brand_segment.keyword" }
}
}
}
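The reason the keyword sub-field helps: the text field is run through the standard analyzer, which splits on the hyphen and the space and lowercases, whereas the keyword sub-field stores each value unchanged as a single term. The Python sketch below approximates this offline for the brand_segment values listed in the question. approx_standard_analyzer is a hypothetical, simplified stand-in (a word-character regex plus lowercasing), not Elasticsearch's actual tokenizer.

```python
import re
from collections import Counter

# brand_segment values listed in the question.
values = ["Economy", "Upscale", "Midscale", "Upper-Upscale",
          "Luxury", "Independent", "Extended Stay"]


def approx_standard_analyzer(text):
    """Hypothetical, simplified stand-in for the standard analyzer:
    split on runs of non-word characters and lowercase each token.
    The real tokenizer follows Unicode text segmentation rules."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def keyword_terms(text, ignore_above=256):
    """A keyword field indexes the whole value as one term; values longer
    than ignore_above characters are not indexed at all."""
    return [text] if len(text) <= ignore_above else []


# Each value is counted once here, just to show which bucket keys appear.
text_buckets = Counter(t for v in values for t in approx_standard_analyzer(v))
keyword_buckets = Counter(t for v in values for t in keyword_terms(v))

print(sorted(text_buckets))
print(sorted(keyword_buckets))
```

Note that "Upscale" and "Upper-Upscale" both feed the same "upscale" key on the text field, so their counts get mixed, while the keyword sub-field keeps all seven values apart.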
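To solve the second issue, you need to make the experience field nested, which means your mapping needs to look like this:
"user_data" : {
"aliases" : { },
"mappings" : {
"properties" : {
"experience" : {
"type": "nested", <--- add this
"properties" : {
"brand" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"brand_segment" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } },
"company" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } },
"duration" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } },
"property_type" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } },
"real_estate_type" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } }
}
}
}
}
}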
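Changing the mapping alone is not enough: the aggregation also has to step into the nested entries with a nested aggregation, and since Elasticsearch does not allow an existing object field to be changed to nested in place, the data has to be reindexed into a new index. Below is a minimal sketch of the three request bodies, written as Python dicts and printed as JSON for the Kibana Dev Tools console. The index name user_data_nested is a hypothetical choice, and the mapping repeats the text-plus-keyword sub-field pattern from the question.

```python
import json

# Hypothetical name for the new index that gets the nested mapping.
NEW_INDEX = "user_data_nested"

# Same text + keyword sub-field pattern as in the question's mapping.
keyword_subfield = {"type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}

# Body for: PUT user_data_nested
create_index_body = {
    "mappings": {
        "properties": {
            "experience": {
                "type": "nested",
                "properties": {
                    name: keyword_subfield
                    for name in ["brand", "brand_segment", "company", "duration",
                                 "property_type", "real_estate_type"]
                },
            }
        }
    }
}

# Body for: POST _reindex (copies the existing documents into the new index)
reindex_body = {"source": {"index": "user_data"}, "dest": {"index": NEW_INDEX}}

# Body for: GET user_data_nested/_search
# The "nested" aggregation steps into the experience entries, so the inner
# terms buckets count nested entries rather than top-level documents.
nested_terms_body = {
    "size": 0,
    "aggs": {
        "experience": {
            "nested": {"path": "experience"},
            "aggs": {
                "brand_segment": {
                    "terms": {"field": "experience.brand_segment.keyword"}
                }
            },
        }
    },
}

for title, body in [("PUT " + NEW_INDEX, create_index_body),
                    ("POST _reindex", reindex_body),
                    ("GET " + NEW_INDEX + "/_search", nested_terms_body)]:
    print(title)
    print(json.dumps(body, indent=2))
```

Inside the nested aggregation, each bucket's doc_count counts experience entries, so with the sample document the Luxury bucket would report 2 instead of 1.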