Elasticsearch - filter, group by and count results for each group
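I'm new to Elasticsearch and need help with the following problem:
I have a set of documents that each contain multiple products. I want to filter on the product attribute product_brand with the value "Apple" and get the number of products that match the filter. However, the result should be grouped by the document ID, which is also part of the document itself (test_id).
Example document: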
{
  "test" : {
    "test_id" : 19988,
    "test_name" : "Test"
  },
  "products" : [
    {
      "product_id" : 1,
      "product_brand" : "Apple"
    },
    {
      "product_id" : 2,
      "product_brand" : "Apple"
    },
    {
      "product_id" : 3,
      "product_brand" : "Samsung"
    }
  ]
}
The result should be:
{
  "key" : 19988,
  "count" : 2
},
In SQL, it would look roughly like this:
SELECT test_id, COUNT(product_id)
FROM `test`
WHERE product_brand = 'Apple'
GROUP BY test_id;
How can I achieve this?
I think this should get you close:
GET /test/_search
{
  "_source": {
    "includes": [
      "test.test_id",
      "_score"
    ]
  },
  "query": {
    "function_score": {
      "query": {
        "match": {
          "products.product_brand.keyword": "Apple"
        }
      },
      "functions": [
        {
          "script_score": {
            "script": {
              "source": "def matches=0; def products = params['_source']['products']; for(p in products){if(p.product_brand == params['brand']){matches++;}} return matches;",
              "params": {
                "brand": "Apple"
              }
            }
          }
        }
      ]
    }
  }
}
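For readers less familiar with Painless, the counting loop inside the script_score above can be mirrored in plain Python. This is only an illustrative sketch: the function name count_brand_matches is made up for this example and is not part of Elasticsearch, and the sample document is the one from the question.

```python
def count_brand_matches(source, brand):
    """Count products in one document whose product_brand equals `brand`.

    Mirrors the loop in the Painless script: walk the `products` array of
    the document source and count exact brand matches.
    `count_brand_matches` is a hypothetical helper, not an Elasticsearch API.
    """
    matches = 0
    for product in source.get("products", []):
        if product.get("product_brand") == brand:
            matches += 1
    return matches


# Sample document from the question.
sample_doc = {
    "test": {"test_id": 19988, "test_name": "Test"},
    "products": [
        {"product_id": 1, "product_brand": "Apple"},
        {"product_id": 2, "product_brand": "Apple"},
        {"product_id": 3, "product_brand": "Samsung"},
    ],
}

apple_count = count_brand_matches(sample_doc, "Apple")
print(sample_doc["test"]["test_id"], apple_count)  # prints: 19988 2
```

The comparison is an exact, case-sensitive string match, just like the == in the script, so "apple" would not be counted.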
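This approach uses function_score, but if you would rather score differently you could apply the same script as a script field instead. The query above only matches documents that have a child product object whose brand text is exactly "Apple". You just need to control the input for the two occurrences of "Apple": the value in the match query and the brand parameter of the script. Alternatively, you can match everything in the function_score query and just look at the score. Your output might look like this: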
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 2,
    "hits": [
      {
        "_index": "test",
        "_type": "doc",
        "_id": "AV99vrBpgkgblFY6zscA",
        "_score": 2,
        "_source": {
          "test": {
            "test_id": 19988
          }
        }
      }
    ]
  }
}
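The response above carries the count in _score rather than in the key/count shape the question asks for. A small client-side sketch of that reshaping follows. It assumes every hit's _score really is the number of matching products, as in the sample output; the helper name hits_to_counts is hypothetical.

```python
def hits_to_counts(response):
    """Turn search hits into [{"key": test_id, "count": n}, ...].

    Assumes each hit's _score equals the number of matching products,
    as in the sample response. `hits_to_counts` is a hypothetical helper.
    """
    results = []
    for hit in response["hits"]["hits"]:
        results.append({
            "key": hit["_source"]["test"]["test_id"],
            "count": int(hit["_score"]),
        })
    return results


# Trimmed copy of the sample response shown above.
sample_response = {
    "hits": {
        "total": 1,
        "max_score": 2,
        "hits": [
            {
                "_index": "test",
                "_id": "AV99vrBpgkgblFY6zscA",
                "_score": 2,
                "_source": {"test": {"test_id": 19988}},
            }
        ],
    }
}

print(hits_to_counts(sample_response))  # prints: [{'key': 19988, 'count': 2}]
```

Each hit is treated as its own group here. If several documents shared the same test_id, their counts would still need to be summed per key.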
The mapping of the index I used looks like this:
{
  "test": {
    "mappings": {
      "doc": {
        "properties": {
          "products": {
            "properties": {
              "product_brand": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "product_id": {
                "type": "long"
              }
            }
          },
          "test": {
            "properties": {
              "test_id": {
                "type": "long"
              },
              "test_name": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
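The "match everything and only look at the score" variant mentioned in the answer could be built roughly as below, written as a Python dict so the brand is passed in once. Several things here are assumptions that are not in the original answer: the helper name build_count_query, the use of "boost_mode": "replace" (so the script result replaces the query score instead of being multiplied with it, which is the function_score default), and the top-level "min_score" to drop documents with no matching product. The script source is copied unchanged from the query above. Treat this as an untested sketch rather than a verified query.

```python
import json

# Painless source copied unchanged from the answer's query.
PAINLESS_COUNT_SCRIPT = (
    "def matches=0; def products = params['_source']['products']; "
    "for(p in products){if(p.product_brand == params['brand']){matches++;}} "
    "return matches;"
)


def build_count_query(brand):
    """Build a search body whose _score is the per-document match count.

    `build_count_query` is a hypothetical helper; `boost_mode` and
    `min_score` are additions assumed for this sketch.
    """
    return {
        "_source": {"includes": ["test.test_id"]},
        # Assumption: hide documents whose count (score) is 0.
        "min_score": 1,
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                # Assumption: use the script result as the final score.
                "boost_mode": "replace",
                "functions": [
                    {
                        "script_score": {
                            "script": {
                                "source": PAINLESS_COUNT_SCRIPT,
                                "params": {"brand": brand},
                            }
                        }
                    }
                ],
            }
        },
    }


# Print the JSON body that would be sent to GET /test/_search.
print(json.dumps(build_count_query("Apple"), indent=2))
```

Because the brand now appears only in the script parameters, there is a single input to control instead of two.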