Elastic/Kibana: support for plurals in query searches
I'll simplify my question. Suppose I have an index containing 3 documents that I created with Kibana:
PUT /test/vendors/1
{
  "type": "doctor",
  "name": "Phil",
  "works_in": [
    {
      "place": "Chicago"
    },
    {
      "place": "New York"
    }
  ]
}

PUT /test/vendors/2
{
  "type": "lawyer",
  "name": "John",
  "works_in": [
    {
      "place": "Chicago"
    },
    {
      "place": "New Jersey"
    }
  ]
}

PUT /test/vendors/3
{
  "type": "doctor",
  "name": "Jill",
  "works_in": [
    {
      "place": "Chicago"
    }
  ]
}
Now I run a search:
GET /test/_search
{
  "query": {
    "multi_match": {
      "query": "doctor in chicago",
      "fields": [ "type", "place" ]
    }
  }
}
I get a good response:
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "test",
        "_type": "vendors",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "type": "doctor",
          "name": "Phil",
          "works_in": [
            {
              "place": "Chicago"
            },
            {
              "place": "New York"
            }
          ]
        }
      },
      {
        "_index": "test",
        "_type": "vendors",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "type": "doctor",
          "name": "Jill",
          "works_in": [
            {
              "place": "Chicago"
            }
          ]
        }
      }
    ]
  }
}
Now things start to go wrong...
Changing "doctor" to "doctors":
GET /test/_search
{
  "query": {
    "multi_match": {
      "query": "doctors in chicago",
      "fields": [ "type", "place" ]
    }
  }
}
Zero results, because "doctors" is not found. Elasticsearch doesn't know that the plural and the singular are the same word.
Changing the query to New York:
GET /test/_search
{
  "query": {
    "multi_match": {
      "query": "doctor in new york",
      "fields": [ "type", "place" ]
    }
  }
}
But the result set gives me the "doctor" in Chicago in addition to the "doctor" in New York. The fields are matched with OR...
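A possible explanation, offered as a sketch rather than a confirmed diagnosis: in the sample documents "place" is not a top-level field. It sits inside "works_in", so its full path is "works_in.place", and a query against plain "place" has nothing to match. That would also explain why John, the lawyer in Chicago, is missing from the "doctor in chicago" results even though the terms are ORed: only "type" is contributing matches. The request below assumes the application has already split the input into a profession and a place, and requires both parts to match:

```
# Both clauses under "must" have to match (AND between the two fields)
GET /test/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "type": "doctor" } },
        {
          "match": {
            "works_in.place": {
              "query": "new york",
              "operator": "and"
            }
          }
        }
      ]
    }
  }
}
```

With the three sample documents, this should narrow the result down to Phil. The trade-off is that the free-text input has to be split up before the query is built.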
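Another interesting question is what happens if somebody uses "docs", "physicians", or "health professionals" but means "doctor". Is there a provision that lets me teach Elasticsearch to funnel these into "doctor"?
Is there any pattern for solving these problems with Elasticsearch alone, so that I don't have to analyze the meaning of the string in my own application and then construct a complex, exact Elasticsearch query to match it?
I would appreciate any pointers in the right direction.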
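I'm assuming that the fields "type" and "place" are of the text type with the standard analyzer.
To handle singular and plural forms, what you're looking for is the Snowball Token Filter, which you add to a custom analyzer and then reference from the mapping.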
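To see what such a filter does before changing any mapping, the _analyze API can run a throwaway analysis chain. This is a minimal sketch, assuming an Elasticsearch version whose _analyze endpoint accepts inline filter definitions; the inline filter is simply a Snowball filter configured for English:

```
# Standard tokenizer + lowercase only: "doctors" is kept as-is
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [ "lowercase" ],
  "text": "doctors in chicago"
}

# Same chain with an inline Snowball filter added for stemming
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    { "type": "snowball", "language": "English" }
  ],
  "text": "doctors in chicago"
}
```

The second request should report "doctor" where the first reports "doctors". Because the same analyzer runs at index time and at search time, both forms end up as the same term.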
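For the other requirement you mentioned, that "physicians" should also be treated as equivalent to "doctor", you need to use the Synonym Token Filter.
Below is how your mapping should look. It applies the analyzer to "type" and "place"; you can make similar changes for other fields.
Mapping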
PUT <your_index_name>
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_snow",
            "my_synonym"
          ]
        }
      },
      "filter": {
        "my_snow": {
          "type": "snowball",
          "language": "English"
        },
        "my_synonym": {
          "type": "synonym",
          "synonyms": [
            "docs, physicians, health professionals, doctor"
          ]
        }
      }
    }
  },
  "mappings": {
    "mydocs": {
      "properties": {
        "type": {
          "type": "text",
          "analyzer": "my_analyzer"
        },
        "place": {
          "type": "text",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}
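Once the index exists, it is worth checking what "my_analyzer" actually emits. A short sketch using the _analyze API against the index created above, where <your_index_name> is the same placeholder. Keep in mind that the analyzer of an existing field cannot be changed in place, so documents indexed earlier need a new index and a reindex.

```
# Plural form: should be stemmed and then expanded by the synonym rule
GET /<your_index_name>/_analyze
{
  "analyzer": "my_analyzer",
  "text": "doctors"
}

# A synonym: should expand to a token set that overlaps with the one above
GET /<your_index_name>/_analyze
{
  "analyzer": "my_analyzer",
  "text": "physicians"
}
```

If the chain behaves as intended, the two token lists share the term "doctor", which is what lets a search for either word find documents stored as "doctor". On versions that still use mapping types, the type name under "mappings" ("mydocs" here) has to match the type used when indexing ("vendors" in the question).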
Note that I've added the synonyms directly in the index settings. I'd recommend keeping them in a text file instead, as shown below:
{
  "type": "synonym",
  "synonyms_path": "analysis/synonym.txt"
}
As per the link I've shared, this configures a synonym filter with the path analysis/synonym.txt (relative to the config location).
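The question asked about funneling several words into "doctor". The Solr synonym format that this filter reads by default also accepts explicit mappings written with =>, either inline or as a line in the synonyms file. A minimal sketch; the index name synonym_demo is made up for illustration, and stemming is left out to keep the example small:

```
# Explicit mapping: everything on the left is replaced by "doctor"
PUT /synonym_demo
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym": {
          "type": "synonym",
          "synonyms": [
            "docs, physicians, health professionals => doctor"
          ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [ "lowercase", "my_synonym" ]
        }
      }
    }
  }
}

# Check the result of the rule on a sample phrase
GET /synonym_demo/_analyze
{
  "analyzer": "my_analyzer",
  "text": "physicians in chicago"
}
```

The _analyze call should report "doctor" in place of "physicians". The same rule, written on a line of its own, can go into the synonyms file instead.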
Hope it helps!