Search keywords in ElasticSearch 7.6 with whitespace
I'm trying to implement a search based on city names in ElasticSearch 7.6, but I'm having trouble with names that contain whitespace, as in the following example:
Query: "toronto, new mexico, paris, lisbona, new york, sedro-woolley".
This is my mapping:
mapping = {
"mappings": {
"properties": {
"date": {
"type": "date"
},
"description": {
"type": "text",
"fielddata": True
},
}
}
}
This is my query:
{
"query" : {
"match": { "description": escaped_keywords }
},
"highlight" : {
"pre_tags" : ["<match>"],
"post_tags" : ["</match>"],
"fields" : {
"description" : {"number_of_fragments" : 0 }
}
}
}
escaped_keywords contains the keywords escaped beforehand, like this: "toronto new\ mexico paris lisbona new\ york sedro\-woolley"
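For reference, a string like that can be produced with a small helper such as the following (a hypothetical sketch; `escape_keywords` is not part of any library, it just backslash-escapes spaces and hyphens as shown above):

```python
# Hypothetical helper: backslash-escape spaces and hyphens inside each
# city name, then join all names into one keyword string.
def escape_keywords(cities):
    escaped = [c.replace(" ", r"\ ").replace("-", r"\-") for c in cities]
    return " ".join(escaped)

print(escape_keywords(["toronto", "new mexico", "paris", "lisbona",
                       "new york", "sedro-woolley"]))
# toronto new\ mexico paris lisbona new\ york sedro\-woolley
```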
So the query works for single-word city names and for names with a dash, but not for names containing a space (new york, new mexico), which get split into separate tokens (new, york, new, mexico).
I also tried wrapping the cities containing spaces in parentheses, like toronto (new mexico) paris lisbona (new york) sedro\-woolley, but the result didn't change.
EDIT: highlighting doesn't work for names containing dashes either. It returns the split words (e.g. [sedro, woolley] instead of [sedro-woolley]).
EDIT 2: My goal is to match a dynamic list of keywords (e.g. "new york", "toronto", "sedro-woolley") using highlight tags.
Here is a data sample:
{
"_index": "test_Whosebug",
"_type": "_doc",
"_id": "x4nKv3EBQE6DGGITWX-O",
"_version": 1,
"_seq_no": 0,
"_primary_term": 1,
"found": true,
"_source": {
"title": "Best places: New Mexico and Sedro-Woolley",
"description": "This is an example text containing some cities like New York and Toronto. So, there are also Milton-Freewater and Las Vegas!"
}
}
You need to define a custom analyzer with a mapping char filter that removes whitespace (\u0020) and hyphens (\u002D, i.e. -), so that the generated tokens match your requirements.
Index definition
{
"settings": {
"analysis": {
"char_filter": {
"my_space_char_filter": {
"type": "mapping",
"mappings": [
"\u0020=>",
"\u002D=>"
]
}
},
"analyzer": {
"splcharanalyzer": {
"char_filter": [
"my_space_char_filter"
],
"tokenizer": "standard",
"filter": [
"lowercase"
]
}
}
}
},
"mappings" :{
"properties" :{
"title" :{
"type" : "text",
"analyzer" : "splcharanalyzer"
}
}
}
}
Tokens generated by the custom splcharanalyzer
POST myindex/_analyze
{
"analyzer": "splcharanalyzer",
"text": "toronto, new mexico, paris, lisbona, new york, sedro-woolley"
}
{
"tokens": [
{
"token": "toronto",
"start_offset": 0,
"end_offset": 7,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "newmexico",
"start_offset": 9,
"end_offset": 19,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "paris",
"start_offset": 21,
"end_offset": 26,
"type": "<ALPHANUM>",
"position": 2
},
{
"token": "lisbona",
"start_offset": 28,
"end_offset": 35,
"type": "<ALPHANUM>",
"position": 3
},
{
"token": "newyork",
"start_offset": 37,
"end_offset": 45,
"type": "<ALPHANUM>",
"position": 4
},
{
"token": "sedrowoolley",
"start_offset": 47,
"end_offset": 60,
"type": "<ALPHANUM>",
"position": 5
}
]
}
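The analyzer's effect on this input can be approximated in plain Python. This is only an illustration of the token output above, not how Elasticsearch works internally (the char filter actually runs on the whole text before the standard tokenizer splits it):

```python
# Rough emulation of splcharanalyzer: strip spaces and hyphens (the
# mapping char filter), split on commas (standing in for the standard
# tokenizer on this input), and lowercase (the lowercase token filter).
def splchar_analyze(text):
    filtered = text.replace(" ", "").replace("-", "")
    return [t.lower() for t in filtered.split(",") if t]

print(splchar_analyze("toronto, new mexico, paris, lisbona, new york, sedro-woolley"))
# ['toronto', 'newmexico', 'paris', 'lisbona', 'newyork', 'sedrowoolley']
```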
Search query
{
"query": {
"match" : {
"title" : {
"query" : "sedro-woolley"
}
}
}
}
Search result
"hits": [
{
"_index": "white",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821,
"_source": {
"title": "toronto, new mexico, paris, lisbona, new york, sedro-woolley"
}
}
]
Searching for new or york will not return any results.
{
"query": {
"match" : {
"title" : {
"query" : "york"
}
}
}
}
Result
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
}
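The reason is visible in the token list above: the char filter only removes characters, so a search for york is analyzed to the token york, which was never indexed, while new york collapses to newyork, which was. A minimal membership check illustrates this (the token set is copied from the _analyze output above; the normalization mirrors the analyzer only for this illustration):

```python
# Tokens splcharanalyzer produced at index time (see _analyze output).
indexed_tokens = {"toronto", "newmexico", "paris", "lisbona",
                  "newyork", "sedrowoolley"}

def query_token(query):
    # The same space/hyphen stripping and lowercasing is applied to the
    # query string at search time.
    return query.replace(" ", "").replace("-", "").lower()

print(query_token("new york") in indexed_tokens)  # True  -> document matches
print(query_token("york") in indexed_tokens)      # False -> zero hits
```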