Elasticsearch is returning all the records when the query text is of an alphanumeric pattern

Question

I have an index in Elasticsearch with the following mapping:

/price_validity_idx

{
  "mappings": {
    "properties": {
      "title":{
        "type": "text"
      },
      "validity":{
        "type": "boolean"
      }
    }
  }
}

The data stored in this index looks like this:

{
  "title": "16 USD product",
  "validity": true
},
{
  "title": "USD 5 refill",
  "validity": true
},
.....
{
  "title": "10 USD",
  "validity": false
},
{
  "title": "Movies on Demand-Free of cost",
  "validity": false
},
{
  "title": "One month subscription on Cash purchase",
  "validity": true
}

Whenever I run a match query on the title field with alphanumeric query text (e.g. USD 5), every record whose title contains a numeric value is returned as part of the result.

For example, curl -XGET '/price_validity_idx/_search' -H 'Content-Type: application/json' -d '{"query":{"match": { "title": "USD 5" } }}'

Output (Elasticsearch metadata removed for brevity):

{
  "title": "16 USD product",
  "validity": true
},
{
  "title": "USD 5 refill",
  "validity": true
},
{
  "title": "10 USD",
  "validity": false
}

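But whenever I run the same match query on the title field with only a number as the query text (e.g. 5), only the record matching that number is returned.

How can I make a query with alphanumeric text (e.g. USD 5) return only the records that match the exact numeric value? Due to some business constraints, I cannot change the mapping type to integer. I also cannot use a term query, because the field contains somewhat lengthy text data as well.

Please help, as I am new to Elasticsearch.

The version used is Elasticsearch 7.8.1.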

Answer 1

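The standard analyzer is the default analyzer and is used when none is specified. For the text USD 5 it generates the tokens usd and 5. Because a match query combines the tokens with OR by default, every document that matches either of these tokens matches the search query.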

Analyze API

GET /_analyze
{
  "analyzer" : "standard",
  "text" : "USD 5"
}

The following tokens are generated:

{
  "tokens": [
    {
      "token": "usd",
      "start_offset": 0,
      "end_offset": 3,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "5",
      "start_offset": 4,
      "end_offset": 5,
      "type": "<NUM>",
      "position": 1
    }
  ]
}
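The effect of these two tokens can be reproduced outside Elasticsearch. The following is a minimal sketch, not the real analyzer: `analyze` is a hypothetical helper that only lowercases and splits on non-alphanumeric characters, which is enough to mimic the standard analyzer on the sample titles from the question. `match_or` models a match query with its default OR operator.

```python
import re

def analyze(text):
    """Rough stand-in for the standard analyzer (an assumption):
    lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())

# Sample titles taken from the question.
TITLES = [
    "16 USD product",
    "USD 5 refill",
    "10 USD",
    "Movies on Demand-Free of cost",
    "One month subscription on Cash purchase",
]

def match_or(query, titles):
    """Keep every title that shares at least one token with the query."""
    wanted = set(analyze(query))
    return [t for t in titles if wanted & set(analyze(t))]

print(analyze("USD 5"))          # ['usd', '5']
print(match_or("USD 5", TITLES)) # ['16 USD product', 'USD 5 refill', '10 USD']
print(match_or("5", TITLES))     # ['USD 5 refill']
```

Under this model the extra hits come from the shared token usd, not from the numbers in the titles, which is why querying for 5 alone returns a single record.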

You can use the match_phrase query, which analyzes the text and creates a phrase query out of the analyzed text, so the tokens must appear next to each other and in the same order.

Search query:

{
  "query": {
    "match_phrase": {
      "title": "USD 5"
    }
  }
}

Search result:

"hits": [
      {
        "_index": "64528215",
        "_type": "_doc",
        "_id": "2",
        "_score": 2.1446278,
        "_source": {
          "title": "USD 5 refill",
          "validity": true
        }
      }
    ]
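To see why only one document survives, here is a minimal sketch of phrase matching with no slop. It assumes a simplified tokenizer (`analyze`, a hypothetical helper that lowercases and splits on non-alphanumeric characters) in place of the real standard analyzer, and checks whether the query tokens occur consecutively and in order.

```python
import re

def analyze(text):
    """Rough stand-in for the standard analyzer (an assumption, not the real tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def match_phrase(query, title):
    """True if the query tokens appear in the title consecutively and in the same order."""
    q, t = analyze(query), analyze(title)
    if not q:
        return False  # an empty query matches nothing in this sketch
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))

for title in ["16 USD product", "USD 5 refill", "10 USD"]:
    print(title, "->", match_phrase("USD 5", title))
# 16 USD product -> False
# USD 5 refill -> True
# 10 USD -> False
```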

Edit 1:

You can also use a match query with the operator set to and. The operator is the boolean logic used to interpret the text in the query value, so with and every token of the query must be present in the document.

{
  "query": {
    "match": {
      "title": {
        "query": "USD 5",
        "operator": "and"
      }
    }
  }
}
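The difference from a phrase query is that the and operator only requires every token to be present, in any order. A minimal sketch under the same simplification (`analyze` is a hypothetical stand-in for the standard analyzer, and the title "Refill 5 USD" is made up for illustration, it is not in the question's data):

```python
import re

def analyze(text):
    """Rough stand-in for the standard analyzer (an assumption, not the real tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def match_and(query, title):
    """True if every query token occurs somewhere in the title, regardless of order."""
    q = set(analyze(query))
    return bool(q) and q <= set(analyze(title))

print(match_and("USD 5", "USD 5 refill"))    # True
print(match_and("USD 5", "16 USD product"))  # False: the token 5 is missing
print(match_and("USD 5", "10 USD"))          # False: the token 5 is missing
print(match_and("USD 5", "Refill 5 USD"))    # True: hypothetical title, order does not matter
```

If the word order matters for your data, match_phrase is the stricter choice; if it does not, the and operator is more forgiving.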

search

full-text-search

analyzer

elasticsearch

elastic-stack