How to search omitting whitespace on Elasticsearch

Question

Elasticsearch newbie here, trying to understand a few things.

I have this query:

{
  "size": 10,
  "_source": "pokemon.name",
  "query": {
    "bool": {
      "minimum_should_match": 1,
      "should": [
        {
          "multi_match": {
            "_name": "name-match",
            "type": "phrase",
            "fields": ["pokemon.name"],
            "operator": "or",
            "query": "pika"
          }
        },
        {
          "multi_match": {
            "_name": "weight-match",
            "type": "most_fields",
            // I use multi_match because I'm not sure how to change it to match
            "fields": ["pokemon.weight"],
            "query": "10kg"
          }
        }
      ]
    }
  }
}

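The problem is that pokemon.weight has a space between the value and the unit, as in 10 Kg. So I need to ignore the whitespace in order to match 10kg.

I tried changing the tokenizer, but unfortunately it can only decide where to split the text; it cannot remove characters. In any case, I don't know how to use it, and the documentation isn't very helpful: it explains the theory but not how to apply it.

Thanks! Any learning resources would be appreciated.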

Answer 1

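You need to define a custom analyzer with a char filter, in which you replace the space character with an empty string, so that the tokens generated in your case, 10 and g, become a single token, 10g. I tried it locally and it works fine.

Bonus links for understanding how analysis works in ES and an example of a custom analyzer with char filters.

Below is my custom analyzer that produces the required tokens: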

{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_char_filter"
          ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "mapping",
          "mappings": [
            "\\u0020=>"
          ]
        }
      }
    }
  }
}
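
The settings above only define the analyzer; it still has to be attached to the field. The following is a minimal sketch of an index-creation body that does this, not the answerer's exact setup. The names weight_analyzer and strip_spaces are hypothetical, and the lowercase token filter is my own addition (an assumption, not part of the answer): without it, the stored value 10 Kg would become the token 10Kg, which would not match a search for 10kg. The space is written as \\u0020 so that the escape sequence reaches the char filter intact instead of being decoded by the JSON parser.

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "weight_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "char_filter": ["strip_spaces"],
          "filter": ["lowercase"]
        }
      },
      "char_filter": {
        "strip_spaces": {
          "type": "mapping",
          "mappings": ["\\u0020=>"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "pokemon": {
        "properties": {
          "name": { "type": "text" },
          "weight": { "type": "text", "analyzer": "weight_analyzer" }
        }
      }
    }
  }
}
```

The analyzer of an existing field cannot be changed in place, so documents that are already indexed would need to be reindexed into an index created with this mapping. Note also that this char filter removes every space in the field, which suits a short value such as a weight but would glue the words together in a multi-word text field.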

Using this analyzer, the following token is generated, which I confirmed with the analyze API.

Endpoint: http://{{your_hostname}}:9500/{{your_index_name}}/_analyze

Body:

{
    "analyzer" : "my_analyzer",
    "text" : "10 g"
}

Result:

{
    "tokens": [
        {
            "token": "10g",
            "start_offset": 0,
            "end_offset": 4,
            "type": "<ALPHANUM>",
            "position": 0
        }
    ]
}
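
As for the comment in the question about switching from multi_match to match: assuming pokemon.weight is mapped with an analyzer that strips spaces as described above and also lowercases tokens (the lowercase step is my assumption, not something the answer includes), the query could be sketched roughly like this. A match query analyzes the search text with the field's analyzer, so 10kg, 10 kg and 10 Kg should all reduce to the same token.

```json
{
  "size": 10,
  "_source": "pokemon.name",
  "query": {
    "bool": {
      "minimum_should_match": 1,
      "should": [
        {
          "match_phrase": {
            "pokemon.name": {
              "query": "pika",
              "_name": "name-match"
            }
          }
        },
        {
          "match": {
            "pokemon.weight": {
              "query": "10kg",
              "_name": "weight-match"
            }
          }
        }
      ]
    }
  }
}
```

With a single field per clause, match_phrase and match do the same job as multi_match with "type": "phrase" and a one-element fields list, and the _name labels still show up under matched_queries in each hit.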

