Create custom analyzer with asciifolding filter in Spring Data Elasticsearch

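After saving an object with the name çözüm, I want to retrieve that same object whether I search for cozum or çözüm. From what I have read, the asciifolding filter is the recommended way to do this. How can I implement it with Spring Data Elasticsearch?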

    @Document(indexName = "erp")
    public class Company {
    
        @Id
        private String id;
    
        private String name;
    
        private String description;
    
        @Field(type = FieldType.Nested, includeInParent = true)
        private List<Employee> employees;

        // getters and setters
    }

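You need to create a custom analyzer that uses the asciifolding filter (see the Elasticsearch docs) and add it to your index settings.

Then you can reference this analyzer in the @Field annotation of the name property.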

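Edit: a complete example

First, a file for the index settings, which I named erp-company.json and saved at the classpath root (it is referenced below as /erp-company.json):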

{
  "analysis": {
    "analyzer": {
      "custom_analyzer": {
        "type": "custom",
        "tokenizer": "standard",
        "char_filter": [
          "html_strip"
        ],
        "filter": [
          "lowercase",
          "asciifolding"
        ]
      }
    }
  }
}
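
To get a feel for what this analyzer chain does to a name, here is a rough, stdlib-only Java sketch. It is not Elasticsearch code: the class AsciiFoldingSketch and its methods are hypothetical, the tag stripping and tokenizing are crude approximations of html_strip and the standard tokenizer, and the folding step uses Unicode NFD decomposition as a stand-in for asciifolding. The real filter also maps characters that have no decomposition (such as ø or ß), which this sketch leaves untouched.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Elasticsearch or Spring Data.
final class AsciiFoldingSketch {

    // Approximates: html_strip -> standard tokenizer -> lowercase -> asciifolding.
    static List<String> analyze(String text) {
        // html_strip (approximation): replace anything that looks like a tag with a space
        String stripped = text.replaceAll("<[^>]*>", " ");
        List<String> tokens = new ArrayList<>();
        // standard tokenizer (approximation): split on runs of characters that are
        // neither letters, digits, nor combining marks
        for (String raw : stripped.split("[^\\p{L}\\p{N}\\p{M}]+")) {
            if (raw.isEmpty()) {
                continue; // split() yields an empty first element if the text starts with a separator
            }
            tokens.add(fold(raw.toLowerCase(Locale.ROOT)));
        }
        return tokens;
    }

    // asciifolding (approximation): decompose accented letters, then drop the combining marks.
    static String fold(String token) {
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Accented letters are written as Unicode escapes so the file stays ASCII-only.
        System.out.println(analyze("Ren\u00e9e et Fran\u00e7ois")); // prints [renee, et, francois]
        System.out.println(analyze("\u00e7\u00f6z\u00fcm"));        // prints [cozum]
        System.out.println(analyze("cozum"));                       // prints [cozum]
    }
}
```

Both spellings from the question end up as the same token, which is what makes them interchangeable once the real analyzer is attached to a field.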

Then you need to reference this file and the analyzer in your entity class, here named Company:

@Document(indexName = "erp")
@Setting(settingPath = "/erp-company.json")
public class Company {

    @Id
    private String id;

    @Field(type = FieldType.Text, analyzer = "custom_analyzer")
    private String name;

    @Field(type = FieldType.Text, analyzer = "custom_analyzer")
    private String description;

    // getters and setters
}

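With this CompanyController, which relies on a CompanyRepository that declares a searchByName(String) method returning SearchHits<Company>: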

@RestController
@RequestMapping("/company")
public class CompanyController {

    private final CompanyRepository repository;

    public CompanyController(CompanyRepository repository) {
        this.repository = repository;
    }


    @PostMapping
    public Company put(@RequestBody Company company) {
        return repository.save(company);
    }

    @GetMapping("/{name}")
    public SearchHits<Company> get(@PathVariable String name) {
        return repository.searchByName(name);
    }
}

Saving some data that contains diacritics (using httpie):

http POST :8080/company id=1 name="Renée et François"

Searching without diacritics:

http  GET :8080/company/francois

HTTP/1.1 200
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Connection: keep-alive
Content-Type: application/json
Date: Wed, 09 Sep 2020 17:56:16 GMT
Expires: 0
Keep-Alive: timeout=60
Pragma: no-cache
Transfer-Encoding: chunked
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-XSS-Protection: 1; mode=block

{
    "aggregations": null,
    "empty": false,
    "maxScore": 0.2876821,
    "scrollId": null,
    "searchHits": [
        {
            "content": {
                "description": null,
                "id": "1",
                "name": "Renée et François"
            },
            "highlightFields": {},
            "id": "1",
            "index": "erp",
            "innerHits": {},
            "nestedMetaData": null,
            "score": 0.2876821,
            "sortValues": []
        }
    ],
    "totalHits": 1,
    "totalHitsRelation": "EQUAL_TO"
}
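
The hit above is plausible because text fields are, by default, analyzed with the same analyzer at index time and at search time, so the stored name and the search term are reduced to the same tokens. The following is only a toy model of that idea in plain Java, under loud assumptions: FoldedIndexSketch is a hypothetical class, its analyze method is a simplified approximation of custom_analyzer (NFD decomposition instead of the real asciifolding filter), and the in-memory map stands in for the inverted index. It says nothing about how Spring Data builds the actual query.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy index for illustration only; not how Elasticsearch is implemented.
final class FoldedIndexSketch {

    // token -> ids of the documents whose name contains that token
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Simplified stand-in for custom_analyzer: tokenize, lowercase, strip diacritics.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}\\p{M}]+")) {
            if (raw.isEmpty()) {
                continue;
            }
            String lower = raw.toLowerCase(Locale.ROOT);
            String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
            tokens.add(decomposed.replaceAll("\\p{M}+", ""));
        }
        return tokens;
    }

    // Index time: store the document id under every analyzed token of the name.
    void index(String id, String name) {
        for (String token : analyze(name)) {
            postings.computeIfAbsent(token, t -> new TreeSet<>()).add(id);
        }
    }

    // Search time: analyze the term the same way and collect ids for any matching token.
    Set<String> search(String term) {
        Set<String> ids = new TreeSet<>();
        for (String token : analyze(term)) {
            ids.addAll(postings.getOrDefault(token, Collections.emptySet()));
        }
        return ids;
    }

    public static void main(String[] args) {
        FoldedIndexSketch index = new FoldedIndexSketch();
        // Accented letters are written as Unicode escapes so the file stays ASCII-only.
        index.index("1", "Ren\u00e9e et Fran\u00e7ois");
        index.index("2", "\u00e7\u00f6z\u00fcm");
        System.out.println(index.search("francois"));             // prints [1]
        System.out.println(index.search("cozum"));                // prints [2]
        System.out.println(index.search("\u00e7\u00f6z\u00fcm")); // prints [2]
    }
}
```

If the analyzer were applied only when indexing, or only when searching, the two sides would produce different tokens and the lookup for the accented or unaccented spelling would miss.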

The index information that Elasticsearch returns for the index:

{
    "erp": {
        "aliases": {},
        "mappings": {
            "properties": {
                "_class": {
                    "fields": {
                        "keyword": {
                            "ignore_above": 256,
                            "type": "keyword"
                        }
                    },
                    "type": "text"
                },
                "description": {
                    "analyzer": "custom_analyzer",
                    "type": "text"
                },
                "id": {
                    "fields": {
                        "keyword": {
                            "ignore_above": 256,
                            "type": "keyword"
                        }
                    },
                    "type": "text"
                },
                "name": {
                    "analyzer": "custom_analyzer",
                    "type": "text"
                }
            }
        },
        "settings": {
            "index": {
                "analysis": {
                    "analyzer": {
                        "custom_analyzer": {
                            "char_filter": [
                                "html_strip"
                            ],
                            "filter": [
                                "lowercase",
                                "asciifolding"
                            ],
                            "tokenizer": "standard",
                            "type": "custom"
                        }
                    }
                },
                "creation_date": "1599673911503",
                "number_of_replicas": "1",
                "number_of_shards": "1",
                "provided_name": "erp",
                "uuid": "lRwcKcPUQxKKGuNJ6G30uA",
                "version": {
                    "created": "7090099"
                }
            }
        }
    }
}