Create custom analyzer with asciifolding filter in Spring Data Elasticsearch
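After saving an object with the name çözüm, I want to retrieve that same object when searching with either cozum or çözüm. I searched around and the asciifolding filter was suggested. How can I implement this with Spring Data Elasticsearch?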
@Document(indexName = "erp")
public class Company {
    @Id
    private String id;
    private String name;
    private String description;
    @Field(type = FieldType.Nested, includeInParent = true)
    private List<Employee> employees;
    // getters, setter
}
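You need to create an analyzer that uses the asciifolding filter (see the Elasticsearch docs) and add it to your index settings. You can then reference this analyzer in the @Field annotation of the name property.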
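Edit: a complete example
First, a file for the index settings, which I named erp-company.json and saved on the classpath so that it resolves as /erp-company.json: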
{
    "analysis": {
        "analyzer": {
            "custom_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "char_filter": [
                    "html_strip"
                ],
                "filter": [
                    "lowercase",
                    "asciifolding"
                ]
            }
        }
    }
}
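To get a feel for what the lowercase and asciifolding filters in this analyzer do to a token, here is a minimal plain-Java sketch. It is only an approximation built on java.text.Normalizer, not Elasticsearch's implementation; the class name AsciiFoldingSketch and the method fold are made up for illustration. The string literals use Unicode escapes for çözüm and Renée et François so the file compiles regardless of source encoding.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, not part of Spring Data Elasticsearch or Elasticsearch.
final class AsciiFoldingSketch {

    // Approximates "lowercase" + "asciifolding" for letters that carry diacritics.
    static String fold(String input) {
        // NFD decomposition splits a letter such as c-cedilla into "c" plus a combining mark.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // \p{M} matches combining marks; removing them leaves only the base letters.
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        return stripped.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // The name from the question, written with escapes.
        System.out.println(fold("\u00e7\u00f6z\u00fcm"));            // prints "cozum"
        // The name used in the example data of this answer.
        System.out.println(fold("Ren\u00e9e et Fran\u00e7ois"));     // prints "renee et francois"
    }
}
```

The real asciifolding filter covers more characters than this decomposition trick, for example letters such as ß or ø that have no base-letter-plus-mark decomposition, so treat the sketch as an illustration of the idea only.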
Then you need to reference this file and the analyzer in your entity class, here named Company:
@Document(indexName = "erp")
@Setting(settingPath = "/erp-company.json")
public class Company {
    @Id
    private String id;
    @Field(type = FieldType.Text, analyzer = "custom_analyzer")
    private String name;
    @Field(type = FieldType.Text, analyzer = "custom_analyzer")
    private String description;
    // getters, setter
}
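The CompanyController that uses this (it relies on a CompanyRepository with a searchByName method, which is not shown here):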
@RestController
@RequestMapping("/company")
public class CompanyController {
    private final CompanyRepository repository;

    public CompanyController(CompanyRepository repository) {
        this.repository = repository;
    }

    @PostMapping
    public Company put(@RequestBody Company company) {
        return repository.save(company);
    }

    @GetMapping("/{name}")
    public SearchHits<Company> get(@PathVariable String name) {
        return repository.searchByName(name);
    }
}
Saving some data that contains diacritics (using httpie):
http POST :8080/company id=1 name="Renée et François"
Searching without diacritics:
http GET :8080/company/francois
HTTP/1.1 200
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Connection: keep-alive
Content-Type: application/json
Date: Wed, 09 Sep 2020 17:56:16 GMT
Expires: 0
Keep-Alive: timeout=60
Pragma: no-cache
Transfer-Encoding: chunked
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-XSS-Protection: 1; mode=block
{
    "aggregations": null,
    "empty": false,
    "maxScore": 0.2876821,
    "scrollId": null,
    "searchHits": [
        {
            "content": {
                "description": null,
                "id": "1",
                "name": "Renée et François"
            },
            "highlightFields": {},
            "id": "1",
            "index": "erp",
            "innerHits": {},
            "nestedMetaData": null,
            "score": 0.2876821,
            "sortValues": []
        }
    ],
    "totalHits": 1,
    "totalHitsRelation": "EQUAL_TO"
}
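The search for francois finds the document because the same analyzer is applied to the stored name at index time and to the search term at query time, so both sides end up with the token francois. The following plain-Java sketch imitates that idea under simplifying assumptions: the class AnalyzedMatchSketch and its methods are hypothetical, tokenization is a crude split on non-letter and non-digit characters rather than the standard tokenizer, folding is approximated with java.text.Normalizer, and a document counts as a hit when every query token occurs among its tokens, which is not necessarily how the derived repository query is executed.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical illustration, not Elasticsearch's analysis chain.
final class AnalyzedMatchSketch {

    // Folds diacritics, lowercases, then splits into word tokens.
    static List<String> analyze(String text) {
        String folded = Normalizer.normalize(text, Normalizer.Form.NFD)
                .replaceAll("\\p{M}+", "")
                .toLowerCase(Locale.ROOT);
        return Arrays.stream(folded.split("[^\\p{L}\\p{N}]+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    // A hit means: the query has tokens and all of them appear in the indexed tokens.
    static boolean matches(String indexedText, String query) {
        List<String> indexedTokens = analyze(indexedText);
        List<String> queryTokens = analyze(query);
        return !queryTokens.isEmpty() && indexedTokens.containsAll(queryTokens);
    }

    public static void main(String[] args) {
        // The stored name from the example above, written with Unicode escapes.
        String stored = "Ren\u00e9e et Fran\u00e7ois";
        System.out.println(analyze(stored));                    // prints "[renee, et, francois]"
        System.out.println(matches(stored, "francois"));        // prints "true"
        System.out.println(matches(stored, "Fran\u00e7ois"));   // prints "true"
    }
}
```

Because both the indexed text and the query pass through the same steps, searching with or without diacritics leads to the same tokens, which is the behavior asked for in the question.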
The index information that Elasticsearch returns for the index:
{
    "erp": {
        "aliases": {},
        "mappings": {
            "properties": {
                "_class": {
                    "fields": {
                        "keyword": {
                            "ignore_above": 256,
                            "type": "keyword"
                        }
                    },
                    "type": "text"
                },
                "description": {
                    "analyzer": "custom_analyzer",
                    "type": "text"
                },
                "id": {
                    "fields": {
                        "keyword": {
                            "ignore_above": 256,
                            "type": "keyword"
                        }
                    },
                    "type": "text"
                },
                "name": {
                    "analyzer": "custom_analyzer",
                    "type": "text"
                }
            }
        },
        "settings": {
            "index": {
                "analysis": {
                    "analyzer": {
                        "custom_analyzer": {
                            "char_filter": [
                                "html_strip"
                            ],
                            "filter": [
                                "lowercase",
                                "asciifolding"
                            ],
                            "tokenizer": "standard",
                            "type": "custom"
                        }
                    }
                },
                "creation_date": "1599673911503",
                "number_of_replicas": "1",
                "number_of_shards": "1",
                "provided_name": "erp",
                "uuid": "lRwcKcPUQxKKGuNJ6G30uA",
                "version": {
                    "created": "7090099"
                }
            }
        }
    }
}