Elasticsearch Query Returning Zero Results for Substring
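I created my first AWS Elasticsearch cluster and uploaded some data to it (shown below).
When I search for a domain such as example.com, I get zero results.
Is this a problem with the search query or with the indexing?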
# curl -XGET -u username:password 'https://xxxxx.us-east-1.es.amazonaws.com/hosts/_search?q=example.com&pretty=true'
{
"took" : 7,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
I confirmed that a match_all query does return all records.
match_all
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "hosts",
"_type" : "_doc",
"_id" : "KK0PcnMBqk4TBzxZPeGU",
"_score" : 1.0,
"_source" : {
"name" : "mail.whosebug.com",
"type" : "a",
"value" : "10.0.0.3"
}
},
{
"_index" : "hosts",
"_type" : "_doc",
"_id" : "J60PcnMBqk4TBzxZPeGU",
"_score" : 1.0,
"_source" : {
"name" : "ns1.guardian.co.uk",
"type" : "a",
"value" : "10.0.0.2"
}
},
{
"_index" : "hosts",
"_type" : "_doc",
"_id" : "Ka0PcnMBqk4TBzxZPeGU",
"_score" : 1.0,
"_source" : {
"name" : "test.example.com",
"type" : "a",
"value" : "10.0.0.4"
}
}
]
}
}
Bulk upload command
curl -XPUT -u username:password https://xxxxx.us-east-1.es.amazonaws.com/_bulk --data-binary @bulk.json -H 'Content-Type: application/json'
bulk.json
{ "index" : { "_index": "hosts" } }
{"name":"ns1.guardian.co.uk","type":"a","value":"10.0.0.2"}
{ "index" : { "_index": "hosts" } }
{"name":"mail.whosebug.com","type":"a","value":"10.0.0.3"}
{ "index" : { "_index": "hosts" } }
{"name":"test.example.com","type":"a","value":"10.0.0.4"}
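With the default mapping, the name field is most likely indexed as the single token test.example.com, so a search for example.com has nothing to match. You can use the path hierarchy tokenizer, which takes a hierarchical value such as a filesystem path, splits it on the path separator, and emits a term for each component in the tree.
Index mapping: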
{
"settings": {
"analysis": {
"analyzer": {
"path-analyzer": {
"type": "custom",
"tokenizer": "path-tokenizer"
}
},
"tokenizer": {
"path-tokenizer": {
"type": "path_hierarchy",
"delimiter": ".",
"reverse": "true"
}
}
}
},
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "path-analyzer",
"search_analyzer": "keyword"
}
}
}
}
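Analyze API
In the index mapping above, reverse is set to true, which emits the tokens in reverse order, so each token is a suffix of the hostname rather than a prefix. (reverse defaults to false.)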
POST /hosts/_analyze
{
"analyzer": "path-analyzer",
"text": "test.example.com"
}
This produces three tokens:
{
"tokens": [
{
"token": "test.example.com",
"start_offset": 0,
"end_offset": 16,
"type": "word",
"position": 0
},
{
"token": "example.com",
"start_offset": 5,
"end_offset": 16,
"type": "word",
"position": 0
},
{
"token": "com",
"start_offset": 13,
"end_offset": 16,
"type": "word",
"position": 0
}
]
}
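To make the token list above easier to follow, here is a small Python sketch that imitates what a path_hierarchy tokenizer with delimiter "." and reverse set to true appears to do for this input. It is a toy model written for illustration, not Elasticsearch's implementation; the function name reverse_path_hierarchy is an assumption.

```python
def reverse_path_hierarchy(text, delimiter="."):
    """Toy model of path_hierarchy with reverse=true.

    Emits the full text, then the text after each successive delimiter,
    so every token is a suffix that ends at the end of the input.
    """
    tokens = []
    start = 0
    while True:
        tokens.append(
            {
                "token": text[start:],
                "start_offset": start,
                "end_offset": len(text),
            }
        )
        cut = text.find(delimiter, start)
        if cut == -1:
            break
        # Continue with the part that follows this delimiter.
        start = cut + 1
    return tokens


for tok in reverse_path_hierarchy("test.example.com"):
    print(tok)
```

For test.example.com the sketch yields the same three tokens and offsets as the _analyze response above: the full name, example.com, and com.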
Search query:
{
"query": {
"term": {
"name": "example.com"
}
}
}
Search result:
"hits": [
{
"_index": "hosts",
"_type": "_doc",
"_id": "d67gdHMBcF4W0YVjq8ed",
"_score": 1.3744103,
"_source": {
"name": "test.example.com",
"type": "a",
"value": "10.0.0.4"
}
}
]
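Why the term query returns only test.example.com can be illustrated with a toy inverted index. This is a simplified sketch under stated assumptions: the helper names (suffix_tokens, build_index, term_lookup) are hypothetical, scoring is ignored, and the search term is looked up unchanged, which mirrors the keyword search analyzer in the mapping.

```python
from collections import defaultdict


def suffix_tokens(name, delimiter="."):
    """Return every dot-delimited suffix of name, longest first."""
    parts = name.split(delimiter)
    return [delimiter.join(parts[i:]) for i in range(len(parts))]


def build_index(names):
    """Map each suffix token to the set of hostnames that contain it."""
    index = defaultdict(set)
    for name in names:
        for token in suffix_tokens(name):
            index[token].add(name)
    return index


def term_lookup(index, term):
    """Exact-match lookup; the term is not analyzed."""
    return sorted(index.get(term, set()))


HOSTS = ["ns1.guardian.co.uk", "mail.whosebug.com", "test.example.com"]
index = build_index(HOSTS)

print(term_lookup(index, "example.com"))  # ['test.example.com']
print(term_lookup(index, "com"))  # ['mail.whosebug.com', 'test.example.com']
print(term_lookup(index, "example"))  # []
```

Note that only whole suffixes match in this model: example on its own finds nothing, because no indexed token equals it.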