python elasticsearch: how to query a string on all fields
I have the following Python code, which works fine and returns the 50 results I expect:
elastic = settings.ELASTIC
indexes = u'nginx-access-2769z-2018.11.26.16'
filter_by_client = [
{'match_phrase': {'client_id': '2769z'}},
]
range_for_search = {
'gte': str(1543248611),
'lte': str(1543249511),
'format': 'epoch_second',
}
query_body = {
'from': 0,
'size': 50,
'query': {
'bool': {
'must': filter_by_client,
'filter': {'range': {'@timestamp': range_for_search}},
},
}
}
search_result = elastic.search(index=indexes, body=query_body)
results = [result['_source'] for result in search_result['hits']['hits']]
Now if I add another filter, for example:
...
filter_by_client = [
{'match_phrase': {'client_id': '2769z'}},
{'match': {'remote_address': '181.220.174.189'}}
]
...
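It also works fine! It narrows the results down to 5.
My question is: how do I query that string on all fields? It doesn't matter to me whether the string is at the start or end of a field, whether it is uppercase, or whether the field is actually an integer/float rather than a string, ...
I have already tried using the "_all" keyword like this: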
...
filter_by_client = [
{'match_phrase': {'client_id': '2769z'}},
{'match': {'_all': '181.220.174.189'}}
]
...
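But it gives me 0 results. I'm trying to reproduce the same behavior I get through the Kibana interface.
What Nishant mentioned, using a copy_to field, is the best solution, but if you have no control over changing the mapping, you can try whether any of the following approaches help.
Using a Query String Query
You can use a Query String Query; your query would look like this: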
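For reference, the copy_to approach could look roughly like the sketch below, assuming you are able to create a new index (or reindex) and using a typeless, 7.x-style mapping; on 6.x the properties would sit under a mapping type name. copy_to is the usual replacement for the _all field, which was deprecated in Elasticsearch 6.0 (it cannot be enabled on indices created in 6.x) and removed in 7.0; that may explain why the _all query above returns 0 results. The catch-all field name all_text and the chosen source fields are hypothetical.

```python
# Hypothetical name for the catch-all field that replaces _all.
CATCH_ALL_FIELD = 'all_text'


def build_mapping(source_fields):
    """Build a typeless (7.x-style) mapping body in which every field in
    `source_fields` copies its value into one catch-all text field.

    `source_fields` maps field name -> Elasticsearch field type.
    """
    properties = {
        name: {'type': field_type, 'copy_to': CATCH_ALL_FIELD}
        for name, field_type in source_fields.items()
    }
    # The catch-all field is analyzed text, so a match query against it is
    # case-insensitive regardless of the source field's own type.
    properties[CATCH_ALL_FIELD] = {'type': 'text'}
    return {'mappings': {'properties': properties}}


def catch_all_clause(text):
    """A match clause that searches the catch-all field."""
    return {'match': {CATCH_ALL_FIELD: text}}


# Hypothetical field list for the nginx access index.
mapping = build_mapping({
    'client_id': 'keyword',
    'remote_address': 'keyword',
    'url': 'text',
})
```

With the official Python client, this body would be passed to elastic.indices.create(index=..., body=mapping), and catch_all_clause('181.220.174.189') would replace the _all clause in filter_by_client. copy_to runs at index time, so documents indexed before the mapping existed are not covered until they are reindexed.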
...
filter_by_client = [
{'match_phrase': {'client_id': '2769z'}},
{'query_string': {'query': '181.220.174.189'}}
]
...
An important note is that query_string searches all fields by default. The link I mentioned says the following:
The default field for query terms if no prefix field is specified.
Defaults to the index.query.default_field index settings, which in
turn defaults to *. * extracts all fields in the mapping that are
eligible to term queries and filters the metadata fields.
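Combined with the original client filter and time range, the query_string variant could be wrapped in a helper like the one below. This is a sketch: the function name is hypothetical, and default_field and lenient are optional query_string parameters included only to make the behavior explicit. '*' mirrors the documented default quoted above, and lenient=True asks Elasticsearch to ignore format errors such as a text value being checked against a numeric or date field.

```python
def build_query_string_body(client_id, text, gte, lte, size=50):
    """Combine the client filter and time range with a query_string clause
    that searches every eligible field (hypothetical helper)."""
    must = [
        {'match_phrase': {'client_id': client_id}},
        {'query_string': {
            'query': text,
            'default_field': '*',  # same as the documented default
            'lenient': True,       # ignore format errors on numeric/date fields
        }},
    ]
    return {
        'from': 0,
        'size': size,
        'query': {
            'bool': {
                'must': must,
                'filter': {'range': {'@timestamp': {
                    'gte': str(gte),
                    'lte': str(lte),
                    'format': 'epoch_second',
                }}},
            },
        },
    }


body = build_query_string_body('2769z', '181.220.174.189', 1543248611, 1543249511)
```

The resulting body is passed to elastic.search(index=indexes, body=body) exactly as in the question.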
I'm also pointing this out because I'd like you to understand the difference between query_string and a simple match (Match vs Query-String) before you decide to use query_string:
The match family of queries does not go through a "query parsing"
process. It does not support field name prefixes, wildcard characters,
or other "advanced" features. For this reason, chances of it failing
are very small / non existent, and it provides an excellent behavior
when it comes to just analyze and run that text as a query behavior
(which is usually what a text search box does). Also, the
phrase_prefix type can provide a great "as you type" behavior to
automatically load search results.
Using Multi-Match
Below is another possible solution that uses a multi_match query, in case you don't want to change the mapping:
...
filter_by_client = [
{'match_phrase': {'client_id': '2769z'}},
{'multi_match': {'query': '181.220.174.189', 'fields': ['url', 'field_2']}}
]
...
Notice that you need to explicitly list the fields to consider at query time. But be sure to validate and test it thoroughly.
Let me know if this helps!
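Listing the fields by hand becomes less tedious because multi_match accepts wildcard patterns and per-field boosts in its field list, and lenient=True keeps a text value from failing against numeric fields. The helper name and the field names and patterns below are hypothetical; check against your own mapping which fields a pattern actually matches.

```python
def build_multi_match(text, fields, lenient=True):
    """A multi_match clause over an explicit list of fields (hypothetical helper).

    `fields` may include wildcard patterns (e.g. 'request*') and per-field
    boosts (e.g. 'url^2').
    """
    return {
        'multi_match': {
            'query': text,
            'fields': list(fields),
            'lenient': lenient,  # ignore format errors on non-text fields
        }
    }


filter_by_client = [
    {'match_phrase': {'client_id': '2769z'}},
    build_multi_match('181.220.174.189', ['remote_address', 'url^2', 'request*']),
]
```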