Check if full text field has a date in a range
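Hello, I am new to the Elastic world, and I am trying to figure out how to find whether a field, for example "text" (which holds the full text of the indexed document), contains a date within a specific range.
Example:
In doc_1 the field "text" contains "I was born on 27/05/1995", and I want to check whether this document contains a date between 20/05/1995 and 30/05/1995.
If that is not possible, how can I extract the date "27/05/1995" while indexing the document and store it in a new field? Could you give me a hint about the best approach for indexing documents that contain dates?
Thanks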
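I think you have several options here. To search for documents by date range, you have to parse the dates out of the text and index them as date fields in elasticsearch. You can do this inside your application before sending the document to elasticsearch, or you can take a look at the ingest node. The ingest node lets you pre-process documents before they are indexed. https://www.elastic.co/guide/en/elasticsearch/reference/current/ingest.html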
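As a rough illustration of the "inside your application" option, here is a minimal Python sketch. It assumes the date always appears as dd/MM/yyyy, as in the question's example. The field name `born_on` and the helper names `extract_date` and `enrich` are made up for this example and are not part of any Elasticsearch API.

```python
import re
from datetime import date, datetime
from typing import Optional

# Assumed input format: day/month/year, as in the question's example.
DATE_PATTERN = re.compile(r"\b(\d{2}/\d{2}/\d{4})\b")


def extract_date(text: str) -> Optional[date]:
    """Return the first dd/MM/yyyy date found in text, or None."""
    match = DATE_PATTERN.search(text)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%d/%m/%Y").date()
    except ValueError:
        # Looks like a date but is not a valid one, e.g. 31/02/1995.
        return None


def enrich(doc: dict) -> dict:
    """Copy doc and add an ISO-formatted 'born_on' field (hypothetical name)."""
    enriched = dict(doc)
    found = extract_date(doc.get("text", ""))
    if found is not None:
        enriched["born_on"] = found.isoformat()
    return enriched


print(enrich({"text": "I was born on 27/05/1995"}))
```

The enriched document can then be indexed as usual. Storing the value in ISO form (yyyy-MM-dd) keeps it compatible with a field mapped as a date.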
Once your documents in elasticsearch have a separate date field, you can search them with a range query: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html
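For the question's example, the body of such a range query might be built as in the sketch below. It assumes the extracted date was stored in a field mapped as a date. The field name `born_on` and the helper `date_range_query` are hypothetical.

```python
import json
from datetime import date


def date_range_query(field: str, start: date, end: date) -> dict:
    """Build a search body matching documents whose date field lies in [start, end]."""
    return {
        "query": {
            "range": {
                field: {
                    "gte": start.isoformat(),  # inclusive lower bound
                    "lte": end.isoformat(),    # inclusive upper bound
                }
            }
        }
    }


# 'born_on' is a hypothetical date field holding the date parsed from the text.
body = date_range_query("born_on", date(1995, 5, 20), date(1995, 5, 30))
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request against the index.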
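Below I use regular expressions to match specific dates inside the text. The date format I am looking for is "yyyy/mm/dd"; you can reorder the span_multi clauses to match the format you need. You can read more about span queries in the Elasticsearch documentation.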
Mapping:
PUT testindex
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text"
      }
    }
  }
}
Data:
[
{
"_index" : "testindex",
"_type" : "_doc",
"_id" : "a3PLFW0BY3127H1HVxyC",
"_score" : 1.0,
"_source" : {
"content" : "I was born on 2019/09/01"
}
},
{
"_index" : "testindex",
"_type" : "_doc",
"_id" : "bXPLFW0BY3127H1HaBwp",
"_score" : 1.0,
"_source" : {
"content" : "I was born on 2019/09/15"
}
},
{
"_index" : "testindex",
"_type" : "_doc",
"_id" : "w3PLFW0BY3127H1HeBzg",
"_score" : 1.0,
"_source" : {
"content" : "I was born on 2019/09/20"
}
}
]
Query (the three clauses match the year, the month, and the day, in that order; reorder them for a different date format. The last regex matches days 15 through 20):
GET testindex/_search
{
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_multi": {
            "match": {
              "regexp": {
                "content": "(19|20)[0-9]{2}"
              }
            }
          }
        },
        {
          "span_multi": {
            "match": {
              "regexp": {
                "content": "0[1-9]|1[012]"
              }
            }
          }
        },
        {
          "span_multi": {
            "match": {
              "regexp": {
                "content": "1[5-9]|20"
              }
            }
          }
        }
      ],
      "slop": 0,
      "in_order": true
    }
  }
}
Result:
[
{
"_index" : "testindex",
"_type" : "_doc",
"_id" : "bXPLFW0BY3127H1HaBwp",
"_score" : 3.2095504,
"_source" : {
"content" : "I was born on 2019/09/15"
}
},
{
"_index" : "testindex",
"_type" : "_doc",
"_id" : "w3PLFW0BY3127H1HeBzg",
"_score" : 3.2095504,
"_source" : {
"content" : "I was born on 2019/09/20"
}
}
]
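To see why only the documents with day 15 and day 20 come back, here is a small Python approximation of what the span_near query does. It is only a sketch: the `tokenize` helper is a simplified stand-in for the analyzer and assumes the text is split on non-alphanumeric characters, which holds for these sample strings. Each regex has to match a whole token, and the three tokens have to be adjacent and in order (slop 0, in_order true).

```python
import re

DOCS = [
    "I was born on 2019/09/01",
    "I was born on 2019/09/15",
    "I was born on 2019/09/20",
]

# Same patterns as the three span_multi clauses: year, month, day 15-20.
CLAUSES = [r"(19|20)[0-9]{2}", r"0[1-9]|1[012]", r"1[5-9]|20"]


def tokenize(text: str) -> list:
    """Simplified analyzer stand-in (assumption): lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches(text: str) -> bool:
    """True if three adjacent tokens fully match the clauses in order."""
    tokens = tokenize(text)
    width = len(CLAUSES)
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        if all(re.fullmatch(p, t) for p, t in zip(CLAUSES, window)):
            return True
    return False


for doc in DOCS:
    print(doc, "->", matches(doc))
```

The first document is rejected because its day token "01" does not match the last pattern. Note that this approach only covers ranges that can be written as a regex over a single token, so a range spanning several months or years quickly becomes awkward compared with a real date field.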