Elasticsearch - query dates without a specified timezone

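I have an index with the following mapping, which uses the standard date format. In the second record below, the time given is actually a local time, but ES treats it as UTC.

Although ES internally converts every parsed datetime to UTC, it obviously has to store the original string as well.

My question is whether (and how) I can query for all records whose scheduleDT value did not explicitly specify a time zone.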

{
   "curator_v3": {
      "mappings": {
         "published": {
            "analyzer": "classic",
            "numeric_detection": true,
            "properties": {
               "Id": {
                  "type": "string",
                  "index": "not_analyzed",
                  "include_in_all": false
               },
               "createDT": {
                  "type": "date",
                  "format": "dateOptionalTime",
                  "include_in_all": false
               },
               "scheduleDT": {
                  "type": "date",
                  "format": "dateOptionalTime",
                  "include_in_all": false
               },
               "title": {
                  "type": "string",
                  "fields": {
                     "english": {
                        "type": "string",
                        "analyzer": "english"
                     },
                     "raw": {
                        "type": "string",
                        "index": "not_analyzed"
                     },
                     "shingle": {
                        "type": "string",
                        "analyzer": "shingle"
                     },
                     "spanish": {
                        "type": "string",
                        "analyzer": "spanish"
                     }
                  },
                  "include_in_all": false
               }
            }
         }
      }
   }
}

We use .NET as the client for Elasticsearch, and we have not been consistent about specifying the time zone for the scheduleDT field.

{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 12,
      "successful": 12,
      "failed": 0
   },
   "hits": {
      "total": 32,
      "max_score": null,
      "hits": [
         {
            "_index": "curator_v3",
            "_type": "published",
            "_id": "29651227",
            "_score": null,
            "fields": {
               "Id": [
                  "29651227"
               ],
               "scheduleDT": [
                  "2015-11-21T22:17:51.0946798-06:00"
               ],
               "title": [
                  "97 Year-Old Woman Cries Tears Of Joy After Finally Getting Her High School Diploma"
               ],
               "createDT": [
                  "2015-11-21T22:13:32.3597142-06:00"
               ]
            },
            "sort": [
               1448165871094
            ]
         },
         {
            "_index": "curator_v3",
            "_type": "published",
            "_id": "210466413",
            "_score": null,
            "fields": {
               "Id": [
                  "210466413"
               ],
               "scheduleDT": [
                  "2015-11-22T12:00:00"
               ],
               "title": [
                  "6 KC treats to bring to Thanksgiving"
               ],
               "createDT": [
                  "2015-11-20T15:08:25.4282-06:00"
               ]
            },
            "sort": [
               1448193600000
            ]
         }
      ]
   },
   "aggregations": {
      "ScheduleDT": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 27,
         "buckets": [
            {
               "key": 1448165871094,
               "key_as_string": "2015-11-22T04:17:51.094Z",
               "doc_count": 1
            },
            {
               "key": 1448193600000,
               "key_as_string": "2015-11-22T12:00:00.000Z",
               "doc_count": 4
            }
         ]
      }
   }
}
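To see how the zone-less value was interpreted, the sort values in the response above can be decoded as epoch milliseconds. This is only a minimal Python sketch; the helper name epoch_millis_to_utc is made up for illustration and is not part of any Elasticsearch client.

```python
from datetime import datetime, timezone


def epoch_millis_to_utc(millis):
    """Render epoch milliseconds as an ISO-8601 string in UTC."""
    # Split with integer arithmetic so the millisecond part is not
    # disturbed by floating-point rounding.
    seconds, ms = divmod(millis, 1000)
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    dt = dt.replace(microsecond=ms * 1000)
    return dt.isoformat(timespec="milliseconds")


# First hit: the source string carried an explicit -06:00 offset.
print(epoch_millis_to_utc(1448165871094))  # 2015-11-22T04:17:51.094+00:00
# Second hit: the source string had no offset and was read as UTC.
print(epoch_millis_to_utc(1448193600000))  # 2015-11-22T12:00:00.000+00:00
```

The second value comes back as 12:00 UTC, unchanged from the indexed string, which matches the behaviour described in the question.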

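You can do this by querying for documents whose scheduleDT field is shorter than 20 characters (e.g. 2015-11-22T12:00:00). Any date value that specifies a time zone will be longer.

This should do it: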

{
  "query": {
    "filtered": {
      "filter": {
        "script": {
          "script": "doc.scheduleDT.value.size() < 20"
        }
      }
    }
  }
}
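The length check can be tried outside Elasticsearch on the sample values from the question. The following is a rough Python sketch, not the script that ES runs; both function names are hypothetical. It also compares the length heuristic with an explicit check for a trailing zone designator, because the two can disagree when a zone-less value carries fractional seconds.

```python
import re

# A trailing "Z" or a numeric offset such as -06:00 or +0530.
_ZONE_RE = re.compile(r"(Z|[+-]\d{2}:?\d{2})$")


def shorter_than_20(value):
    """The heuristic from the answer: zone-less values are under 20 chars."""
    return len(value) < 20


def has_explicit_timezone(value):
    """Check the time part for a zone designator instead of using length."""
    # Only inspect the text after "T", so the hyphens in the date part
    # cannot be mistaken for the sign of an offset.
    _, sep, time_part = value.partition("T")
    return bool(sep) and _ZONE_RE.search(time_part) is not None


samples = [
    "2015-11-21T22:17:51.0946798-06:00",  # explicit offset
    "2015-11-22T12:00:00",                # no zone
    "2015-11-22T12:00:00.123",            # no zone, but 23 characters long
]
for s in samples:
    print(s, shorter_than_20(s), has_explicit_timezone(s))
```

For the two values shown in the question both checks agree. The third sample is invented to show the limit of the length test: it has no zone but is not shorter than 20 characters, so the length-based query would miss it.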

Note, however, that to make your queries easier to write, you should always try to convert all timestamps to UTC before indexing your documents.
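One way to normalize timestamps before indexing is sketched below in Python (the question uses a .NET client, so treat this as an illustration of the idea only). The function name to_utc and its assumed_offset parameter are hypothetical, and the default of -06:00 is an assumption taken from the offsets that appear in the sample documents; a fixed offset also ignores daylight saving time.

```python
import re
from datetime import datetime, timedelta, timezone

# Date-time to the second, optional fraction, optional zone designator.
_ISO_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})"
    r"(?:\.(\d+))?"
    r"(Z|[+-]\d{2}:?\d{2})?$"
)


def to_utc(value, assumed_offset=timedelta(hours=-6)):
    """Return value as a UTC string; zone-less input gets assumed_offset."""
    m = _ISO_RE.match(value)
    if m is None:
        raise ValueError("unsupported timestamp: %r" % value)
    base, frac, zone = m.groups()
    dt = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    if frac:
        # The samples carry 7 fractional digits; Python keeps at most 6.
        dt = dt.replace(microsecond=int(frac[:6].ljust(6, "0")))
    if zone is None:
        offset = assumed_offset  # assumption: value was local time
    elif zone == "Z":
        offset = timedelta(0)
    else:
        sign = 1 if zone[0] == "+" else -1
        digits = zone[1:].replace(":", "")
        offset = sign * timedelta(hours=int(digits[:2]), minutes=int(digits[2:]))
    # Local time minus its offset gives UTC.
    utc = (dt - offset).replace(tzinfo=timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


print(to_utc("2015-11-21T22:17:51.0946798-06:00"))  # 2015-11-22T04:17:51.094Z
print(to_utc("2015-11-22T12:00:00"))                # 2015-11-22T18:00:00.000Z
```

With every value stored in this form, the original strings all end in Z, and there is no need to distinguish zoned from zone-less values at query time.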

Finally, make sure that you have dynamic scripting enabled in order to run the script query above.

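UPDATE

Actually, it works if you use _source directly in the script, because that returns the real value from the source, exactly as it was when the document was indexed (doc.scheduleDT.value gives the parsed date rather than the original string):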

{
  "query": {
    "filtered": {
      "filter": {
        "script": {
          "script": "_source.scheduleDT.size() < 20"
        }
      }
    }
  }
}