How to query a nested structure in Elasticsearch

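Below are two mock records from my Elasticsearch index; the real index holds millions of records. I'm trying to query ES for all records whose "tags" field is non-null and non-empty. If a record has no tags (like the second record below), I don't want ES to return it.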

If "books" were not nested, Googling suggests the following query would work:

curl -XGET 'host:port/book_indx/book/_search?' -d '{
    "query" : {"filtered" : {"filter" : {"exists" :{"field" : "_source"}}}}
}'

But I haven't found a way to query a nested structure. I tried the following, without success:

{"query" : {"filtered" : {"filter" : {"exists" :{"field" : "_source.tags"}}}}}

{"query" : {"filtered" : {"filter" : {"exists" :{"field" : "_source":{"tags"}}}}}}

Any suggestions are much appreciated! Thanks in advance.

{
"_shards": {
    "failed": 0,
    "successful": 12,
    "total": 12
},
"hits": {
    "hits": [
        {
            "_id": "book1",
            "_index": "book",
            "_source": {
                "book_name": "How to Get Organized",
                "publication_date": "2014-02-24T16:50:39+0000",
                "tags": [
                    {
                        "category": "self help",
                        "topics": [
                            {
                                "name": "time management",
                                "page": 6198
                            },
                            {
                                "name": "calendar",
                                "page": 10
                            }
                        ],
                        "id": "WEONWOIR234LI"
                    }
                ],
                "last_updated": "2015-11-11T16:28:32.308+0000"
            },
            "_type": "book"
        },
        {
            "_id": "book2",
            "_index": "book",
            "_source": {
                "book_name": "How to Cook",
                "publication_date": "2014-02-24T16:50:39+0000",
                "tags": [],
                "last_updated": "2015-11-11T16:28:32.308+0000"
            },
            "_type": "book"
        }
    ],
    "total": 1
},
"timed_out": false,
"took": 80

}
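To pin down exactly which records should come back, here is a minimal client-side sketch in Python (not an Elasticsearch query) that applies the rule from the question to the two mock hits: a record qualifies only when "tags" is present and is a non-empty array. The helper names has_tags and wanted_ids are hypothetical, and the embedded JSON is a trimmed copy of the response above that keeps only the fields the rule needs.

```python
import json

# Trimmed copy of the two mock hits from the response above
# (only the fields needed to decide whether a record qualifies).
RESPONSE = json.loads("""
{
  "hits": {
    "hits": [
      {"_id": "book1",
       "_source": {"book_name": "How to Get Organized",
                   "tags": [{"category": "self help", "id": "WEONWOIR234LI"}]}},
      {"_id": "book2",
       "_source": {"book_name": "How to Cook", "tags": []}}
    ]
  }
}
""")


def has_tags(source):
    """Hypothetical helper: True if "tags" exists and is a non-empty array."""
    tags = source.get("tags")
    # None (missing/null) and [] (empty array) are both falsy.
    return bool(tags)


def wanted_ids(response):
    """Hypothetical helper: IDs of the hits the question wants ES to return."""
    return [hit["_id"] for hit in response["hits"]["hits"]
            if has_tags(hit["_source"])]


print(wanted_ids(RESPONSE))  # ['book1']
```

A missing field, an explicit null, and an empty array are all treated as "no tags", which is the behaviour the server-side query needs to reproduce.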

Mapping:

        "book": {
            "_id": {
                "path": "message_id"
            },
            "properties": {
                "book_name": {
                    "index": "not_analyzed",
                    "type": "string"
                },
                "publication_date": {
                    "format": "date_time||date_time_no_millis",
                    "type": "date"
                },
                "tags": {
                    "properties": {
                        "category": {
                            "index": "not_analyzed",
                            "type": "string"
                        },
                        "topic": {
                            "properties": {
                                "name": {
                                    "index": "not_analyzed",
                                    "type": "string"
                                },
                                "page": {
                                    "index": "no",
                                    "type": "integer"
                                }                     
                            }
                        },
                        "id": {
                            "index": "not_analyzed",
                            "type": "string"
                        }
                    },
                    "type": "nested"
                },
                "last_updated": {
                    "format": "date_time||date_time_no_millis",
                    "type": "date"
                }
            }
        }   

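Since your tags field is of type nested, you need to use a nested filter to query it: nested objects are indexed as separate hidden documents, so a plain exists filter on the top-level document cannot see their fields.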
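As a rough mental model (a simplification for illustration, not Elasticsearch's actual implementation), the Python sketch below treats each element of the tags array as its own hidden document and lets a nested filter match the parent when at least one hidden document matches the inner filter. The functions nested_docs, nested_filter and exists are hypothetical stand-ins; the point is that a book with an empty tags array has zero nested documents, so it can never match, whatever the inner filter is.

```python
# Simplified model (assumption for illustration, not Elasticsearch code):
# with "type": "nested", every element of the tags array is stored as its own
# hidden document, and a nested filter matches the parent document when at
# least one of those hidden documents matches the inner filter.


def nested_docs(source, path):
    """Hidden nested documents stored for `path` (one per array element)."""
    value = source.get(path) or []
    return value if isinstance(value, list) else [value]


def nested_filter(source, path, inner):
    """Parent matches if any nested document satisfies `inner`."""
    return any(inner(doc) for doc in nested_docs(source, path))


def exists(field):
    """Inner filter: the nested document has a non-null value for `field`."""
    return lambda doc: doc.get(field) is not None


book1 = {"tags": [{"category": "self help", "id": "WEONWOIR234LI"}]}
book2 = {"tags": []}

print(nested_filter(book1, "tags", exists("category")))  # True
print(nested_filter(book2, "tags", exists("category")))  # False
print(nested_filter(book2, "tags", lambda doc: True))    # False: no nested docs
```

The last line shows why an empty tags array is excluded even by a filter that accepts every nested document: any() over zero documents is False.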

The following filtered query correctly returns only the first document above (i.e. ID book1):

{
  "query": {
    "filtered": {
      "filter": {
        "nested": {
          "path": "tags",
          "filter": {
            "exists": {
              "field": "tags"
            }
          }
        }
      }
    }
  }
}
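If you are sending this from a script rather than curl, a minimal Python sketch could build the same body as a dict and POST it with the standard library. The function names tags_exist_query and search are hypothetical, base_url stands in for your host:port, and nothing is sent over the network unless search() is actually called. The body mirrors the ES 1.x syntax used in the answer.

```python
import json
import urllib.request


def tags_exist_query(path="tags"):
    """Body of the filtered/nested query from the answer above (ES 1.x syntax)."""
    return {
        "query": {
            "filtered": {
                "filter": {
                    "nested": {
                        "path": path,
                        # Parent matches if any nested doc under `path` has it.
                        "filter": {"exists": {"field": path}},
                    }
                }
            }
        }
    }


def search(base_url, index, body):
    """POST `body` to <base_url>/<index>/_search and return the parsed JSON.

    base_url is a placeholder such as "http://host:port"; this function is
    only a sketch and is not called below.
    """
    req = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Print the request body so it can be compared with the JSON above.
    print(json.dumps(tags_exist_query(), indent=2))
```

With the index from the question, the call would look like search("http://host:port", "book_indx/book", tags_exist_query()). Note that the filtered query was removed in Elasticsearch 5.0, so on newer clusters the body has to be rewritten (typically a bool query whose filter clause holds a nested query that takes "query" instead of "filter").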