Multiple word matches (full text) in single or multiple documents in Elasticsearch

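My requirement is this:

If I search for multiple words given as a list, ES should return the documents that match a subset of those words, together with the matched words, so that I can tell which document matched which subset.

Suppose I need to search for words such as Football, Cricket, Tennis, and Golf in three documents.

I am going to store each file in its own document. The mapping for the "mydocuments" index looks like this: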

{
  "mydocuments" : {
    "mappings" : {
      "docs" : {
        "properties" : {
          "file_content" : {
            "type" : "string"
          }
        }
      }
    }
  }
}

First document:

{ _id: 1, file_content: "I love tennis and cricket"}

Second document:

{ _id: 2, file_content: "tennis and football are very popular"}

Third document:

{ _id: 3, file_content: "football and cricket are originated in england"}

I should be able to search a single file or multiple files for Football, Tennis, Cricket, and Golf, and it should return something like this:

    "hits": {
        "total" : 3,
        "hits" : [
            {
                "_index" : "twitter",
                "_type" : "tweet",
                "_id" : "1",
                "_source" : {
                    "file_content" : ["football","cricket"],
                    "postDate" : "2009-11-15T14:12:12"
                }
            },
            {
                "_index" : "twitter",
                "_type" : "tweet",
                "_id" : "2",
                "_source" : {
                    "file_content" : ["football","tennis"],
                    "postDate" : "2009-11-15T14:12:12"
                }
            }
        ]
    }

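Or, for a search across multiple files, an array of search results like the one above.

Any idea how we can do this with Elasticsearch?

If this really cannot be done with Elasticsearch, I am ready to evaluate other options (native Lucene, Solr).

Edit

My bad, I probably did not provide enough detail. @Andrew By "file" I mean the text content of a file, stored as a string field (full text) in an ES document. Assume one file corresponds to one document, with the text content held as a string in a field named "file_content".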

The closest thing to what you want is highlighting, which marks the search terms inside each matching document.

Example query:

{
  "query": {
    "match": {
      "file_content": "football tennis cricket golf"
    }
  },
  "highlight": {
    "fields": {"file_content":{}}
  }
}
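Since the question starts from a list of words, the request body above could be assembled from that list. The following is only a minimal sketch: the helper name `build_highlight_query` is hypothetical, and it only builds the JSON body (it does not send anything to Elasticsearch).

```python
import json


def build_highlight_query(words, field="file_content"):
    """Build a match query with highlighting for a list of search words.

    `build_highlight_query` is a hypothetical helper, not part of any
    Elasticsearch client library.
    """
    return {
        # A match query on space-separated terms matches documents
        # containing any of the terms (default operator is OR).
        "query": {"match": {field: " ".join(words)}},
        # Ask for highlighted fragments of the same field.
        "highlight": {"fields": {field: {}}},
    }


body = build_highlight_query(["football", "tennis", "cricket", "golf"])
print(json.dumps(body, indent=2))
```

The printed body has the same content as the example query and can be sent to the index's `_search` endpoint with whatever client you use.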

Result:


       "hits": {
          "total": 3,
          "max_score": 0.027847305,
          "hits": [
             {
                "_index": "test_highlight",
                "_type": "docs",
                "_id": "1",
                "_score": 0.027847305,
                "_source": {
                   "file_content": "I love tennis and cricket"
                },
                "highlight": {
                   "file_content": [
                      "I love <em>tennis</em> and <em>cricket</em>"
                   ]
                }
             },
             {
                "_index": "test_highlight",
                "_type": "docs",
                "_id": "2",
                "_score": 0.023869118,
                "_source": {
                   "file_content": "tennis and football are very popular"
                },
                "highlight": {
                   "file_content": [
                      "<em>tennis</em> and <em>football</em> are very popular"
                   ]
                }
             },
             {
                "_index": "test_highlight",
                "_type": "docs",
                "_id": "3",
                "_score": 0.023869118,
                "_source": {
                   "file_content": "football and cricket are originated in england"
                },
                "highlight": {
                   "file_content": [
                      "<em>football</em> and <em>cricket</em> are originated in england"
                   ]
                }
             }
          ]
       }

As you can see, the words that were found are highlighted (wrapped in <em> tags) in the special highlight section of each hit.
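To get from the highlighted fragments to the "which document matched which subset" view asked for in the question, the <em> tags can be parsed on the client side. Below is a rough sketch, assuming the response has already been decoded into a Python dict and the default <em> tags are in use; `matched_terms` and `terms_by_document` are hypothetical helper names, and the sample `response` is trimmed from the result shown above.

```python
import re

# Matches the text wrapped by the default highlighter tags.
EM_TAG = re.compile(r"<em>(.*?)</em>")


def matched_terms(hit, field="file_content"):
    """Return the distinct highlighted terms of one hit, lowercased, in order."""
    fragments = hit.get("highlight", {}).get(field, [])
    terms = []
    for fragment in fragments:
        for term in EM_TAG.findall(fragment):
            term = term.lower()
            if term not in terms:
                terms.append(term)
    return terms


def terms_by_document(response, field="file_content"):
    """Map each document _id to the list of search terms it matched."""
    return {hit["_id"]: matched_terms(hit, field)
            for hit in response["hits"]["hits"]}


# Trimmed copy of the search result shown above.
response = {"hits": {"hits": [
    {"_id": "1", "highlight": {"file_content": [
        "I love <em>tennis</em> and <em>cricket</em>"]}},
    {"_id": "2", "highlight": {"file_content": [
        "<em>tennis</em> and <em>football</em> are very popular"]}},
    {"_id": "3", "highlight": {"file_content": [
        "<em>football</em> and <em>cricket</em> are originated in england"]}},
]}}

print(terms_by_document(response))
# {'1': ['tennis', 'cricket'], '2': ['tennis', 'football'], '3': ['football', 'cricket']}
```

One caveat: for long fields the highlighter may return only a few fragments, so some matched words could be missing from the list. Highlight options such as `number_of_fragments` control this, so check them against your Elasticsearch version if completeness matters.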