How to create a subset of documents and execute a query against the subset in Elasticsearch?

Question

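We have an API that receives query parameters from the client and builds an Elasticsearch query. However, depending on the type of user (financial adviser, investor, etc.), we have to apply additional conditions to restrict the search. Unfortunately, we cannot change the structure of the index (i.e. add extra columns), because the index is not managed by us, and our API knows nothing about the index apart from the configurable column names.

Here is an example. A search request arrives based on 'investorDateOfBirth' and 'financialAdviserId'. Because the search comes from an adviser, we programmatically add this condition: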

financialAdviserId must be '123' (the id of the current user)

So the final query becomes:

{
  "bool" : {
    "must" : [
      {
        "term" : {
          "financialAdviserId" : {
            "value" : "123",
            "boost" : 1.0
          }
        }
      }
    ],
    "should" : [
      {
        "term" : {
          "investorDateOfBirth" : {
            "value" : "1987-11-12",
            "boost" : 1.0
          }
        }
      },
      {
        "term" : {
          "financialAdviserId" : {
            "value" : "123",
            "boost" : 1.0
          }
        }
      }
    ],
    "disable_coord" : false,
    "adjust_pure_negative" : true,
    "boost" : 1.0
  }
}

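As you can see, there are two 'financialAdviserId' clauses: one built programmatically from the request's query parameters, and the other (in 'must') added based on the current user. But as you know, this returns the documents with the specified investorDateOfBirth as well as all the other documents whose adviser id is 123 (including those with a different DOB).

Suppose there are 3 records in the index: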

| investorDateOfBirth | financialAdviserId | investorId |
| "1987-11-12"        | 123                | 111        |
| "1900-11-12"        | 123                | 222        |
| "1900-11-12"        | 123                | 333        |

For the query above, the result is all 3 rows, which is not what we want. The query below, however, returns only the first row, which is what we expect:

{
  "bool" : {
    "must" : [
      {
        "term" : {
          "financialAdviserId" : {
            "value" : "123",
            "boost" : 1.0
          }
        }
      }
    ],
    "should" : [
      {
        "term" : {
          "investorDateOfBirth" : {
            "value" : "1987-11-12",
            "boost" : 1.0
          }
        }
      }
    ],
    "disable_coord" : false,
    "adjust_pure_negative" : true,
    "boost" : 1.0
  }
}

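How can we solve this? How can we change the first query so that it gives the same result as the second one (returning only the first row)?

Just so you know, we cannot make financialAdviserId non-searchable, because other entities can be searched by these columns. Is there a way to create a subset (in our case, the documents whose financialAdviserId is 123) and then execute the client's requested query against that subset?

We are using Elasticsearch v5.5.3 with Java 8.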

Answer 1

You are almost there. To get the expected behavior, you can nest one bool inside another:

{
  "bool": {
    "must": [
      {
        "term": {
          "financialAdviserId": {
            "value": "123"
          }
        }
      },
      {
        "bool": {
          "should": [
            {
              "term": {
                "investorDateOfBirth": {
                  "value": "1987-11-12"
                }
              }
            },
            {
              "term": {
                "financialAdviserId": {
                  "value": "123"
                }
              }
            }
          ]
        }
      }
    ]
  }
}

(I removed boost and other details to make the idea clearer.)
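Since the API assembles the query programmatically, the nesting can be done in one place. Below is a minimal sketch in Python (the question uses Java; Python dictionaries are used here only because they mirror the JSON body directly). The helper name `restrict_to_subset` is hypothetical and is not part of any Elasticsearch client.

```python
import json


def restrict_to_subset(restriction, user_query):
    """Combine two query-DSL fragments so that a document must satisfy both.

    Hypothetical helper: both arguments are plain dicts in query-DSL shape.
    """
    return {"bool": {"must": [restriction, user_query]}}


# Restriction derived from the current user (adviser id taken from the question).
adviser_only = {"term": {"financialAdviserId": {"value": "123"}}}

# Query built from the client's request parameters, kept in its own bool.
user_query = {
    "bool": {
        "should": [
            {"term": {"investorDateOfBirth": {"value": "1987-11-12"}}},
            {"term": {"financialAdviserId": {"value": "123"}}},
        ]
    }
}

nested = restrict_to_subset(adviser_only, user_query)
print(json.dumps(nested, indent=2))
```

The point of the design is that the client's clauses never share a bool with the restriction, so the restriction cannot change how the client's should clauses are interpreted.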

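Why the first query in the question does not work

Now let me explain why the initial query does not work.

You used must and should in the same instance of the bool query. The documented behavior in this case is as follows: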

should

If the bool query is in a query context and has a must or filter clause then a document will match the bool query even if none of the should queries match.

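(This is also why the suggestion from Federico to use filter does not solve the problem.)

So in effect the query you applied has the following logical meaning:

    query_restricting_set_of_docs AND (user_query OR True)

whereas you are looking for this: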

    query_restricting_set_of_docs AND user_query

In your case user_query looks like this:

    query_restricting_set_of_docs OR some_other_query

Which brings us to the final expression:

    query_restricting_set_of_docs AND (
        query_restricting_set_of_docs OR some_other_query
    )

Translated into an ES bool query, it looks like this:

{
  "bool": {
    "must": [
      {
        ...query_restricting_set_of_docs
      },
      {
        "bool": {
          "should": [
            {
              ...query_restricting_set_of_docs
            },
            {
              ...some_other_query
            }
          ]
        }
      }
    ]
  }
}
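To make the difference between the flat and the nested form concrete, here is a toy evaluator in Python. It is only a sketch under stated assumptions: it models `term` plus the `must`, `filter` and `should` clauses of `bool` in query context, using the rule quoted above (should clauses are optional when a must or filter clause is present, otherwise at least one has to match). It ignores scoring, `must_not` and `minimum_should_match`, and it is not how Elasticsearch is implemented. The function name `matches` is hypothetical; the records are the three from the question.

```python
def matches(query, doc):
    """Toy check of a tiny subset of the query DSL against one document (a dict)."""
    (kind, body), = query.items()
    if kind == "term":
        (field, spec), = body.items()
        value = spec["value"] if isinstance(spec, dict) else spec
        return str(doc.get(field)) == str(value)
    if kind == "bool":
        required = body.get("must", []) + body.get("filter", [])
        optional = body.get("should", [])
        if not all(matches(q, doc) for q in required):
            return False
        if not optional:
            return True
        # Query-context rule: should is optional next to must/filter,
        # otherwise at least one should clause has to match.
        minimum = 0 if required else 1
        return sum(matches(q, doc) for q in optional) >= minimum
    raise ValueError("unsupported query type: " + kind)


DOCS = [
    {"investorDateOfBirth": "1987-11-12", "financialAdviserId": "123", "investorId": "111"},
    {"investorDateOfBirth": "1900-11-12", "financialAdviserId": "123", "investorId": "222"},
    {"investorDateOfBirth": "1900-11-12", "financialAdviserId": "123", "investorId": "333"},
]

restriction = {"term": {"financialAdviserId": {"value": "123"}}}
dob = {"term": {"investorDateOfBirth": {"value": "1987-11-12"}}}

# must and should side by side: the should clause cannot exclude anything.
flat = {"bool": {"must": [restriction], "should": [dob]}}
# should wrapped in its own bool: now at least one should clause has to match.
nested = {"bool": {"must": [restriction, {"bool": {"should": [dob]}}]}}

flat_ids = [d["investorId"] for d in DOCS if matches(flat, d)]
nested_ids = [d["investorId"] for d in DOCS if matches(nested, d)]
print(flat_ids)    # ['111', '222', '333']
print(nested_ids)  # ['111']
```

Note that in this model, adding the financialAdviserId clause to the inner should list makes the nested query match all three records again, because A AND (A OR B) reduces to A. The nesting guarantees the restriction is applied, but which documents remain still depends on how the client's own clauses are combined.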

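A note on query and filter context

The main differences between filter and query context are:

Query context computes a relevance score, and the results are not cached
Filter context does not compute a score, but the results can be cached

The caching makes searches faster, but without a relevance score you cannot show the more relevant documents first. In your case, you probably want to put query_restricting_set_of_docs into filter context.

To do that, you can use the following query: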

{
  "bool": {
    "must": [
      {
        "bool": {
          "filter": [
            {
              "term": {
                "financialAdviserId": {
                  "value": "123"
                }
              }
            }
          ]
        }
      },
      {
        "bool": {
          "should": [
            {
              "term": {
                "investorDateOfBirth": {
                  "value": "1987-11-12"
                }
              }
            },
            {
              "term": {
                "financialAdviserId": {
                  "value": "123"
                }
              }
            }
          ]
        }
      }
    ]
  }
}

Here we wrap query_restricting_set_of_docs in another bool with filter, which puts the restricting part into filter context.
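The same wrapping can be produced programmatically. A minimal Python sketch follows; the helper name `restrict_with_filter` is hypothetical, and the output simply reproduces the shape of the query above for any restriction and user query.

```python
def restrict_with_filter(restriction, user_query):
    """Put the restriction into filter context and keep the user query scored.

    Hypothetical helper: both arguments are plain dicts in query-DSL shape.
    """
    return {
        "bool": {
            "must": [
                # Restriction wrapped in its own bool so it runs in filter context.
                {"bool": {"filter": [restriction]}},
                # Client query stays in query context and still contributes a score.
                user_query,
            ]
        }
    }


filtered = restrict_with_filter(
    {"term": {"financialAdviserId": {"value": "123"}}},
    {"bool": {"should": [{"term": {"investorDateOfBirth": {"value": "1987-11-12"}}}]}},
)
```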

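If you control your index and there are several different subsets of it that you want to restrict searches to, you can use Filtered Aliases, which essentially add the specified filter to every query executed against that alias.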
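As a rough illustration, a filtered alias is created by sending an `add` action with a `filter` to the `_aliases` endpoint. The Python sketch below only builds that request body; the index name `investors` and the alias name `investors_adviser_123` are made up for the example, and the helper `filtered_alias_body` is hypothetical.

```python
def filtered_alias_body(index, alias, restriction):
    """Build the request body for POST /_aliases that adds a filtered alias."""
    return {
        "actions": [
            {"add": {"index": index, "alias": alias, "filter": restriction}}
        ]
    }


body = filtered_alias_body(
    "investors",              # assumed index name
    "investors_adviser_123",  # assumed alias name
    {"term": {"financialAdviserId": "123"}},
)
```

Searches sent to the alias instead of the index would then have the restriction applied without the API adding it to each query. This only helps if whoever manages the index is willing to create such aliases.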

Hope that helps!
