非索引属性 FILTER 的查询规则

Query Rule for non-indexed attribute FILTER

我观察到这两个 AQL 语句和一个包含大约 20 条 Mio 记录的数据库之间存在巨大的运行时差异:

FOR e IN EAll
    FILTER e.lastname == "Kmp"  // <-- skip-index
    FILTER e.lastpaff != ""     // <-- no index
RETURN e
// runs in less than a second

FOR e IN EAll
    FILTER e.lastpaff != ""     // <-- no index
    FILTER e.lastname == "Kmp"  // <-- skip-index
RETURN e
// needs about a minute to execute.

除了被(或不被)索引外,这些语句的选择性也大不相同:indexedAttribute 具有高度选择性,而 nonIndexedAttribute 仅过滤 50%。

难不成还没有优化规则?我目前正在使用 ArangoDB 2.4.0.

详情:

索引属性上有一个SKIP-Index(执行计划1中似乎使用了它)。 以下是执行计划,其中仅更改了过滤器的顺序:

    FAST QUERY:

    arangosh [Uni]> stmt.explain()
    {
      "plan" : {
        "nodes" : [
          {
            "type" : "SingletonNode",
            "dependencies" : [ ],
            "id" : 1,
            "estimatedCost" : 1,
            "estimatedNrItems" : 1
          },
          {
            "type" : "IndexRangeNode",
            "dependencies" : [
              1
            ],
            "id" : 8,
            "estimatedCost" : 170463.32,
            "estimatedNrItems" : 170462,
            "database" : "Uni",
            "collection" : "EAll",
            "outVariable" : {
              "id" : 0,
              "name" : "i"
            },
            "ranges" : [
              [
                {
                  "variable" : "i",
                  "attr" : "lastname",
                  "lowConst" : {
                    "bound" : "Kmp",
                    "include" : true,
                    "isConstant" : true
                  },
                  "highConst" : {
                    "bound" : "Kmp",
                    "include" : true,
                    "isConstant" : true
                  },
                  "lows" : [ ],
                  "highs" : [ ],
                  "valid" : true,
                  "equality" : true
                }
              ]
            ],
            "index" : {
              "type" : "skiplist",
              "id" : "13295598550318",
              "unique" : false,
              "fields" : [
                "lastname"
              ]
            },
            "reverse" : false
          },
          {
            "type" : "CalculationNode",
            "dependencies" : [
              8
            ],
            "id" : 5,
            "estimatedCost" : 340925.32,
            "estimatedNrItems" : 170462,
            "expression" : {
              "type" : "compare !=",
              "subNodes" : [
                {
                  "type" : "attribute access",
                  "name" : "lastpaff",
                  "subNodes" : [
                    {
                      "type" : "reference",
                      "name" : "i",
                      "id" : 0
                    }
                  ]
                },
                {
                  "type" : "value",
                  "value" : ""
                }
              ]
            },
            "outVariable" : {
              "id" : 2,
              "name" : "2"
            },
            "canThrow" : false
          },
          {
            "type" : "FilterNode",
            "dependencies" : [
              5
            ],
            "id" : 6,
            "estimatedCost" : 511387.32,
            "estimatedNrItems" : 170462,
            "inVariable" : {
              "id" : 2,
              "name" : "2"
            }
          },
          {
            "type" : "ReturnNode",
            "dependencies" : [
              6
            ],
            "id" : 7,
            "estimatedCost" : 681849.3200000001,
            "estimatedNrItems" : 170462,
            "inVariable" : {
              "id" : 0,
              "name" : "i"
            }
          }
        ],
        "rules" : [
          "move-calculations-up",
          "move-filters-up",
          "move-calculations-up-2",
          "move-filters-up-2",
          "use-index-range",
          "remove-filter-covered-by-index"
        ],
        "collections" : [
          {
            "name" : "EAll",
            "type" : "read"
          }
        ],
        "variables" : [
          {
            "id" : 0,
            "name" : "i"
          },
          {
            "id" : 1,
            "name" : "1"
          },
          {
            "id" : 2,
            "name" : "2"
          }
        ],
        "estimatedCost" : 681849.3200000001,
        "estimatedNrItems" : 170462
      },
      "warnings" : [ ],
      "stats" : {
        "rulesExecuted" : 19,
        "rulesSkipped" : 0,
        "plansCreated" : 1
      }
    }



    SLOW Query:

    arangosh [Uni]> stmt.explain()
    {
      "plan" : {
        "nodes" : [
          {
            "type" : "SingletonNode",
            "dependencies" : [ ],
            "id" : 1,
            "estimatedCost" : 1,
            "estimatedNrItems" : 1
          },
          {
            "type" : "EnumerateCollectionNode",
            "dependencies" : [
              1
            ],
            "id" : 2,
            "estimatedCost" : 17046233,
            "estimatedNrItems" : 17046232,
            "database" : "Uni",
            "collection" : "EAll",
            "outVariable" : {
              "id" : 0,
              "name" : "i"
            },
            "random" : false
          },
          {
            "type" : "CalculationNode",
            "dependencies" : [
              2
            ],
            "id" : 3,
            "estimatedCost" : 34092465,
            "estimatedNrItems" : 17046232,
            "expression" : {
              "type" : "compare !=",
              "subNodes" : [
                {
                  "type" : "attribute access",
                  "name" : "lastpaff",
                  "subNodes" : [
                    {
                      "type" : "reference",
                      "name" : "i",
                      "id" : 0
                    }
                  ]
                },
                {
                  "type" : "value",
                  "value" : ""
                }
              ]
            },
            "outVariable" : {
              "id" : 1,
              "name" : "1"
            },
            "canThrow" : false
          },
          {
            "type" : "FilterNode",
            "dependencies" : [
              3
            ],
            "id" : 4,
            "estimatedCost" : 51138697,
            "estimatedNrItems" : 17046232,
            "inVariable" : {
              "id" : 1,
              "name" : "1"
            }
          },
          {
            "type" : "CalculationNode",
            "dependencies" : [
              4
            ],
            "id" : 5,
            "estimatedCost" : 68184929,
            "estimatedNrItems" : 17046232,
            "expression" : {
              "type" : "compare ==",
              "subNodes" : [
                {
                  "type" : "attribute access",
                  "name" : "lastname",
                  "subNodes" : [
                    {
                      "type" : "reference",
                      "name" : "i",
                      "id" : 0
                    }
                  ]
                },
                {
                  "type" : "value",
                  "value" : "Kmp"
                }
              ]
            },
            "outVariable" : {
              "id" : 2,
              "name" : "2"
            },
            "canThrow" : false
          },
          {
            "type" : "FilterNode",
            "dependencies" : [
              5
            ],
            "id" : 6,
            "estimatedCost" : 85231161,
            "estimatedNrItems" : 17046232,
            "inVariable" : {
              "id" : 2,
              "name" : "2"
            }
          },
          {
            "type" : "ReturnNode",
            "dependencies" : [
              6
            ],
            "id" : 7,
            "estimatedCost" : 102277393,
            "estimatedNrItems" : 17046232,
            "inVariable" : {
              "id" : 0,
              "name" : "i"
            }
          }
        ],
        "rules" : [
          "move-calculations-up",
          "move-filters-up",
          "move-calculations-up-2",
          "move-filters-up-2"
        ],
        "collections" : [
          {
            "name" : "EAll",
            "type" : "read"
          }
        ],
        "variables" : [
          {
            "id" : 0,
            "name" : "i"
          },
          {
            "id" : 1,
            "name" : "1"
          },
          {
            "id" : 2,
            "name" : "2"
          }
        ],
        "estimatedCost" : 102277393,
        "estimatedNrItems" : 17046232
      },
      "warnings" : [ ],
      "stats" : {
        "rulesExecuted" : 19,
        "rulesSkipped" : 0,
        "plansCreated" : 1
      }
    }

事实上,即使可以使用索引,如下条件也禁用了索引的使用:

FILTER doc.indexedAttribute != ... FILTER doc.indexedAttribute == ...

有趣的是,当两个条件被放入相同的FILTER条件并与&&组合时,使用了一个索引:

FILTER doc.indexedAttribute != ... && doc.indexedAttribute == ...

虽然这两个语句是等效的,但它们触发的代码路径略有不同。前者将对两个现有的 FILTER 范围进行 AND 组合,后者将从单个 FILTER 生成一个范围。 FILTER 范围的 AND 组合的情况过于防御并拒绝双方,即使只有一侧(在本例中为具有非相等运算符的一侧)不能用于索引扫描。

这已在 2.4 中修复,修复将包含在 2.4.2 中。目前的解决方法是将两个 FILTER 语句合并为一个语句。