AQL 查询真的很慢(~20 秒)
AQL Query Really Slow (~20 seconds)
执行以下查询大约需要 20 秒:
FOR p IN PATHS(locations, connections, "outbound", { maxLength: 1 }) FILTER p.source._key == "26094" RETURN p.vertices[*].name
我相信这是一个简单的查询(并且数据库不是那么大)并且它应该执行得相当快......我一定是做错了什么......这是查询结果:
==> [object ArangoQueryCursor - count: 286, hasMore: false]
locations
(顶点)集合有 23753 个文档,connections
(边)集合有 123414 个文档。
我也尝试过按 _id
过滤,但性能有些相同。
我可以做些什么来获得更好的性能吗?
这是查询的 .explain()
报告:
{
"plan" : {
"nodes" : [
{
"type" : "SingletonNode",
"dependencies" : [ ],
"id" : 1,
"estimatedCost" : 1,
"estimatedNrItems" : 1
},
{
"type" : "CalculationNode",
"dependencies" : [
1
],
"id" : 2,
"estimatedCost" : 2,
"estimatedNrItems" : 1,
"expression" : {
"type" : "function call",
"name" : "PATHS",
"subNodes" : [
{
"type" : "array",
"subNodes" : [
{
"type" : "collection",
"name" : "locations"
},
{
"type" : "collection",
"name" : "connections"
},
{
"type" : "value",
"value" : "outbound"
},
{
"type" : "object",
"subNodes" : [
{
"type" : "object element",
"name" : "maxLength",
"subNodes" : [
{
"type" : "value",
"value" : 1
}
]
}
]
}
]
}
]
},
"outVariable" : {
"id" : 2,
"name" : "2"
},
"canThrow" : true
},
{
"type" : "EnumerateListNode",
"dependencies" : [
2
],
"id" : 3,
"estimatedCost" : 102,
"estimatedNrItems" : 100,
"inVariable" : {
"id" : 2,
"name" : "2"
},
"outVariable" : {
"id" : 0,
"name" : "p"
}
},
{
"type" : "CalculationNode",
"dependencies" : [
3
],
"id" : 4,
"estimatedCost" : 202,
"estimatedNrItems" : 100,
"expression" : {
"type" : "compare ==",
"subNodes" : [
{
"type" : "attribute access",
"name" : "_key",
"subNodes" : [
{
"type" : "attribute access",
"name" : "source",
"subNodes" : [
{
"type" : "reference",
"name" : "p",
"id" : 0
}
]
}
]
},
{
"type" : "value",
"value" : "26094"
}
]
},
"outVariable" : {
"id" : 3,
"name" : "3"
},
"canThrow" : false
},
{
"type" : "FilterNode",
"dependencies" : [
4
],
"id" : 5,
"estimatedCost" : 302,
"estimatedNrItems" : 100,
"inVariable" : {
"id" : 3,
"name" : "3"
}
},
{
"type" : "CalculationNode",
"dependencies" : [
5
],
"id" : 6,
"estimatedCost" : 402,
"estimatedNrItems" : 100,
"expression" : {
"type" : "expand",
"subNodes" : [
{
"type" : "iterator",
"subNodes" : [
{
"type" : "variable",
"name" : "1_",
"id" : 1
},
{
"type" : "attribute access",
"name" : "vertices",
"subNodes" : [
{
"type" : "reference",
"name" : "p",
"id" : 0
}
]
}
]
},
{
"type" : "attribute access",
"name" : "name",
"subNodes" : [
{
"type" : "reference",
"name" : "1_",
"id" : 1
}
]
}
]
},
"outVariable" : {
"id" : 4,
"name" : "4"
},
"canThrow" : false
},
{
"type" : "ReturnNode",
"dependencies" : [
6
],
"id" : 7,
"estimatedCost" : 502,
"estimatedNrItems" : 100,
"inVariable" : {
"id" : 4,
"name" : "4"
}
}
],
"rules" : [
"move-calculations-up",
"move-filters-up",
"move-calculations-up-2",
"move-filters-up-2"
],
"collections" : [
{
"name" : "connections",
"type" : "read"
},
{
"name" : "locations",
"type" : "read"
}
],
"variables" : [
{
"id" : 0,
"name" : "p"
},
{
"id" : 1,
"name" : "1_"
},
{
"id" : 2,
"name" : "2"
},
{
"id" : 3,
"name" : "3"
},
{
"id" : 4,
"name" : "4"
}
],
"estimatedCost" : 502,
"estimatedNrItems" : 100
},
"warnings" : [ ],
"stats" : {
"rulesExecuted" : 21,
"rulesSkipped" : 0,
"plansCreated" : 1
}
}
PATHS()
将构建图形的所有路径,然后使用 _key
属性上的 FILTER
过滤结果 post。在过滤掉所有不匹配项之前,这可能会首先创建一个巨大的结果集(对于所有路径)。
如果只需要在深度 1 上找到相连的顶点,我认为这样做会更有效率:
正在使用 TRAVERSAL
进行查询:
这样效率更高,因为它将构建图中的所有路径,但只构建从指定起始顶点开始的路径:
FOR p IN TRAVERSAL(locations, connections, "1", "outbound", { minDepth: 1, maxDepth: 1, paths: true })
RETURN p.path.vertices[*].name
使用NEIGHBORS
查询直接邻居:
这可能会稍微更有效率,因为它会构建一个较小的中间结果。
此外,它不会 return 起始顶点 (26094),而是直接连接到它的所有顶点:
FOR p IN NEIGHBORS(locations, connections, "26094", "outbound")
RETURN p.vertex.name
直接查询边(不使用图函数)
终于可以直接查询边集合了。
同样,这不会 return 起始顶点 (26094),而是直接连接到它的所有顶点:
FOR edge IN connections
FILTER edge._from == "locations/26094"
FOR vertex IN locations
FILTER vertex._id == edge._to
RETURN vertex.name
执行以下查询大约需要 20 秒:
FOR p IN PATHS(locations, connections, "outbound", { maxLength: 1 }) FILTER p.source._key == "26094" RETURN p.vertices[*].name
我相信这是一个简单的查询(并且数据库不是那么大)并且它应该执行得相当快......我一定是做错了什么......这是查询结果:
==> [object ArangoQueryCursor - count: 286, hasMore: false]
locations
(顶点)集合有 23753 个文档,connections
(边)集合有 123414 个文档。
我也尝试过按 _id
过滤,但性能有些相同。
我可以做些什么来获得更好的性能吗?
这是查询的 .explain()
报告:
{
"plan" : {
"nodes" : [
{
"type" : "SingletonNode",
"dependencies" : [ ],
"id" : 1,
"estimatedCost" : 1,
"estimatedNrItems" : 1
},
{
"type" : "CalculationNode",
"dependencies" : [
1
],
"id" : 2,
"estimatedCost" : 2,
"estimatedNrItems" : 1,
"expression" : {
"type" : "function call",
"name" : "PATHS",
"subNodes" : [
{
"type" : "array",
"subNodes" : [
{
"type" : "collection",
"name" : "locations"
},
{
"type" : "collection",
"name" : "connections"
},
{
"type" : "value",
"value" : "outbound"
},
{
"type" : "object",
"subNodes" : [
{
"type" : "object element",
"name" : "maxLength",
"subNodes" : [
{
"type" : "value",
"value" : 1
}
]
}
]
}
]
}
]
},
"outVariable" : {
"id" : 2,
"name" : "2"
},
"canThrow" : true
},
{
"type" : "EnumerateListNode",
"dependencies" : [
2
],
"id" : 3,
"estimatedCost" : 102,
"estimatedNrItems" : 100,
"inVariable" : {
"id" : 2,
"name" : "2"
},
"outVariable" : {
"id" : 0,
"name" : "p"
}
},
{
"type" : "CalculationNode",
"dependencies" : [
3
],
"id" : 4,
"estimatedCost" : 202,
"estimatedNrItems" : 100,
"expression" : {
"type" : "compare ==",
"subNodes" : [
{
"type" : "attribute access",
"name" : "_key",
"subNodes" : [
{
"type" : "attribute access",
"name" : "source",
"subNodes" : [
{
"type" : "reference",
"name" : "p",
"id" : 0
}
]
}
]
},
{
"type" : "value",
"value" : "26094"
}
]
},
"outVariable" : {
"id" : 3,
"name" : "3"
},
"canThrow" : false
},
{
"type" : "FilterNode",
"dependencies" : [
4
],
"id" : 5,
"estimatedCost" : 302,
"estimatedNrItems" : 100,
"inVariable" : {
"id" : 3,
"name" : "3"
}
},
{
"type" : "CalculationNode",
"dependencies" : [
5
],
"id" : 6,
"estimatedCost" : 402,
"estimatedNrItems" : 100,
"expression" : {
"type" : "expand",
"subNodes" : [
{
"type" : "iterator",
"subNodes" : [
{
"type" : "variable",
"name" : "1_",
"id" : 1
},
{
"type" : "attribute access",
"name" : "vertices",
"subNodes" : [
{
"type" : "reference",
"name" : "p",
"id" : 0
}
]
}
]
},
{
"type" : "attribute access",
"name" : "name",
"subNodes" : [
{
"type" : "reference",
"name" : "1_",
"id" : 1
}
]
}
]
},
"outVariable" : {
"id" : 4,
"name" : "4"
},
"canThrow" : false
},
{
"type" : "ReturnNode",
"dependencies" : [
6
],
"id" : 7,
"estimatedCost" : 502,
"estimatedNrItems" : 100,
"inVariable" : {
"id" : 4,
"name" : "4"
}
}
],
"rules" : [
"move-calculations-up",
"move-filters-up",
"move-calculations-up-2",
"move-filters-up-2"
],
"collections" : [
{
"name" : "connections",
"type" : "read"
},
{
"name" : "locations",
"type" : "read"
}
],
"variables" : [
{
"id" : 0,
"name" : "p"
},
{
"id" : 1,
"name" : "1_"
},
{
"id" : 2,
"name" : "2"
},
{
"id" : 3,
"name" : "3"
},
{
"id" : 4,
"name" : "4"
}
],
"estimatedCost" : 502,
"estimatedNrItems" : 100
},
"warnings" : [ ],
"stats" : {
"rulesExecuted" : 21,
"rulesSkipped" : 0,
"plansCreated" : 1
}
}
PATHS()
将构建图形的所有路径,然后使用 _key
属性上的 FILTER
过滤结果 post。在过滤掉所有不匹配项之前,这可能会首先创建一个巨大的结果集(对于所有路径)。
如果只需要在深度 1 上找到相连的顶点,我认为这样做会更有效率:
正在使用
TRAVERSAL
进行查询:这样效率更高,因为它将构建图中的所有路径,但只构建从指定起始顶点开始的路径:
FOR p IN TRAVERSAL(locations, connections, "1", "outbound", { minDepth: 1, maxDepth: 1, paths: true }) RETURN p.path.vertices[*].name
使用
NEIGHBORS
查询直接邻居:这可能会稍微更有效率,因为它会构建一个较小的中间结果。 此外,它不会 return 起始顶点 (26094),而是直接连接到它的所有顶点:
FOR p IN NEIGHBORS(locations, connections, "26094", "outbound") RETURN p.vertex.name
直接查询边(不使用图函数)
终于可以直接查询边集合了。 同样,这不会 return 起始顶点 (26094),而是直接连接到它的所有顶点:
FOR edge IN connections FILTER edge._from == "locations/26094" FOR vertex IN locations FILTER vertex._id == edge._to RETURN vertex.name