How to optimise a Neo4j Cypher query for a large graph?
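I wrote this query to find the possible paths between two nodes. However, when I try to use more than 3 steps, it does not finish. The graph I am using has more than 4 million nodes and 49 million relationships.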
match (src:T047 {CUI:"C0030920"}),
(trg:T059 {CUI:"C1294944"}),
p = (src)-[*..3]-(trg)
where
all(relI in relationships(p)
where type(relI) in ["RO","CHD","PAR","RB","RL","RO","SIB","RU","SY"])
and
all(nodeI in nodes(p)
where labels(nodeI) in ["T004", "T005", "T007", "T016", "T017", "T018", "T019", "T020",
"T021", "T022", "T023", "T024", "T025", "T026", "T028", "T029", "T030", "T031", "T032",
"T033", "T034", "T037", "T038", "T039", "T040", "T041", "T042", "T043", "T045", "T046",
"T047", "T048", "T049", "T053", "T054", "T055", "T056", "T057", "T059", "T060", "T061",
"T074", "T080", "T081", "T098", "T099", "T100", "T101", "T103", "T109", "T114", "T116",
"T121", "T123", "T125", "T126", "T127", "T129", "T131", "T168", "T184", "T190", "T191",
"T195", "T196", "T197", "T200", "T201"])
return p
Here is the plan for this query:
https://imgur.com/PpWePOz
Is there any way to optimise this query, or at least to find an estimate of how long it will take?
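First, your query plan shows that you are not using an index. The planner uses a NodeByLabelScan on the :T059 nodes and then runs a filter over all of them to find the node with the property in question. The src node is not found by an index lookup either; instead, the results of the variable-length expansion are filtered against the label and property.
You will need indexes to improve performance. Indexes on :T047(CUI) and :T059(CUI) are the ones you need. Make sure these exist first.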
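As a rough sketch of that first step, the two indexes could be created as shown here, and the plan inspected afterwards. The index names t047_cui and t059_cui are made up for illustration, and the exact CREATE INDEX syntax depends on your Neo4j version, so treat this as an assumption to check against your installation rather than the definitive commands.

```cypher
// Neo4j 4.x and later; the index names are hypothetical.
CREATE INDEX t047_cui FOR (n:T047) ON (n.CUI);
CREATE INDEX t059_cui FOR (n:T059) ON (n.CUI);

// Neo4j 3.x equivalent (unnamed indexes):
// CREATE INDEX ON :T047(CUI);
// CREATE INDEX ON :T059(CUI);

// EXPLAIN shows the plan without running the query.
// Once the indexes are online, both anchors should appear as NodeIndexSeek.
EXPLAIN
MATCH (src:T047 {CUI: "C0030920"}), (trg:T059 {CUI: "C1294944"})
RETURN src, trg;
```

If the plan still shows NodeByLabelScan, the indexes may not have finished populating yet.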
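In addition, to enforce index lookups on both nodes (as opposed to a variable-length expand followed by a filter, which is more expensive), you can give the planner index hints.
We can also adjust the list predicate on the labels of the nodes in the path so that nodes are filtered during expansion rather than afterwards.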
WITH ["T004", "T005", "T007", "T016", "T017", "T018", "T019", "T020", "T021", "T022", "T023", "T024", "T025", "T026", "T028", "T029", "T030", "T031", "T032", "T033", "T034", "T037", "T038", "T039", "T040", "T041", "T042", "T043", "T045", "T046", "T047", "T048", "T049", "T053", "T054", "T055", "T056", "T057", "T059", "T060", "T061", "T074", "T080", "T081", "T098", "T099", "T100", "T101", "T103", "T109", "T114", "T116", "T121", "T123", "T125", "T126", "T127", "T129", "T131", "T168", "T184", "T190", "T191", "T195", "T196", "T197", "T200", "T201"] as allowedLabels
MATCH (src:T047 {CUI:"C0030920"}),
(trg:T059 {CUI:"C1294944"})
USING INDEX src:T047(CUI)
USING INDEX trg:T059(CUI)
MATCH p = (src)-[*..3]-(trg)
WHERE
all(relI in relationships(p) WHERE type(relI) in ["RO","CHD","PAR","RB","RL","RO","SIB","RU","SY"])
AND all(node IN nodes(p) WHERE labels(node)[0] IN allowedLabels)
RETURN p
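This also assumes that every node here has exactly one label rather than several. If nodes can have multiple labels, the query may need to be restructured.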
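If nodes can carry several labels, one possible restructuring is to accept a node when any of its labels is in the allowed list, instead of looking only at labels(node)[0]. This is only a sketch under that assumption: the label list is shortened for illustration, the "any label is allowed" rule is a guess at the intended semantics, and whether the planner still applies this predicate during expansion is something to confirm with EXPLAIN or PROFILE.

```cypher
// Shortened list for illustration; substitute the full list of allowed labels.
WITH ["T047", "T059", "T184"] AS allowedLabels
MATCH (src:T047 {CUI: "C0030920"}),
      (trg:T059 {CUI: "C1294944"})
USING INDEX src:T047(CUI)
USING INDEX trg:T059(CUI)
MATCH p = (src)-[*..3]-(trg)
WHERE all(relI IN relationships(p)
          WHERE type(relI) IN ["RO", "CHD", "PAR", "RB", "RL", "SIB", "RU", "SY"])
  // Accept a node if at least one of its labels is allowed,
  // rather than checking only its first label.
  AND all(node IN nodes(p)
          WHERE any(l IN labels(node) WHERE l IN allowedLabels))
RETURN p
```

If the requirement is instead that every label on a node must be allowed, replace any(...) with all(...) in the inner predicate.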