Performance during scoring complex match in Neo4j graph database?
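I have a Neo4j 3.3.5 graph database: 27 GB, 50M nodes, 500M relationships. Indexes are on. PC: 16 GB RAM, 4 cores.
The task is to find the best-matching companies for given query data.
The :Company nodes I need to get have multiple relationships to :Branch, :Country, etc. nodes.
The query data contains BranchIds, CountryIds, etc.
Currently I am using Cypher like this to get the score from one relationship (500k rows in the result):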
MATCH (c:Company)-[r:HAS_BRANCH]->(b:Branch)
WHERE b.branchId in [27444, 1692, 23409, ...] //around 10 ids per query
RETURN
c.companyId as Id,
case r.branchType
when 0 then 25
... // around 7 conditions per query
when 10 then 20
end as Score
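I have to score every relationship type of :Company like this, group by Id, sum Score, order, and take the top 100 results.
Because there is no post-UNION processing, I am using collect + unwind to combine the scores from all relationships.
Unfortunately, performance is poor: a query for a single relationship (like the one above) takes 5-10 seconds to respond. When I try to combine the results with collect + unwind, the query "never" finishes.
What is the better/proper way to do this? Am I doing something wrong in the graph design? Is the hardware configuration too weak? Or is there some algorithm in graph databases for scoring matches against a query graph (the query data)?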
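UPDATE
Explanation of the query:
Users can search for companies in our system. For each search we prepare query data containing the ids of branches, countries, words, etc.
As the result we want a list of the ids of the best-matching companies, with their scores.
For example, a user can search for new companies from Spain that produce wooden tables.
Combined query example: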
MATCH (c:Company)-[r:HAS_BRANCH]->(b:Branch)
WHERE b.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"]
WITH case r.branchType
when "0" then collect({id:c.companyId, score: 25})
when "1" then collect({id:c.companyId, score: 19})
when "2" then collect({id:c.companyId, score: 20})
when "3" then collect({id:c.companyId, score: 19})
when "4" then collect({id:c.companyId, score: 20})
when "5" then collect({id:c.companyId, score: 15})
when "6" then collect({id:c.companyId, score: 6})
when "7" then collect({id:c.companyId, score: 5})
when "8" then collect({id:c.companyId, score: 4})
when "9" then collect({id:c.companyId, score: 4})
when "10" then collect({id:c.companyId, score: 20})
end as rows
MATCH (c:Company)-[r:HAS_REVERTED_BRANCH]->(b:Branch)
WHERE b.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"]
WITH rows + case r.branchType
when "0" then collect({id:c.companyId, score: 25})
when "1" then collect({id:c.companyId, score: 19})
when "2" then collect({id:c.companyId, score: 20})
when "3" then collect({id:c.companyId, score: 19})
when "10" then collect({id:c.companyId, score: 20})
end as rows
MATCH (c:Company)-[r:HAS_COUNTRY]->(cou:Country)
WHERE cou.countryId in ["9580" , "18551" , "15895"]
WITH rows + case r.branchType
when "0" then collect({id:c.companyId, score: 30})
when "2" then collect({id:c.companyId, score: 15})
end as rows
... //here I would add in future other relations scoring
UNWIND rows AS row
RETURN row.id AS Id, sum(row.score) AS Score
ORDER BY Score DESC
LIMIT 100
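The combined query's intent can be written down as a plain in-memory computation, which is handy for checking expected results on a small sample. It turns each matched relationship into an {id, score} row, concatenates the rows from every relationship type, then unwinds, groups by id, sums, and keeps the top 100. The Python sketch below is only such a model, not Cypher. The score tables are copied from the CASE expressions in the query above. The sample matches and the helper names `score_rows` and `top_companies` are hypothetical. The sketch also assumes that a branchType with no WHEN branch simply contributes no row.

```python
import heapq
from collections import defaultdict
from itertools import chain

# Score per branchType for each relationship type, copied from the CASE expressions above.
HAS_BRANCH_SCORES = {"0": 25, "1": 19, "2": 20, "3": 19, "4": 20, "5": 15,
                     "6": 6, "7": 5, "8": 4, "9": 4, "10": 20}
HAS_REVERTED_BRANCH_SCORES = {"0": 25, "1": 19, "2": 20, "3": 19, "10": 20}
HAS_COUNTRY_SCORES = {"0": 30, "2": 15}


def score_rows(matches, table):
    """Turn (companyId, branchType) matches into {id, score} rows, like collect({...}).

    Assumption: types without a score in the table produce no row.
    """
    return [{"id": cid, "score": table[t]} for cid, t in matches if t in table]


def top_companies(row_lists, limit=100):
    """Concatenate the row lists (collect + UNWIND), sum per id, keep the best `limit`."""
    totals = defaultdict(int)
    for row in chain.from_iterable(row_lists):
        totals[row["id"]] += row["score"]
    # Equivalent of ORDER BY Score DESC LIMIT <limit>
    return heapq.nlargest(limit, totals.items(), key=lambda item: item[1])


# Hypothetical sample matches per relationship type.
branch = score_rows([("c1", "0"), ("c2", "6"), ("c1", "5")], HAS_BRANCH_SCORES)
reverted = score_rows([("c2", "1"), ("c3", "4")], HAS_REVERTED_BRANCH_SCORES)
country = score_rows([("c1", "2"), ("c3", "0")], HAS_COUNTRY_SCORES)
print(top_companies([branch, reverted, country], limit=2))  # [('c1', 55), ('c3', 30)]
```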
You can try this query and see whether it performs better:
MATCH (c:Company) WITH c
OPTIONAL MATCH (c)-[r1:HAS_BRANCH]->(b:Branch) WHERE b.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"]
OPTIONAL MATCH (c)-[r2:HAS_REVERTED_BRANCH]->(b2:Branch) WHERE b2.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"]
OPTIONAL MATCH (c)-[r3:HAS_COUNTRY]->(cou:Country) WHERE cou.countryId in ["9580" , "18551" , "15895"]
WITH c,
case r1.branchType
when "0" then 25
when "1" then 19
when "2" then 20
when "3" then 19
when "4" then 20
when "5" then 15
when "6" then 6
when "7" then 5
when "8" then 4
when "9" then 4
when "10" then 20
end as branchScore,
case r2.branchType
when "0" then 25
when "1" then 19
when "2" then 20
when "3" then 19
when "10" then 20
end as revertedBranchScore,
case r3.branchType
when "0" then 30
when "2" then 15
end as countryScore
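// a missing OPTIONAL MATCH (or an unlisted branchType) gives null, and null + n is null, so default each score to 0
WITH c.companyId AS Id, coalesce(branchScore, 0) + coalesce(revertedBranchScore, 0) + coalesce(countryScore, 0) AS Score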
RETURN Id, sum(Score) AS Score
ORDER BY Score DESC
LIMIT 100
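One thing to check with the query above is that consecutive OPTIONAL MATCH clauses produce one row for every combination of a company's matched relationships. As a result, `sum(Score)` adds each relationship's score once per combination rather than once. The Python sketch below models this in memory (it is not Cypher, and the sample relationship types are made up). It compares the per-combination total, with null treated as 0 as `coalesce()` would, against a total that counts each relationship exactly once.

```python
from itertools import product

# Score tables copied from the CASE expressions in the query above.
BRANCH = {"0": 25, "1": 19, "2": 20, "3": 19, "4": 20, "5": 15,
          "6": 6, "7": 5, "8": 4, "9": 4, "10": 20}
REVERTED = {"0": 25, "1": 19, "2": 20, "3": 19, "10": 20}
COUNTRY = {"0": 30, "2": 15}


def optional_scores(types, table):
    """Scores bound by one OPTIONAL MATCH; no match still yields a single null (None) row."""
    return [table.get(t) for t in types] or [None]


def coalesce(value):
    """coalesce(value, 0)"""
    return 0 if value is None else value


def cross_product_total(branch_types, reverted_types, country_types):
    """sum(Score) for one company: one row per combination of the three OPTIONAL MATCHes."""
    rows = product(optional_scores(branch_types, BRANCH),
                   optional_scores(reverted_types, REVERTED),
                   optional_scores(country_types, COUNTRY))
    return sum(coalesce(b) + coalesce(r) + coalesce(c) for b, r, c in rows)


def per_relationship_total(branch_types, reverted_types, country_types):
    """Every matched relationship counted exactly once."""
    return (sum(coalesce(BRANCH.get(t)) for t in branch_types)
            + sum(coalesce(REVERTED.get(t)) for t in reverted_types)
            + sum(coalesce(COUNTRY.get(t)) for t in country_types))


# Hypothetical company: two matching branches, one reverted branch, two countries.
print(cross_product_total(["0", "5"], ["1"], ["0", "2"]))     # 246
print(per_relationship_total(["0", "5"], ["1"], ["0", "2"]))  # 104
```

Here each branch and country score is counted twice and the reverted-branch score four times, so companies with many matches are boosted beyond what the score tables say. Collecting each relationship type into its own list per company, as the pattern-comprehension and reduce() query does, avoids this multiplication.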
Or, better still, this one (but only if every Company node is guaranteed to be linked to a Country and a Branch):
MATCH
(c:Company)-[r1:HAS_BRANCH]->(b:Branch),
(c)-[r2:HAS_REVERTED_BRANCH]->(b2:Branch),
(c)-[r3:HAS_COUNTRY]->(cou:Country)
WHERE
b.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] AND
b2.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] AND
cou.countryId in ["9580" , "18551" , "15895"]
WITH c,
case r1.branchType
when "0" then 25
when "1" then 19
when "2" then 20
when "3" then 19
when "4" then 20
when "5" then 15
when "6" then 6
when "7" then 5
when "8" then 4
when "9" then 4
when "10" then 20
end as branchScore,
case r2.branchType
when "0" then 25
when "1" then 19
when "2" then 20
when "3" then 19
when "10" then 20
end as revertedBranchScore,
case r3.branchType
when "0" then 30
when "2" then 15
end as countryScore
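// an unlisted branchType makes its CASE null, and null + n is null, so default each score to 0
WITH c.companyId AS Id, coalesce(branchScore, 0) + coalesce(revertedBranchScore, 0) + coalesce(countryScore, 0) AS Score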
RETURN Id, sum(Score) AS Score
ORDER BY Score DESC
LIMIT 100
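Because all three patterns sit in a single MATCH, a company is returned only if it has at least one matching relationship of every type. A company that matches the requested branches but none of the requested countries therefore drops out of the ranking entirely. The short Python model below illustrates this; it is not Cypher, and the sample data is hypothetical. The rows of one MATCH over several patterns behave like a Cartesian product, which is empty as soon as one pattern has no match.

```python
from itertools import product


def match_rows(branch_types, reverted_types, country_types):
    """Rows one MATCH over the three patterns yields for a company: every combination."""
    return list(product(branch_types, reverted_types, country_types))


# Hypothetical companies: "c1" matches all three patterns, "c2" matches no requested country.
companies = {
    "c1": (["0"], ["1"], ["2"]),
    "c2": (["0", "5"], ["1"], []),
}
returned = [cid for cid, rels in companies.items() if match_rows(*rels)]
print(returned)  # ['c1']
```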
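Let's see whether we can improve on this by using pattern comprehensions and the reduce() function to update each company's score as the query progresses, and by waiting until the end to project out the id property: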
MATCH (c:Company)
WITH c, [(c)-[r:HAS_BRANCH]->(b:Branch)
WHERE b.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] | r.branchType] as hasBranchTypes
WITH c, reduce(runningScore = 0, type in hasBranchTypes | runningScore +
case type
when "0" then 25
when "1" then 19
when "2" then 20
when "3" then 19
when "4" then 20
when "5" then 15
when "6" then 6
when "7" then 5
when "8" then 4
when "9" then 4
when "10" then 20
else 0 // unlisted types add nothing instead of turning the score into null
end ) as score
WITH c, score, [(c:Company)-[r:HAS_REVERTED_BRANCH]->(b:Branch)
WHERE b.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] | r.branchType] as revertedBranchTypes
WITH c, reduce(runningScore = score, type in revertedBranchTypes | runningScore +
case type
when "0" then 25
when "1" then 19
when "2" then 20
when "3" then 19
when "10" then 20
else 0
end ) as score
WITH c, score, [(c:Company)-[r:HAS_COUNTRY]->(cou:Country)
WHERE cou.countryId in ["9580" , "18551" , "15895"] | r.branchType] as hasCountryTypes
WITH c, reduce(runningScore = score, type in hasCountryTypes | runningScore +
case type
when "0" then 30
when "2" then 15
else 0
end ) as score
//here I would add in future other relations scoring
WITH c, score
ORDER BY score DESC
LIMIT 100
RETURN c.companyId as Id, score as Score
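The reduce() calls above fold a list of relationship types into a running score. In Cypher, a CASE with no matching WHEN and no ELSE returns null, and null + number is null. A single unlisted type (for example a reverted branch of type "4") therefore turns the whole score into null unless the CASE has an ELSE 0. Cypher also sorts null first under ORDER BY ... DESC, so such companies would land at the top of the list. The Python sketch below models that behaviour, with None standing in for null. The helper names are hypothetical and this is not Neo4j code.

```python
from functools import reduce

# HAS_REVERTED_BRANCH scores copied from the query above; type "4" has no WHEN branch.
REVERTED_SCORES = {"0": 25, "1": 19, "2": 20, "3": 19, "10": 20}


def case_score(branch_type, table, default=None):
    """CASE type WHEN ... THEN ... [ELSE default] END; without ELSE the result is null (None)."""
    return table.get(branch_type, default)


def null_add(a, b):
    """Cypher addition: null + anything is null."""
    return None if a is None or b is None else a + b


def running_score(start, types, table, default=None):
    """reduce(runningScore = start, type IN types | runningScore + CASE ... END)"""
    return reduce(lambda acc, t: null_add(acc, case_score(t, table, default)), types, start)


types = ["1", "4", "10"]  # hypothetical reverted-branch types for one company
print(running_score(44, types, REVERTED_SCORES))             # None (no ELSE)
print(running_score(44, types, REVERTED_SCORES, default=0))  # 83 (ELSE 0)
```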