Performance when scoring complex matches in a Neo4j graph database?

Question

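I have a Neo4j 3.3.5 graph database: 27 GB, 50 million nodes, 500 million relationships. Indexes are on. Schema. PC: 16 GB RAM, 4 cores.

The task is to find the best-matching companies for given query data. The nodes I need to return, :Company, have many types of relationships to :Branch, :Country, etc. nodes. The query data contains branch IDs, country IDs, etc.

Currently I'm using Cypher like this to get the score from one relationship (the result is 500k rows):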

MATCH (c:Company)-[r:HAS_BRANCH]->(b:Branch)
WHERE b.branchId in [27444, 1692, 23409, ...] //around 10 ids per query
RETURN 
c.companyId as Id, 
case r.branchType 
 when 0 then 25
 ... // around 7 conditions per query
 when 10 then 20 
end as Score

I have to score every relationship type of :Company like this, group by Id, sum the Score, order the result, and take the top 100 rows.
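
The intended computation can be sketched outside the database as a small Python reference model, which may help when checking a rewritten query against a tiny sample. The `SCORES` table, the `top_companies` function, and the `(company_id, rel_type, branch_type)` row shape are assumptions made for illustration; only the two score values shown above are filled in, and the elided conditions are left out.

```python
from collections import defaultdict
import heapq

# Hypothetical score table: relationship type -> branchType -> score.
# Only values visible in the post are listed; elided cases are omitted.
SCORES = {
    "HAS_BRANCH": {0: 25, 10: 20},
}

def top_companies(rows, scores=SCORES, limit=100):
    """Sum scores per company and return the `limit` best (id, total) pairs.

    rows: iterable of (company_id, rel_type, branch_type) tuples.
    """
    totals = defaultdict(int)
    for company_id, rel_type, branch_type in rows:
        # Unlisted types contribute 0 to the company's total.
        totals[company_id] += scores.get(rel_type, {}).get(branch_type, 0)
    # Highest total first; ties are broken by company id for a stable order.
    return heapq.nsmallest(limit, totals.items(), key=lambda kv: (-kv[1], kv[0]))
```

Each matching relationship contributes exactly once to its company's total, which is the behaviour the combined query is meant to reproduce.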

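Because post-UNION processing is not available, I'm using collect + unwind to merge the scores from all relationships.

Unfortunately, performance is low. A query on a single relationship (as above) takes 5-10 seconds to respond. When I try to combine the results with collect + unwind, the query "never" finishes.

What is the better/proper way to do this? Am I doing something wrong in the graph design? Is the hardware too weak? Or is there an algorithm in graph databases for matching against a scoring graph (the query data)?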

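Update

Query explanation:

A user can search for companies in our system. From the user's query we prepare query data containing the IDs of branches, countries, words, etc. As the query result we want a list of the best-matching company IDs together with their scores.

For example, a user can search for new companies that produce wooden tables in Spain.

Combined query example: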

MATCH (c:Company)-[r:HAS_BRANCH]->(b:Branch)
WHERE b.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] 
WITH case r.branchType 
when "0" then collect({id:c.companyId, score: 25}) 
 when "1" then collect({id:c.companyId, score: 19}) 
 when "2" then collect({id:c.companyId, score: 20}) 
 when "3" then collect({id:c.companyId, score: 19}) 
 when "4" then collect({id:c.companyId, score: 20}) 
 when "5" then collect({id:c.companyId, score: 15}) 
 when "6" then collect({id:c.companyId, score: 6}) 
 when "7" then collect({id:c.companyId, score: 5}) 
 when "8" then collect({id:c.companyId, score: 4}) 
 when "9" then collect({id:c.companyId, score: 4}) 
 when "10" then collect({id:c.companyId, score: 20}) 
end as rows
MATCH (c:Company)-[r:HAS_REVERTED_BRANCH]->(b:Branch)
WHERE b.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] 
WITH rows + case r.branchType 
when "0" then collect({id:c.companyId, score: 25}) 
 when "1" then collect({id:c.companyId, score: 19}) 
 when "2" then collect({id:c.companyId, score: 20}) 
 when "3" then collect({id:c.companyId, score: 19}) 
 when "10" then collect({id:c.companyId, score: 20}) 
end as rows
MATCH (c:Company)-[r:HAS_COUNTRY]->(cou:Country)
WHERE cou.countryId in ["9580" , "18551" , "15895"] 
WITH rows + case r.branchType 
when "0" then collect({id:c.companyId, score: 30}) 
 when "2" then collect({id:c.companyId, score: 15}) 
 end as rows
... //here I would add in future other relations scoring
UNWIND rows AS row
RETURN row.id AS Id, sum(row.score) AS Score
ORDER BY Score DESC
LIMIT 100

Answer 1

You can try this query and see whether it performs better:

MATCH (c:Company) WITH c
OPTIONAL MATCH (c)-[r1:HAS_BRANCH]->(b:Branch) WHERE b.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] 
OPTIONAL MATCH (c)-[r2:HAS_REVERTED_BRANCH]->(b2:Branch) WHERE b2.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] 
OPTIONAL MATCH (c)-[r3:HAS_COUNTRY]->(cou:Country) WHERE cou.countryId in ["9580" , "18551" , "15895"] 
WITH c, 
    case r1.branchType 
      when "0" then 25
      when "1" then 19 
      when "2" then 20 
      when "3" then 19 
      when "4" then 20 
      when "5" then 15 
      when "6" then 6 
      when "7" then 5 
      when "8" then 4 
      when "9" then 4 
      when "10" then 20 
    end as branchScore,
    case r2.branchType 
      when "0" then  25 
      when "1" then  19 
      when "2" then  20 
      when "3" then  19 
      when "10" then  20 
    end as revertedBranchScore,
    case r3.branchType 
      when "0" then  30
      when "2" then  15 
    end as countryScore

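// A CASE without ELSE yields null and null + n is null, so default each part to 0
WITH c.companyId AS Id, coalesce(branchScore, 0) + coalesce(revertedBranchScore, 0) + coalesce(countryScore, 0) AS Score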
RETURN Id, sum(Score) AS Score
ORDER BY Score DESC
LIMIT 100

Or, better still, this one (but only if every Company node is required to be linked to both Country and Branch):

MATCH 
  (c:Company)-[r1:HAS_BRANCH]->(b:Branch),
  (c)-[r2:HAS_REVERTED_BRANCH]->(b2:Branch),
  (c)-[r3:HAS_COUNTRY]->(cou:Country)
WHERE 
  b.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] AND 
  b2.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] AND
  cou.countryId in ["9580" , "18551" , "15895"]
WITH c, 
    case r1.branchType 
      when "0" then 25
      when "1" then 19 
      when "2" then 20 
      when "3" then 19 
      when "4" then 20 
      when "5" then 15 
      when "6" then 6 
      when "7" then 5 
      when "8" then 4 
      when "9" then 4 
      when "10" then 20 
    end as branchScore,
    case r2.branchType 
      when "0" then  25 
      when "1" then  19 
      when "2" then  20 
      when "3" then  19 
      when "10" then  20 
    end as revertedBranchScore,
    case r3.branchType 
      when "0" then  30
      when "2" then  15 
    end as countryScore

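// A CASE without ELSE yields null and null + n is null, so default each part to 0
WITH c.companyId AS Id, coalesce(branchScore, 0) + coalesce(revertedBranchScore, 0) + coalesce(countryScore, 0) AS Score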
RETURN Id, sum(Score) AS Score
ORDER BY Score DESC
LIMIT 100
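
One thing worth checking with queries that chain several MATCH or OPTIONAL MATCH clauses on the same company: each clause multiplies the rows for that company, so a sum taken over the combined rows can count a relationship's score more than once. The following Python sketch illustrates the effect with made-up numbers; the two lists are hypothetical data for a single company, not values from the database.

```python
from itertools import product

# Hypothetical data for one company: scores of its matching relationships.
branch_scores = [25, 19]        # two matching HAS_BRANCH relationships
country_scores = [30, 15, 30]   # three matching HAS_COUNTRY relationships

# Chained MATCH clauses yield one row per combination of matches.
rows = list(product(branch_scores, country_scores))
summed_over_rows = sum(b + c for b, c in rows)

# Scoring each relationship type on its own counts every match once.
summed_per_type = sum(branch_scores) + sum(country_scores)

print(len(rows), summed_over_rows, summed_per_type)
```

If the two totals differ on real data, scoring each relationship type separately per company before adding the parts keeps every match counted once.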

Answer 2

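Let's see if we can improve this by using pattern comprehensions and the reduce() function to update each company's score as the query progresses, and by waiting until the end to project out the company ID property: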

MATCH (c:Company)
WITH c, [(c)-[r:HAS_BRANCH]->(b:Branch) 
 WHERE b.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] | r.branchType] as hasBranchTypes
WITH c, reduce(runningScore = 0, type in hasBranchTypes | runningScore + 
 case type 
 when "0" then 25
 when "1" then 19
 when "2" then 20 
 when "3" then 19 
 when "4" then 20 
 when "5" then 15 
 when "6" then 6 
 when "7" then 5 
 when "8" then 4 
 when "9" then 4 
 when "10" then 20 
 else 0 // unlisted types add nothing; without ELSE the CASE is null and nulls the running score
 end ) as score

WITH c, score, [(c:Company)-[r:HAS_REVERTED_BRANCH]->(b:Branch)
 WHERE b.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] | r.branchType] as revertedBranchTypes
WITH c, reduce(runningScore = score, type in revertedBranchTypes | runningScore + 
 case type
 when "0" then 25
 when "1" then 19 
 when "2" then 20 
 when "3" then 19 
 when "10" then 20 
 else 0 // unlisted types add nothing; without ELSE the CASE is null and nulls the running score
end ) as score

WITH c, score, [(c:Company)-[r:HAS_COUNTRY]->(cou:Country)
 WHERE cou.countryId in ["9580" , "18551" , "15895"] | r.branchType] as hasCountryTypes
WITH c, reduce(runningScore = score, type in hasCountryTypes | runningScore + 
 case type
 when "0" then 30 
 when "2" then 15 
 else 0 // unlisted types add nothing; without ELSE the CASE is null and nulls the running score
 end ) as score
 //here I would add in future other relations scoring

WITH c, score
ORDER BY score DESC
LIMIT 100
RETURN c.companyId as Id, score as Score

neo4j

graph-databases

cypher