What's the right approach to "join" two nodes of a graph in Gremlin?

Question

Suppose my graph looks like this:

graph = TinkerGraph.open()
g = graph.traversal()
// next() iterates each traversal so the vertex is actually created and returned
v1 = g.addV('CC').property('name','t1').next()
v2 = g.addV('KK').property('name','t1').next()

I want to find all CC vertices that have the same 'name' as a KK vertex. I can write:

g.V().hasLabel('CC').as('c').values('name').as('cn').V().hasLabel('KK').values('name').as('k').where('cn',eq('k')).select('c')

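This mimics a join in SQL, but it seems to perform poorly when written this way. SQL2Gremlin has examples of "joining" two nodes, but only for the case where the two nodes are connected by an edge. I would like to know whether there is any way to do a join in Gremlin when it is not known in advance whether a path connects the two nodes. In other words, what is the best way to write a "join" in Gremlin when we don't know whether the two nodes are connected directly or through a path?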

Thanks a lot!

Answer 1

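Your instincts are largely correct. A "join" is a realized relationship (i.e. an edge) between two vertices. That is generally the benefit of a graph database. Doing a SQL-style vertex-to-vertex join on properties is typically not efficient for graphs.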

As for your query, you could rewrite it in this form to make it a bit clearer:

gremlin> g.V().hasLabel('CC').as('c').
......1>   V().hasLabel('KK').
......2>   where(eq('c')).
......3>     by('name').
......4>   select('c')
==>v[0]

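The performance will likely stay the same, though, because I don't think any graph system currently optimizes this traversal. No index will be used, and you will get a full graph scan over "CC" and "KK" to produce the result. Obviously, that is quite expensive on a large graph.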
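To see how a particular graph provider executes the traversal, one option is to append the `profile()` step. The following is a minimal sketch meant for the Gremlin Console (which already imports `TinkerGraph` and `eq`), using the two-vertex sample graph from the question. The variable name `metrics` is only illustrative, and the exact output varies by provider and TinkerPop version.

```groovy
// Rebuild the sample graph from the question
graph = TinkerGraph.open()
g = graph.traversal()
g.addV('CC').property('name', 't1').iterate()
g.addV('KK').property('name', 't1').iterate()

// profile() replaces the normal result with per-step metrics
// (traverser counts and timings) for the traversal it terminates
metrics = g.V().hasLabel('CC').as('c').
            V().hasLabel('KK').
            where(eq('c')).by('name').
            select('c').
            profile().next()

println metrics
```

On a real dataset, the traverser counts reported for the two `V()` steps give a rough idea of how many vertices were scanned to answer the query.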

There was some discussion of this topic on the Gremlin Users mailing list, where a lot of good points were made. Notably, Josh Perryman wrote (among other good points):

A SQL-style of join is a very poor use of a graph db engine. Like Daniel suggests joins should be pre-computed and the edges added at write-time / data ingest time.

This is by necessity and design. Edges are basically materialized joins. Graph databases are optimized for them, a disk or cache read operation. Relational databases are optimized for joins, a query-time computer operation.

It is usually significantly cheaper to pre-compute the edges in a separate engine before loading data than to do so after data is loaded into the graph. The exception to this is when an edge is determined based on a multi-hop path through the graph. For that use case a graph DB is best.
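As a rough illustration of the "materialized join" idea in the quote, the sketch below adds an edge at write time and then answers the original question with a plain edge traversal instead of a property comparison. It is written for the Gremlin Console and assumes a reasonably recent TinkerPop 3.x. The edge label `sameName` and the variable names `cc`, `kk` and `matches` are hypothetical and not part of the original question.

```groovy
// Sample graph from the question; next() returns the created vertices
graph = TinkerGraph.open()
g = graph.traversal()
cc = g.addV('CC').property('name', 't1').next()
kk = g.addV('KK').property('name', 't1').next()

// Write time: materialize the "join" as an edge CC -> KK.
// 'sameName' is a hypothetical label chosen for this sketch.
g.V(cc).addE('sameName').to(__.V(kk)).iterate()

// For data that is already loaded, the edges could be backfilled once
// with the property-matching traversal (this is still a full scan):
// g.V().hasLabel('CC').as('c').
//   V().hasLabel('KK').where(eq('c')).by('name').
//   addE('sameName').from('c').iterate()

// Query time: find every CC linked to a KK by following the edge
matches = g.V().hasLabel('KK').in('sameName').hasLabel('CC').dedup().toList()
```

The query-time traversal only follows edges from the KK vertices, so its cost depends on the number of KK vertices and their `sameName` edges rather than on comparing every CC with every KK.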


gremlin