Neo4j cypher query results into Pandas DataFrame

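I'm trying to read a csv file containing node ids and their respective relationships. The first two columns represent nodes and the third column represents the relationship between them. So far I am able to create the database in neo4j, but I'm not sure what cypher query to use to extract the required data into a pandas DataFrame.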

I'll use a subset of a large dataset here to illustrate my problem. The original dataset contains thousands of nodes and relationships.

My csv file (Node1_id, Node2_id, relation_id) looks like this:

    0    1      1
    4    2      1
    44   3      1
    0    4      1
    0    5      1
    4    10173  3
    4    10191  2
    4    10192  2
    6    10193  2
    8    10194  2
    3    10195  2
    6    10196  2

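Here I create the nodes by loading the ids from the csv file and define the relationships between them. (I think this graph is correct, but please let me know if you see any issue.) I'm using the ids from the csv file to assign an "id" property to the nodes and relationships.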

    LOAD CSV WITH HEADERS FROM 'file:///edges.csv' AS row FIELDTERMINATOR ","
    WITH row
    WHERE row.relation_id = '1'
    MERGE (paper:Paper {id: row.Node1_id})
    MERGE (author:Author {id: row.Node2_id})
    CREATE (paper)-[au:AUTHORED {id: '1'}]->(author);
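LOAD CSV passes every field to Cypher as a string. That is why the query compares `row.relation_id` with `'1'`, and it means the ids stored in Neo4j are strings, not integers. The sketch below reads the same data straight into pandas to show what the DataFrame should eventually contain. It assumes the file starts with a `Node1_id,Node2_id,relation_id` header line. `WITH HEADERS` and the `row.Node1_id` accesses imply that header, but the question does not show it. The `CSV_TEXT` sample is a hand-copied subset of the rows above.

```python
import io

import pandas as pd

# A few rows from the question's edges.csv. The header line is an assumption,
# inferred from LOAD CSV WITH HEADERS and row.Node1_id / row.Node2_id / row.relation_id.
CSV_TEXT = """Node1_id,Node2_id,relation_id
0,1,1
4,2,1
44,3,1
0,4,1
4,10173,3
4,10191,2
"""

# dtype=str keeps every id as a string, the same way LOAD CSV hands them to
# Cypher (which is why the query compares against '1', not 1).
edges = pd.read_csv(io.StringIO(CSV_TEXT), dtype=str)

# Mirror WHERE row.relation_id = '1': only these rows become AUTHORED edges.
authored = edges[edges["relation_id"] == "1"].reset_index(drop=True)
print(authored)
```

A frame like `authored` is a convenient reference to compare the query result against.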

So far I have tried something like this:

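    from pandas import DataFrame

    # session: an already-open connection object with a .run() method (setup omitted)
    query = ''' MATCH (paper)-[au:AUTHORED{id: '1'}]->(author) RETURN paper, author LIMIT 3; '''
    result = session.run(query)
    df = DataFrame(result)

    # iterate over the DataFrame one row at a time
    for dataF in df.itertuples(index=False):
        print(dataF)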

It returns something like this:

        0     1
    0   (id)  (id)
    1   (id)  (id)
    2   (id)  (id)

Desired result:

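I want the results in a pandas DataFrame, in the format of node ids and relation ids as defined in the csv above, by querying the data from the graph DB and iterating over the results row by row.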

    0    1      1
    4    2      1
    44   3      1
    0    4      1
    0    5      1
    4    10173  3
    4    10191  2
    4    10192  2
    6    10193  2
    8    10194  2
    3    10195  2
    6    10196  2
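Once the ids are in a DataFrame, iterating row by row is plain pandas. The sketch below uses a hypothetical three-row frame built by hand with the CSV's column names. In practice, `df` would come from the query.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame built from the query result.
df = pd.DataFrame(
    {
        "Node1_id": ["0", "4", "44"],
        "Node2_id": ["1", "2", "3"],
        "relation_id": ["1", "1", "1"],
    }
)

rows = []
# itertuples(index=False) yields one namedtuple per row, with fields named
# after the columns, so values can be read as row.Node1_id and so on.
for row in df.itertuples(index=False):
    print(row.Node1_id, row.Node2_id, row.relation_id)
    rows.append((row.Node1_id, row.Node2_id, row.relation_id))
```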

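I'd also like to know what the return type of the cypher query object is. In this case it's pandas.core.frame.DataFrame, but how can I access the individual properties of nodes and relationships during the cypher query? This is the main question.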

Please feel free to explain in detail. Thanks a lot for your help.

Neo4j version used: 4.2.1

I'm using py2neo, so if you are using a different library, you can either switch to py2neo or tell me which Neo4j library you are using and I'll edit my answer.

#1: Desired result

I want results into pandas DataFrame in the format with nodes ids and relation ids such as defined in csv above by querying the data from graphDB and iterate the results row by row.

    from py2neo import Graph
    from pandas import DataFrame

    # py2neo's Graph.run() plays the role of session.run() here
    session = Graph("bolt://localhost:7687", auth=("neo4j", "****"))
    # return the ids (properties) in your query, not the nodes themselves;
    # remove the {id: '1'} filter and the LIMIT to get all rows
    query = ''' MATCH (paper)-[au:AUTHORED{id: '1'}]->(author) RETURN paper.id, author.id, au.id LIMIT 3; '''
    # .data() returns the result as a list of dicts, one per row
    result = session.run(query).data()
    # convert the result into a pandas DataFrame
    df = DataFrame(result)
    print(df.head())
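The same idea applies if you use the official `neo4j` Python driver instead of py2neo. Return properties rather than nodes, then build the DataFrame from each record's `data()` dict. The sketch below works under that assumption. The `AS` aliases were added here so the columns match the CSV header. The connection details and the helper names `records_to_dataframe` and `fetch_edges` are illustrative, not part of either library.

```python
import pandas as pd

# Returns scalar properties, aliased to the CSV column names. The {id: '1'}
# filter and LIMIT are dropped so every AUTHORED edge comes back.
QUERY = """
MATCH (paper)-[au:AUTHORED]->(author)
RETURN paper.id AS Node1_id, author.id AS Node2_id, au.id AS relation_id
"""


def records_to_dataframe(rows, columns):
    """Build a DataFrame from a list of dicts, one dict per result row."""
    return pd.DataFrame(rows, columns=columns)


def fetch_edges(uri, user, password):
    """Run QUERY with the official neo4j driver and return a DataFrame."""
    # Imported lazily so records_to_dataframe works without the driver installed.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            result = session.run(QUERY)
            columns = result.keys()  # the RETURN aliases, in order
            # Consume the records while the session is still open.
            rows = [record.data() for record in result]
    finally:
        driver.close()
    return records_to_dataframe(rows, columns)


# Usage (needs a running database):
# df = fetch_edges("bolt://localhost:7687", "neo4j", "****")
```

The records are consumed inside the `with` block because a driver result is tied to its session and cannot be read after the session closes.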

Result:

    0   1   1
    4   2   1
    44  3   1

#2: Another question

how can I access the individual properties of nodes and relation during the cypher query

ANS: The properties inside nodes behave like a dict, so use the get function.

    # Note that we are returning the nodes and not ids
    query = ''' MATCH (paper)-[au:AUTHORED{id: '1'}]->(author) RETURN paper, author, au LIMIT 3; '''
    result = session.run(query).data()
    print("What is data type of result? ", type(result))
    print("What is the data type of each item? ", type(result[0]))
    print("What are the keys of the dictionary? ", result[0].keys())
    print("What is the class of the node? ", type(result[0].get('paper')))
    print("How to access the first node? ", result[0].get('paper'))
    print("How to access values inside the node? ", result[0].get('paper', {}).get('id'))

Result:

    What is data type of result?  <class 'list'>
    What is the data type of each item?  <class 'dict'>
    What are the keys of the dictionary?  dict_keys(['paper', 'author', 'au'])
    What is the class of the node?  <class 'py2neo.data.Node'>
    How to access the first node?  (_888:paper {id: '1'})
    How to access values inside the node?  '1'
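You can also return whole nodes, as in the query above, and then use the same `.get` access to flatten them into the id-only DataFrame from #1. The sketch below assumes each record is a dict like those returned by `.data()`. Plain dicts stand in for the py2neo objects. Nodes support `.get` as shown above, and the relationship (`au`) is assumed to support it the same way. The helper name `records_to_id_frame` is hypothetical.

```python
import pandas as pd

COLUMNS = ["Node1_id", "Node2_id", "relation_id"]


def records_to_id_frame(records):
    """Flatten records shaped like {'paper': ..., 'author': ..., 'au': ...}
    into one row of ids per record, reading each property with .get()."""
    rows = []
    for record in records:
        rows.append(
            {
                # .get(key, {}) guards against a missing key; .get("id") reads the property.
                "Node1_id": record.get("paper", {}).get("id"),
                "Node2_id": record.get("author", {}).get("id"),
                "relation_id": record.get("au", {}).get("id"),
            }
        )
    return pd.DataFrame(rows, columns=COLUMNS)


# Plain dicts standing in for the py2neo objects returned by .data().
sample = [
    {"paper": {"id": "0"}, "author": {"id": "1"}, "au": {"id": "1"}},
    {"paper": {"id": "4"}, "author": {"id": "2"}, "au": {"id": "1"}},
]
print(records_to_id_frame(sample))
```

Returning the properties directly in Cypher (`RETURN paper.id, author.id, au.id`), as #1 does, avoids this step and sends less data back from the database.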