Handling empty array types when using APOC to import a CSV into Neo4j

I have a CSV file in which some fields are array types. Fields are separated by , and array items are separated by ; For example:

index, name, friends, neighbors
0,Jim,John;Tim;Fred,Susan;Megan;Cheryl
1,Susan,Jim;John,Megan;Cheryl
2,Sean,,,

Here Jim has three friends (John, Tim, and Fred) and three neighbors (Susan, Megan, and Cheryl), while Sean has no friends and no neighbors.

However, when I read this into Neo4j with apoc.load.csv, I end up with list properties that contain an empty string instead of being empty lists. For example:

CALL apoc.periodic.iterate("
CALL apoc.load.csv('file.csv',
    {header:true,sep:',',
    mapping:{
        friends:{array:true},
        neighbors:{array:true}}
    })
YIELD map as row RETURN row
","
CREATE (p:Person) SET p = row
", 
{batchsize:50000, iterateList:true, parallel:true});

gives me a Person named Sean, but with friends=[ "" ] and neighbors=[ "" ].

What I want is for Sean to have friends=[] and neighbors=[].

Thanks!

  1. Make sure the header of your CSV file contains no extra spaces (otherwise some property names will start with a space):

    index,name,friends,neighbors
    0,Jim,John;Tim;Fred,Susan;Megan;Cheryl
    1,Susan,Jim;John,Megan;Cheryl
    2,Sean,,,
    
  2. Use list comprehensions to remove every friends and neighbors element that is an empty string:

    CALL apoc.periodic.iterate(
      "CALL apoc.load.csv(
         'file.csv',
         {
           header:true, sep:',',
           mapping: {
             friends: {array: true},
             neighbors: {array: true}
           }
         }) YIELD map
       RETURN map
      ",
      "CREATE (p:Person)
       SET p = map
       SET p.friends = [f IN p.friends WHERE f <> '']
       SET p.neighbors = [n IN p.neighbors WHERE n <> '']
      ", 
      {batchSize:50000, iterateList:true, parallel:true}
    );
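
Both steps can be checked without touching the import by running a couple of throwaway queries. This is only a sketch: it assumes the same file.csv as above, and the aliases columns, emptyCase, and mixedCase are made up for illustration. Run the statements one at a time if your client does not accept several statements at once.

```cypher
// Step 1: list the column names APOC sees in the first row.
// A key such as ' name' (with a leading space) means the header still has stray spaces.
CALL apoc.load.csv('file.csv', {header:true, sep:','}) YIELD map
RETURN keys(map) AS columns
LIMIT 1;

// Step 2: the same list comprehension, applied to literal lists.
// [''] is what Sean's row produces; the second list mimics a value such as John;;Fred.
RETURN [f IN [''] WHERE f <> ''] AS emptyCase,                  // []
       [f IN ['John', '', 'Fred'] WHERE f <> ''] AS mixedCase;  // ['John', 'Fred']
```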
    

With the above changes, this query:

MATCH (person:Person) RETURN person;

returns this result:

╒══════════════════════════════════════════════════════════════════════╕
│"person"                                                              │
╞══════════════════════════════════════════════════════════════════════╡
│{"name":"Jim","index":"0","neighbors":["Susan","Megan","Cheryl"],"frie│
│nds":["John","Tim","Fred"]}                                           │
├──────────────────────────────────────────────────────────────────────┤
│{"name":"Susan","index":"1","neighbors":["Megan","Cheryl"],"friends":[│
│"Jim","John"]}                                                        │
├──────────────────────────────────────────────────────────────────────┤
│{"name":"Sean","index":"2","neighbors":[],"friends":[]}               │
└──────────────────────────────────────────────────────────────────────┘
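
A quick way to confirm that nothing slipped through is to look for nodes that still carry an empty string in either list. A minimal sketch, assuming the Person label and the property names used above:

```cypher
// Should return no rows once the empty strings have been filtered out.
MATCH (p:Person)
WHERE '' IN p.friends OR '' IN p.neighbors
RETURN p.name AS name, p.friends AS friends, p.neighbors AS neighbors;
```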

[UPDATE]

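Also, if your CSV file cannot contain "empty" friend or neighbor substrings (for example John;;Fred), then this version of the query, which uses CASE instead of list comprehensions, would be more efficient: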

CALL apoc.periodic.iterate(
  "CALL apoc.load.csv(
     'file.csv',
     {
       header:true, sep:',',
       mapping: {
         friends: {array: true},
         neighbors: {array: true, arraySep:';'}
       }
     }) YIELD map
   RETURN map
  ",
  "CREATE (p:Person)
     SET p = map
     SET p.friends = CASE p.friends WHEN [''] THEN [] ELSE p.friends END
     SET p.neighbors = CASE p.neighbors WHEN [''] THEN [] ELSE p.neighbors END
  ", 
  {batchSize:50000, iterateList:true, parallel:true}
);
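
The difference between the CASE variant and the list comprehension is easiest to see on literal values. This is just an illustration (the variable names and aliases are made up): CASE only rewrites a list that is exactly [''], so an empty item in the middle of a list survives, whereas the list comprehension drops it.

```cypher
WITH [''] AS onlyEmpty, ['John', '', 'Fred'] AS mixed
RETURN
  CASE onlyEmpty WHEN [''] THEN [] ELSE onlyEmpty END AS caseOnlyEmpty,  // []
  CASE mixed WHEN [''] THEN [] ELSE mixed END AS caseMixed,              // ['John', '', 'Fred']
  [x IN mixed WHERE x <> ''] AS comprehensionMixed;                      // ['John', 'Fred']
```

So the CASE form is only safe when a field is either entirely empty or contains no empty items.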