What is going wrong in the relationship import phase?

I finally got past the node import phase. Now I am trying to import the relationships. There may be around 1B relationships.

#!/bin/bash
cd /home/luning/neo4j-enterprise-2.2.0-RC01-unix/neo4j-enterprise-2.2.0-RC01/bin
users="/data/weibo/user-header.csv"
for i in /data/weibo/users/*
do
    users=$users,$i
done
edges=/data/weibo/edge-header.csv,/data/weibo/ego/000000_0
./neo4j-import --stacktrace --into ../data/weibo_bak.db --nodes:User $users --relationships:Follow $edges --delimiter TAB --quote \' --bad-tolerance 50000 --id-type STRING

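But it always reports that nodes are missing. What puzzles me is that importing the same files in two separate runs gave me different missing nodes.

1. First run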

   source: /data/weibo/ego/000000_0:1807199
   startNode: 1587438071
   endNode: 2414878813
   type: Follow
 refering to missing node 1587438071
java.lang.RuntimeException: Too many bad entries, saw 50001 where last one was InputRelationship:
   source: /data/weibo/ego/000000_0:1807199
   startNode: 1587438071
   endNode: 2414878813
   type: Follow
 refering to missing node 1587438071
    at org.neo4j.unsafe.impl.batchimport.staging.StageExecution.stillExecuting(StageExecution.java:63)
    at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisor.anyStillExecuting(ExecutionSupervisor.java:79)
    at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisor.finishAwareSleep(ExecutionSupervisor.java:102)
    at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisor.supervise(ExecutionSupervisor.java:64)
    at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisors.superviseDynamicExecution(ExecutionSupervisors.java:65)
    at org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.executeStages(ParallelBatchImporter.java:226)
    at org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.doImport(ParallelBatchImporter.java:152)
    at org.neo4j.tooling.ImportTool.main(ImportTool.java:263)
Caused by: org.neo4j.unsafe.impl.batchimport.input.InputException: Too many bad entries, saw 50001 where last one was InputRelationship:
   source: /data/weibo/ego/000000_0:1807199
   startNode: 1587438071
   endNode: 2414878813
   type: Follow
 refering to missing node 1587438071
    at org.neo4j.unsafe.impl.batchimport.input.BadRelationshipsCollector.collect(BadRelationshipsCollector.java:47)
    at org.neo4j.unsafe.impl.batchimport.input.BadRelationshipsCollector.collect(BadRelationshipsCollector.java:27)
    at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.incrementCount(CalculateDenseNodesStep.java:79)
    at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.process(CalculateDenseNodesStep.java:56)
    at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.process(CalculateDenseNodesStep.java:32)
    at org.neo4j.unsafe.impl.batchimport.staging.ExecutorServiceStep.call(ExecutorServiceStep.java:96)
    at org.neo4j.unsafe.impl.batchimport.staging.ExecutorServiceStep.call(ExecutorServiceStep.java:87)
    at org.neo4j.unsafe.impl.batchimport.executor.DynamicTaskExecutor$Processor.run(DynamicTaskExecutor.java:217)

2. Second run

source: /data/weibo/ego/000000_0:1844245
startNode: 3492922617
endNode: 1589699375
type: Follow
 refering to missing node 1589699375
java.lang.RuntimeException: Too many bad entries, saw 50001 where last one was InputRelationship:
   source: /data/weibo/ego/000000_0:1844245
   startNode: 3492922617
   endNode: 1589699375
   type: Follow
 refering to missing node 1589699375
    at org.neo4j.unsafe.impl.batchimport.staging.StageExecution.stillExecuting(StageExecution.java:63)
    at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisor.anyStillExecuting(ExecutionSupervisor.java:79)
    at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisor.finishAwareSleep(ExecutionSupervisor.java:102)
    at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisor.supervise(ExecutionSupervisor.java:64)
    at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisors.superviseDynamicExecution(ExecutionSupervisors.java:65)
    at org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.executeStages(ParallelBatchImporter.java:226)
    at org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.doImport(ParallelBatchImporter.java:152)
    at org.neo4j.tooling.ImportTool.main(ImportTool.java:263)
Caused by: org.neo4j.unsafe.impl.batchimport.input.InputException: Too many bad entries, saw 50001 where last one was InputRelationship:
   source: /data/weibo/ego/000000_0:1844245
   startNode: 3492922617
   endNode: 1589699375
   type: Follow
 refering to missing node 1589699375
    at org.neo4j.unsafe.impl.batchimport.input.BadRelationshipsCollector.collect(BadRelationshipsCollector.java:47)
    at org.neo4j.unsafe.impl.batchimport.input.BadRelationshipsCollector.collect(BadRelationshipsCollector.java:27)
    at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.incrementCount(CalculateDenseNodesStep.java:79)
    at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.process(CalculateDenseNodesStep.java:59)
    at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.process(CalculateDenseNodesStep.java:32)
    at org.neo4j.unsafe.impl.batchimport.staging.ExecutorServiceStep.call(ExecutorServiceStep.java:96)
    at org.neo4j.unsafe.impl.batchimport.staging.ExecutorServiceStep.call(ExecutorServiceStep.java:87)
    at org.neo4j.unsafe.impl.batchimport.executor.DynamicTaskExecutor$Processor.run(DynamicTaskExecutor.java:217)

But I am sure that these two nodes, 1587438071 and 1589699375, are in my files, because I can find them.

[luning@pinnacle data]$ grep 1587438071 /data/weibo/users/*
/data/weibo/users/000024_0:1587438071   琬童沛胜    浙江 杭州           http://tp4.sinaimg.cn/1587438071/50/40024579617/0   f   147 60  272     false       LV2 31  一举成名|   正常  80                      2014-02-17 04:17:38


[luning@pinnacle data]$ grep 1589699375 /data/weibo/users/*
/data/weibo/users/000010_0:1589699375   在行动Isabella 吉林          http://tp4.sinaimg.cn/1589699375/50/5633181098/0    女   297 438 4729    1981-01-17  false       LV7            2014-08-13 21:43:34                      2014-01-28 10:18:52

So, can anyone figure out how this happens?

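It may be that your node input files contain fields with quotes that are not properly closed, so that some lines get "eaten" by other lines and those nodes are never actually imported (this only happens if the fields happen to line up that way; otherwise an exception would be thrown). It could also be that the parser has a problem with these Chinese characters.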
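One rough way to test the unclosed-quote theory before sharing any data is to look for lines in the node files that contain an odd number of quote characters. The sketch below is only an illustration: `find_unbalanced_quotes` is a hypothetical helper, not part of neo4j-import, and it assumes the quote character is the `'` passed via `--quote` and that no field is meant to span several lines.

```shell
#!/bin/bash
# Hypothetical helper (not part of neo4j-import): print every line that
# contains an odd number of the given quote character, i.e. a quote that
# is opened but never closed on the same line.
find_unbalanced_quotes() {
    local quote="$1"
    shift
    # With the quote character as the field separator, a line holding
    # n quotes splits into n+1 fields, so an even NF means n is odd.
    # NF > 0 skips empty lines, which would otherwise count as even.
    awk -F"$quote" 'NF > 0 && NF % 2 == 0 { print FILENAME ":" FNR ": " $0 }' "$@"
}

# Example call with the quote character and node files from the question:
# find_unbalanced_quotes "'" /data/weibo/users/*
```

Any line it prints (for example a nickname containing a lone apostrophe) is a candidate for starting a quoted field that swallows the following lines. An empty result does not rule out other parsing problems.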

Is there any chance you could share your input data with me (the main author of the parser and the import tool) so I can investigate?