导入关系阶段出了什么问题?
What's going wrong with the phase of importing relationships?
我终于攻克了导入节点的阶段。现在我正在尝试导入关系。可能有1B关系。
#!/bin/bash
cd /home/luning/neo4j-enterprise-2.2.0-RC01-unix/neo4j-enterprise-2.2.0-RC01/bin
users="/data/weibo/user-header.csv"
for i in /data/weibo/users/*
do
users=$users,$i
done
edges=/data/weibo/edge-header.csv,/data/weibo/ego/000000_0
./neo4j-import --stacktrace --into ../data/weibo_bak.db --nodes:User $users --relationships:Follow $edges --delimiter TAB --quote \' --bad-tolerance 50000 --id-type STRING
但是总是说节点丢失。令人费解的是,通过为两次试验导入相同的文件,它给了我不同的缺失节点。
1. 第一次
source: /data/weibo/ego/000000_0:1807199
startNode: 1587438071
endNode: 2414878813
type: Follow
refering to missing node 1587438071
java.lang.RuntimeException: Too many bad entries, saw 50001 where last one was InputRelationship:
source: /data/weibo/ego/000000_0:1807199
startNode: 1587438071
endNode: 2414878813
type: Follow
refering to missing node 1587438071
at org.neo4j.unsafe.impl.batchimport.staging.StageExecution.stillExecuting(StageExecution.java:63)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisor.anyStillExecuting(ExecutionSupervisor.java:79)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisor.finishAwareSleep(ExecutionSupervisor.java:102)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisor.supervise(ExecutionSupervisor.java:64)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisors.superviseDynamicExecution(ExecutionSupervisors.java:65)
at org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.executeStages(ParallelBatchImporter.java:226)
at org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.doImport(ParallelBatchImporter.java:152)
at org.neo4j.tooling.ImportTool.main(ImportTool.java:263)
Caused by: org.neo4j.unsafe.impl.batchimport.input.InputException: Too many bad entries, saw 50001 where last one was InputRelationship:
source: /data/weibo/ego/000000_0:1807199
startNode: 1587438071
endNode: 2414878813
type: Follow
refering to missing node 1587438071
at org.neo4j.unsafe.impl.batchimport.input.BadRelationshipsCollector.collect(BadRelationshipsCollector.java:47)
at org.neo4j.unsafe.impl.batchimport.input.BadRelationshipsCollector.collect(BadRelationshipsCollector.java:27)
at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.incrementCount(CalculateDenseNodesStep.java:79)
at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.process(CalculateDenseNodesStep.java:56)
at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.process(CalculateDenseNodesStep.java:32)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutorServiceStep.call(ExecutorServiceStep.java:96)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutorServiceStep.call(ExecutorServiceStep.java:87)
at org.neo4j.unsafe.impl.batchimport.executor.DynamicTaskExecutor$Processor.run(DynamicTaskExecutor.java:217)
2。第二次
source: /data/weibo/ego/000000_0:1844245
startNode: 3492922617
endNode: 1589699375
type: Follow
refering to missing node 1589699375
java.lang.RuntimeException: Too many bad entries, saw 50001 where last one was InputRelationship:
source: /data/weibo/ego/000000_0:1844245
startNode: 3492922617
endNode: 1589699375
type: Follow
refering to missing node 1589699375
at org.neo4j.unsafe.impl.batchimport.staging.StageExecution.stillExecuting(StageExecution.java:63)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisor.anyStillExecuting(ExecutionSupervisor.java:79)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisor.finishAwareSleep(ExecutionSupervisor.java:102)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisor.supervise(ExecutionSupervisor.java:64)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisors.superviseDynamicExecution(ExecutionSupervisors.java:65)
at org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.executeStages(ParallelBatchImporter.java:226)
at org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.doImport(ParallelBatchImporter.java:152)
at org.neo4j.tooling.ImportTool.main(ImportTool.java:263)
Caused by: org.neo4j.unsafe.impl.batchimport.input.InputException: Too many bad entries, saw 50001 where last one was InputRelationship:
source: /data/weibo/ego/000000_0:1844245
startNode: 3492922617
endNode: 1589699375
type: Follow
refering to missing node 1589699375
at org.neo4j.unsafe.impl.batchimport.input.BadRelationshipsCollector.collect(BadRelationshipsCollector.java:47)
at org.neo4j.unsafe.impl.batchimport.input.BadRelationshipsCollector.collect(BadRelationshipsCollector.java:27)
at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.incrementCount(CalculateDenseNodesStep.java:79)
at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.process(CalculateDenseNodesStep.java:59)
at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.process(CalculateDenseNodesStep.java:32)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutorServiceStep.call(ExecutorServiceStep.java:96)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutorServiceStep.call(ExecutorServiceStep.java:87)
at org.neo4j.unsafe.impl.batchimport.executor.DynamicTaskExecutor$Processor.run(DynamicTaskExecutor.java:217)
但是对于这两个节点 1587438071 和 2765561213,我可以确定它们在我的文件中。因为我能找到它们。
[luning@pinnacle data]$ grep 1587438071 /data/weibo/users/*
/data/weibo/users/000024_0:1587438071 琬童沛胜 浙江 杭州 http://tp4.sinaimg.cn/1587438071/50/40024579617/0 f 147 60 272 false LV2 31 一举成名| 正常 80 2014-02-17 04:17:38
[luning@pinnacle data]$ grep 1589699375 /data/weibo/users/*
/data/weibo/users/000010_0:1589699375 在行动Isabella 吉林 http://tp4.sinaimg.cn/1589699375/50/5633181098/0 女 297 438 4729 1981-01-17 false LV7 2014-08-13 21:43:34 2014-01-28 10:18:52
那么,谁能想出它是如何发生的?
可能是您的节点输入文件包含未正确关闭引号的字段,其中一些行 "eaten" 由其他行组成,实际上不会导入这些节点(如果对齐这些字段会碰巧像那样结束,否则会抛出异常)。也可能是解析器在面对这些汉字时出了问题。
您是否有机会与我(解析器和导入工具的主要作者)分享您的输入数据以供调查?
我终于攻克了导入节点的阶段。现在我正在尝试导入关系。可能有1B关系。
#!/bin/bash
cd /home/luning/neo4j-enterprise-2.2.0-RC01-unix/neo4j-enterprise-2.2.0-RC01/bin
users="/data/weibo/user-header.csv"
for i in /data/weibo/users/*
do
users=$users,$i
done
edges=/data/weibo/edge-header.csv,/data/weibo/ego/000000_0
./neo4j-import --stacktrace --into ../data/weibo_bak.db --nodes:User $users --relationships:Follow $edges --delimiter TAB --quote \' --bad-tolerance 50000 --id-type STRING
但是总是说节点丢失。令人费解的是,通过为两次试验导入相同的文件,它给了我不同的缺失节点。 1. 第一次
source: /data/weibo/ego/000000_0:1807199
startNode: 1587438071
endNode: 2414878813
type: Follow
refering to missing node 1587438071
java.lang.RuntimeException: Too many bad entries, saw 50001 where last one was InputRelationship:
source: /data/weibo/ego/000000_0:1807199
startNode: 1587438071
endNode: 2414878813
type: Follow
refering to missing node 1587438071
at org.neo4j.unsafe.impl.batchimport.staging.StageExecution.stillExecuting(StageExecution.java:63)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisor.anyStillExecuting(ExecutionSupervisor.java:79)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisor.finishAwareSleep(ExecutionSupervisor.java:102)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisor.supervise(ExecutionSupervisor.java:64)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisors.superviseDynamicExecution(ExecutionSupervisors.java:65)
at org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.executeStages(ParallelBatchImporter.java:226)
at org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.doImport(ParallelBatchImporter.java:152)
at org.neo4j.tooling.ImportTool.main(ImportTool.java:263)
Caused by: org.neo4j.unsafe.impl.batchimport.input.InputException: Too many bad entries, saw 50001 where last one was InputRelationship:
source: /data/weibo/ego/000000_0:1807199
startNode: 1587438071
endNode: 2414878813
type: Follow
refering to missing node 1587438071
at org.neo4j.unsafe.impl.batchimport.input.BadRelationshipsCollector.collect(BadRelationshipsCollector.java:47)
at org.neo4j.unsafe.impl.batchimport.input.BadRelationshipsCollector.collect(BadRelationshipsCollector.java:27)
at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.incrementCount(CalculateDenseNodesStep.java:79)
at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.process(CalculateDenseNodesStep.java:56)
at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.process(CalculateDenseNodesStep.java:32)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutorServiceStep.call(ExecutorServiceStep.java:96)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutorServiceStep.call(ExecutorServiceStep.java:87)
at org.neo4j.unsafe.impl.batchimport.executor.DynamicTaskExecutor$Processor.run(DynamicTaskExecutor.java:217)
2。第二次
source: /data/weibo/ego/000000_0:1844245
startNode: 3492922617
endNode: 1589699375
type: Follow
refering to missing node 1589699375
java.lang.RuntimeException: Too many bad entries, saw 50001 where last one was InputRelationship:
source: /data/weibo/ego/000000_0:1844245
startNode: 3492922617
endNode: 1589699375
type: Follow
refering to missing node 1589699375
at org.neo4j.unsafe.impl.batchimport.staging.StageExecution.stillExecuting(StageExecution.java:63)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisor.anyStillExecuting(ExecutionSupervisor.java:79)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisor.finishAwareSleep(ExecutionSupervisor.java:102)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisor.supervise(ExecutionSupervisor.java:64)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutionSupervisors.superviseDynamicExecution(ExecutionSupervisors.java:65)
at org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.executeStages(ParallelBatchImporter.java:226)
at org.neo4j.unsafe.impl.batchimport.ParallelBatchImporter.doImport(ParallelBatchImporter.java:152)
at org.neo4j.tooling.ImportTool.main(ImportTool.java:263)
Caused by: org.neo4j.unsafe.impl.batchimport.input.InputException: Too many bad entries, saw 50001 where last one was InputRelationship:
source: /data/weibo/ego/000000_0:1844245
startNode: 3492922617
endNode: 1589699375
type: Follow
refering to missing node 1589699375
at org.neo4j.unsafe.impl.batchimport.input.BadRelationshipsCollector.collect(BadRelationshipsCollector.java:47)
at org.neo4j.unsafe.impl.batchimport.input.BadRelationshipsCollector.collect(BadRelationshipsCollector.java:27)
at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.incrementCount(CalculateDenseNodesStep.java:79)
at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.process(CalculateDenseNodesStep.java:59)
at org.neo4j.unsafe.impl.batchimport.CalculateDenseNodesStep.process(CalculateDenseNodesStep.java:32)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutorServiceStep.call(ExecutorServiceStep.java:96)
at org.neo4j.unsafe.impl.batchimport.staging.ExecutorServiceStep.call(ExecutorServiceStep.java:87)
at org.neo4j.unsafe.impl.batchimport.executor.DynamicTaskExecutor$Processor.run(DynamicTaskExecutor.java:217)
但是对于这两个节点 1587438071 和 2765561213,我可以确定它们在我的文件中。因为我能找到它们。
[luning@pinnacle data]$ grep 1587438071 /data/weibo/users/*
/data/weibo/users/000024_0:1587438071 琬童沛胜 浙江 杭州 http://tp4.sinaimg.cn/1587438071/50/40024579617/0 f 147 60 272 false LV2 31 一举成名| 正常 80 2014-02-17 04:17:38
[luning@pinnacle data]$ grep 1589699375 /data/weibo/users/*
/data/weibo/users/000010_0:1589699375 在行动Isabella 吉林 http://tp4.sinaimg.cn/1589699375/50/5633181098/0 女 297 438 4729 1981-01-17 false LV7 2014-08-13 21:43:34 2014-01-28 10:18:52
那么,谁能想出它是如何发生的?
可能是您的节点输入文件包含未正确关闭引号的字段,其中一些行 "eaten" 由其他行组成,实际上不会导入这些节点(如果对齐这些字段会碰巧像那样结束,否则会抛出异常)。也可能是解析器在面对这些汉字时出了问题。
您是否有机会与我(解析器和导入工具的主要作者)分享您的输入数据以供调查?