NullPointerException when outputting TableRow with null value
I'm trying to build a TableRow object to eventually write to a BigQuery table, but if I include a null value in the row, I get a NullPointerException. Here is the full stack trace:
Exception in thread "main" org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.NullPointerException
at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:349)
at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:319)
at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:210)
at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:66)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:311)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
at dataflowsandbox.StarterPipeline.runTest(StarterPipeline.java:224)
at dataflowsandbox.StarterPipeline.main(StarterPipeline.java:83)
Caused by: java.lang.NullPointerException
at com.google.api.client.util.ArrayMap$Entry.hashCode(ArrayMap.java:419)
at java.util.AbstractMap.hashCode(AbstractMap.java:530)
at java.util.Arrays.hashCode(Arrays.java:4146)
at java.util.Objects.hash(Objects.java:128)
at org.apache.beam.sdk.util.WindowedValue$ValueInGlobalWindow.hashCode(WindowedValue.java:245)
at java.util.HashMap.hash(HashMap.java:339)
at java.util.HashMap.get(HashMap.java:557)
at org.apache.beam.repackaged.beam_runners_direct_java.com.google.common.collect.AbstractMapBasedMultimap.put(AbstractMapBasedMultimap.java:191)
at org.apache.beam.repackaged.beam_runners_direct_java.com.google.common.collect.AbstractSetMultimap.put(AbstractSetMultimap.java:130)
at org.apache.beam.repackaged.beam_runners_direct_java.com.google.common.collect.HashMultimap.put(HashMultimap.java:48)
at org.apache.beam.runners.direct.ImmutabilityCheckingBundleFactory$ImmutabilityEnforcingBundle.add(ImmutabilityCheckingBundleFactory.java:111)
at org.apache.beam.runners.direct.ParDoEvaluator$BundleOutputManager.output(ParDoEvaluator.java:242)
at org.apache.beam.repackaged.beam_runners_direct_java.runners.core.SimpleDoFnRunner.outputWindowedValue(SimpleDoFnRunner.java:219)
at org.apache.beam.repackaged.beam_runners_direct_java.runners.core.SimpleDoFnRunner.access0(SimpleDoFnRunner.java:69)
at org.apache.beam.repackaged.beam_runners_direct_java.runners.core.SimpleDoFnRunner$DoFnProcessContext.output(SimpleDoFnRunner.java:517)
at org.apache.beam.repackaged.beam_runners_direct_java.runners.core.SimpleDoFnRunner$DoFnProcessContext.output(SimpleDoFnRunner.java:505)
at dataflowsandbox.StarterPipeline.procesElement(StarterPipeline.java:202)
Process finished with exit code 1
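Reading the trace from the bottom up: the DirectRunner's ImmutabilityEnforcingBundle.add puts each output into a HashMultimap, which hashes the WindowedValue, which hashes the TableRow (an AbstractMap), which in turn hashes each of its entries. The NPE is raised inside ArrayMap$Entry.hashCode. The stdlib-only sketch below reproduces that chain under one assumption: that the entry computes its hash by calling value.hashCode() without a null check. I have not verified ArrayMap's source, and NullUnsafeEntry is a hypothetical stand-in. By contrast, java.util.AbstractMap.SimpleEntry hashes a null value as 0.

```java
import java.util.AbstractMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class NullHashDemo {

    /** Hypothetical stand-in for a map entry whose hashCode is not null-safe. */
    static final class NullUnsafeEntry<K, V> extends AbstractMap.SimpleEntry<K, V> {
        NullUnsafeEntry(K key, V value) {
            super(key, value);
        }

        @Override
        public int hashCode() {
            // Dereferences the value directly, so a null value throws NullPointerException.
            return getKey().hashCode() ^ getValue().hashCode();
        }
    }

    /** Returns true if adding the entry to a hash-based collection (which hashes it) succeeds. */
    static boolean canHash(Map.Entry<String, Object> entry) {
        Set<Object> bundle = new HashSet<>();
        try {
            bundle.add(entry); // HashSet.add calls entry.hashCode()
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map.Entry<String, Object> unsafe = new NullUnsafeEntry<>("org_id", null);
        Map.Entry<String, Object> safe = new AbstractMap.SimpleEntry<>("org_id", null);
        System.out.println("null-unsafe entry hashable: " + canHash(unsafe)); // false
        System.out.println("SimpleEntry hashable: " + canHash(safe));          // true
    }
}
```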
Here is the code that triggers the NullPointerException:
Pipeline p = Pipeline.create( options );
p.apply( "kicker", Create.of( "Kick!" ) )
    .apply( "Read values", ParDo.of( new DoFn<String, TableRow>() {
        @ProcessElement
        public void procesElement( ProcessContext c ) {
            TableRow row = new TableRow();
            row.set( "ev_id", "2323423423" );
            row.set( "customer_id", "111111" );
            row.set( "org_id", null ); // Without this line, no NPE
            c.output( row );
        }
    } ) )
    .apply( BigQueryIO.writeTableRows()
        .to( DATA_TABLE_OUT )
        .withCreateDisposition( CREATE_NEVER )
        .withWriteDisposition( WRITE_APPEND ) );
PipelineResult result = p.run();
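My actual code is a bit more complex, but I should be able to catch the null value and simply not set it in the row. Maybe I just don't understand TableRows.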
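The "catch the null and don't set it" idea can be packaged as a small helper. This is a minimal sketch: putIfNotNull is a hypothetical name, and it takes a plain Map<String, Object> because TableRow is itself a map (the trace shows TableRow being hashed through java.util.AbstractMap). A LinkedHashMap stands in for TableRow so the sketch runs without the BigQuery client library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SkipNulls {

    /**
     * Hypothetical helper: sets the field only when the value is non-null,
     * so a missing value becomes an absent key instead of a null entry.
     */
    static Map<String, Object> putIfNotNull(Map<String, Object> row, String field, Object value) {
        if (value != null) {
            row.put(field, value);
        }
        return row;
    }

    public static void main(String[] args) {
        // LinkedHashMap stands in for TableRow here.
        Map<String, Object> row = new LinkedHashMap<>();
        putIfNotNull(row, "ev_id", "2323423423");
        putIfNotNull(row, "customer_id", "111111");
        putIfNotNull(row, "org_id", null); // skipped, key never added
        System.out.println(row); // {ev_id=2323423423, customer_id=111111}
    }
}
```

Inside the DoFn, a call such as putIfNotNull(row, "org_id", orgId) (with orgId a hypothetical local variable) would replace row.set("org_id", orgId).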
Put in a placeholder value instead of null or an empty string. As far as I know, TableRows don't accept null values.
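A minimal sketch of the placeholder approach, assuming a sentinel string "UNKNOWN" (hypothetical; choose one that cannot occur in real data). The trade-off is that the sentinel is stored as a real value, not as NULL, so downstream queries have to treat it as "missing" themselves.

```java
public class Placeholder {

    /** Hypothetical sentinel written in place of a missing value. */
    static final String MISSING = "UNKNOWN";

    /** Returns the value unchanged, or the sentinel when the value is null. */
    static String orPlaceholder(String value) {
        return value != null ? value : MISSING;
    }

    public static void main(String[] args) {
        System.out.println(orPlaceholder("111111")); // 111111
        System.out.println(orPlaceholder(null));     // UNKNOWN
    }
}
```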
Instead, you can provide the table schema and simply leave the field's value unset.
The table schema, in which org_id is NULLABLE:
List<TableFieldSchema> fields = new ArrayList<>();
fields.add(new TableFieldSchema().setName("ev_id").setType("STRING"));
fields.add(new TableFieldSchema().setName("customer_id").setType("STRING"));
fields.add(new TableFieldSchema().setName("org_id").setType("STRING").setMode("NULLABLE"));
TableSchema schema = new TableSchema().setFields(fields);
Don't set any value for that field (comment out the line):
row.set( "ev_id", "2323423423" );
row.set( "customer_id", "111111" );
// row.set( "org_id", null ); // Without this line, no NPE
c.output( row );
Pass the table schema in the write step:
.apply( BigQueryIO.writeTableRows()
.to( DATA_TABLE_OUT )
.withSchema(schema)
.withCreateDisposition( CREATE_NEVER )
.withWriteDisposition( WRITE_APPEND ) );
A NULL value is then written to BigQuery.
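Leaving a field out works because BigQuery stores an absent field as NULL, which it accepts only for NULLABLE columns (NULLABLE is also the default mode when none is set); a REQUIRED column must be present. The sketch below is a hypothetical pre-write check, written with plain maps rather than the TableSchema/TableRow classes, that lists the REQUIRED fields a row leaves unset.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NullableCheck {

    /**
     * Hypothetical check: returns the REQUIRED fields that the row leaves unset.
     * A null (unspecified) mode is treated as NULLABLE, BigQuery's default.
     */
    static List<String> missingRequired(Map<String, String> fieldModes, Map<String, Object> row) {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, String> field : fieldModes.entrySet()) {
            boolean required = "REQUIRED".equals(field.getValue());
            if (required && row.get(field.getKey()) == null) {
                missing.add(field.getKey());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Mirrors the schema above: org_id is NULLABLE, the other modes are unset.
        Map<String, String> modes = new LinkedHashMap<>();
        modes.put("ev_id", null);
        modes.put("customer_id", null);
        modes.put("org_id", "NULLABLE");

        Map<String, Object> row = new LinkedHashMap<>();
        row.put("ev_id", "2323423423");
        row.put("customer_id", "111111");
        // org_id deliberately left unset

        System.out.println(missingRequired(modes, row)); // []
    }
}
```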
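If you are using the DirectRunner, pass the parameter --enforceImmutability=false. It worked for me. This has been addressed in the Dataflow Runner, but with the DirectRunner you get an NPE whenever null is passed to tableRow.set(). If you turn off the DirectRunner's immutability enforcement check by setting the --enforceImmutability=false pipeline option, the error no longer occurs.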
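Why switching the check off helps can be modelled in plain Java. The trace shows ImmutabilityEnforcingBundle.add storing each output in a hash-based multimap, which forces a hashCode() call on the element; a bundle that only collects elements never hashes them. ModelBundle and its Row class below are hypothetical, not Beam classes. In a real pipeline the switch is the --enforceImmutability option (Beam's DirectOptions also exposes it as setEnforceImmutability). Note that the check exists to catch DoFns that mutate elements after outputting them, so disabling it is a workaround for local runs, not a fix.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class ModelBundle {

    /** Hypothetical element whose hashCode fails on a null field, like the row in the trace. */
    static final class Row {
        final String orgId;

        Row(String orgId) {
            this.orgId = orgId;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof Row && Objects.equals(orgId, ((Row) other).orgId);
        }

        @Override
        public int hashCode() {
            return orgId.hashCode(); // NullPointerException when orgId is null
        }
    }

    private final boolean enforceImmutability;
    private final List<Object> elements = new ArrayList<>();
    private final Set<Object> tracked = new HashSet<>();

    ModelBundle(boolean enforceImmutability) {
        this.enforceImmutability = enforceImmutability;
    }

    /** Adds an element; only the enforcing mode hashes it. */
    void add(Object element) {
        if (enforceImmutability) {
            tracked.add(element); // hashes the element
        }
        elements.add(element); // ArrayList.add never calls hashCode
    }

    int size() {
        return elements.size();
    }

    public static void main(String[] args) {
        ModelBundle relaxed = new ModelBundle(false);
        relaxed.add(new Row(null));
        System.out.println("relaxed bundle size: " + relaxed.size()); // 1

        ModelBundle enforcing = new ModelBundle(true);
        try {
            enforcing.add(new Row(null));
        } catch (NullPointerException e) {
            System.out.println("enforcing bundle threw NullPointerException");
        }
    }
}
```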