IsmSinkWriter expects keys to be written in strictly increasing order
I believe the problem is View.asSingleton().
My Dataflow (DF) job fails: one stage failed 4 times, so the whole job failed:
(d373a0bb7c7bad6f): java.lang.IllegalArgumentException: IsmSinkWriter expects keys to be written in strictly increasing order but was given RandomAccessData{buffer=[], size=0} as the previous key and RandomAccessData{buffer=[], size=0} as the current key. Expected 0 <= 0 at position 1.
    at com.google.cloud.dataflow.sdk.runners.worker.IsmSink$IsmSinkWriter.commonPrefixLengthWithOrderCheck(IsmSink.java:209)
    at com.google.cloud.dataflow.sdk.runners.worker.IsmSink$IsmSinkWriter.add(IsmSink.java:166)
    at com.google.cloud.dataflow.sdk.runners.worker.IsmSink$IsmSinkWriter.add(IsmSink.java:85)
    at com.google.cloud.dataflow.sdk.util.common.worker.WriteOperation.process(WriteOperation.java:90)
    at com.google.cloud.dataflow.sdk.util.common.worker.OutputReceiver.process(OutputReceiver.java:52)
    at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.output(SimpleParDoFn.java:161)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnContext.outputWindowedValue(DoFnRunnerBase.java:288)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnProcessContext.output(DoFnRunnerBase.java:450)
    at com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner$BatchViewAsSingleton$IsmRecordForSingularValuePerWindowDoFn.processElement(DataflowPipelineRunner.java:825)
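For context, the exception says the writer was handed two identical keys in a row (both empty, size=0), which a strictly increasing order forbids. The sketch below is a hypothetical illustration of such a check, not the SDK's actual IsmSink code; it assumes keys are compared lexicographically as unsigned bytes, and the class name StrictlyOrderedKeyWriter is made up for this example.

```java
import java.util.Arrays;

// Hypothetical sketch, NOT the Dataflow SDK's IsmSink implementation:
// a writer that only accepts keys in strictly increasing order.
final class StrictlyOrderedKeyWriter {
    private byte[] previousKey = null; // null until the first key is written

    /** Compares keys lexicographically as unsigned bytes; a proper prefix sorts first. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    /** Accepts the key only if it is strictly greater than the previously written key. */
    void add(byte[] key) {
        if (previousKey != null && compareUnsigned(previousKey, key) >= 0) {
            // Equal keys (e.g. two empty keys) fail here, just like in the reported error.
            throw new IllegalArgumentException(
                "keys must be strictly increasing, but previous=" + Arrays.toString(previousKey)
                    + " and current=" + Arrays.toString(key));
        }
        previousKey = key.clone();
    }
}
```

Calling add(new byte[0]) twice on the same writer throws on the second call, which matches the shape of the failure above: two empty keys, so the second one is not strictly greater.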
I'm trying to create a PCollectionView from a PCollection[CMS[String]]. The collection contains only one element (roughly 3.75 MiB in size).
Any help?
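Update 1: The application still fails when I reduce the size of the view's single element to 1.88 MB, but it succeeds at 255.29 KB (and smaller). This smells like some (un)documented limit. Am I missing something, or is it a bug?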
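This has now been fixed for versions 1.5.0 and 1.5.1.
In 1.5.0 and 1.5.1, global-window singletons in batch mode were affected by a bug that prevented them from materializing singletons larger than 1 MB. Users were advised to use View.asIterable() or View.asList() as a workaround, since those were not affected.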
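A minimal sketch of the workaround, assuming the Dataflow Java SDK 1.x: build the view with View.asIterable() (or View.asList()) instead of View.asSingleton(), attach it with ParDo.withSideInputs(view), and inside the DoFn unwrap c.sideInput(view). The onlyElement helper below is hypothetical; it is written in plain Java so it runs without the SDK, and it roughly restores the "exactly one element" check that asSingleton would otherwise perform.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical helper: recovers asSingleton-style semantics from a side input
// that was materialized with View.asIterable() or View.asList().
final class SideInputs {
    private SideInputs() {}

    /** Returns the only element of values; throws if there are zero or several. */
    static <T> T onlyElement(Iterable<T> values) {
        Iterator<T> it = values.iterator();
        if (!it.hasNext()) {
            throw new NoSuchElementException("expected exactly one element, found none");
        }
        T first = it.next();
        if (it.hasNext()) {
            throw new IllegalArgumentException("expected exactly one element, found more than one");
        }
        return first;
    }
}
```

In processElement this would be used along the lines of `SideInputs.onlyElement(c.sideInput(view))`, where `view` is the iterable view and the element type is whatever the pipeline stores (CMS[String] in the question).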