Apache Beam write transform writes to multiple files?
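I am looking at the WordCount example in Apache Beam.
When I run the example locally, it writes the counts to multiple files. I created a test project that reads data from a file and writes it back out, and that write also splits its output across multiple files. How can I get the results in a single file? I am using the DirectRunner.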
The output is sharded for performance reasons. You should be able to use TextIO.Write.withoutSharding to force a single file:
withoutSharding
public TextIO.Write withoutSharding()
Forces a single file as output and empty shard name template. This
option is only compatible with unwindowed writes.
For unwindowed writes, constraining the number of shards is likely to
reduce the performance of a pipeline. Setting this value is not
recommended unless you require a specific number of output files.
This is equivalent to .withNumShards(1).withShardNameTemplate("")
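In the Beam 2.x Java SDK this option is typically chained onto the write, for example `TextIO.write().to("counts").withoutSharding()`. To make the equivalence above concrete, here is a minimal sketch in plain Java that imitates (it does not call) Beam's shard-name expansion. It assumes Beam's default unwindowed template `-SSSSS-of-NNNNN` and handles only the `S` and `N` placeholders; the class name `ShardNames`, the `counts` prefix, and the shard count of 3 are hypothetical, since the runner normally chooses the number of shards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Imitates Beam's shard-name expansion to show which files a write produces. Not Beam code. */
public class ShardNames {

    // Assumed default template for unwindowed writes; the run length of each letter sets the zero padding.
    public static final String DEFAULT_TEMPLATE = "-SSSSS-of-NNNNN";

    /** Replaces each run of 'S' with the shard index and each run of 'N' with the shard count. */
    public static String expand(String template, int shard, int numShards) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < template.length()) {
            char c = template.charAt(i);
            if (c == 'S' || c == 'N') {
                int start = i;
                while (i < template.length() && template.charAt(i) == c) {
                    i++;
                }
                int width = i - start;
                int value = (c == 'S') ? shard : numShards;
                out.append(String.format(Locale.ROOT, "%0" + width + "d", value));
            } else {
                // Any other character is copied through unchanged.
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    /** Lists the file names produced for a given prefix, template, and shard count. */
    public static List<String> outputFiles(String prefix, String template, int numShards) {
        List<String> names = new ArrayList<>();
        for (int s = 0; s < numShards; s++) {
            names.add(prefix + expand(template, s, numShards));
        }
        return names;
    }

    public static void main(String[] args) {
        // Default sharded write with a hypothetical shard count of 3.
        System.out.println(outputFiles("counts", DEFAULT_TEMPLATE, 3));
        // withoutSharding(): one shard and an empty template, so only the prefix remains.
        System.out.println(outputFiles("counts", "", 1));
    }
}
```

The first call lists three names of the form `counts-00000-of-00003`, which matches what the WordCount example leaves on disk. The second yields the single name `counts`, so with withoutSharding the path passed to the write is the output file itself.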