How does one access an apache_beam.io.fileio.ReadableFile() object?
I'm trying to use the apache_beam.io.fileio module to read a file, lines.txt, and incorporate it into my pipeline.
lines.txt has the following contents:
line1
line2
line3
When I run the following pipeline code:
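import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

pipeline_options = PipelineOptions()

with beam.Pipeline(options=pipeline_options) as p:
    lines = (
        p
        | beam.io.fileio.MatchFiles(file_pattern="lines.txt")
        | beam.io.fileio.ReadMatches()
    )
    # print file contents to screen
    lines | 'print to screen' >> beam.Map(print)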
I get the following output:
<apache_beam.io.fileio.ReadableFile object at 0x000001A8C6C55F08>
I expected:
line1
line2
line3
How can I get the expected result?
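The PCollection produced by

p
| beam.io.fileio.MatchFiles(file_pattern="lines.txt")
| beam.io.fileio.ReadMatches()

contains ReadableFile objects, one per matched file, which is why print shows the object rather than the file's text. To get at the contents, we can use the methods documented in the Apache Beam pydoc. Below we use read_utf8(), which reads the whole file and decodes it as UTF-8: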
with beam.Pipeline(options=pipeline_options) as p:
    lines = (
        p
        | beam.io.fileio.MatchFiles(file_pattern="lines.txt")
        | beam.io.fileio.ReadMatches()
        # read each matched file and decode it as UTF-8 text
        | beam.Map(lambda file: file.read_utf8())
    )
    # print file contents to screen
    lines | 'print to screen' >> beam.Map(print)
We get the expected result:
line1
line2
line3
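One thing worth noting: read_utf8() returns the entire file as a single string, so the pipeline above emits one element per file, not one per line. If downstream steps need individual lines, a rough sketch like the following may help. The names split_lines and run are made up for illustration and are not part of Beam; the sketch also assumes the files are small enough to be read into memory in one go.

```python
def split_lines(readable_file):
    """Return the lines of a ReadableFile as a list of strings.

    Hypothetical helper (not part of Beam). It only relies on the
    object having a read_utf8() method, as ReadableFile does.
    """
    return readable_file.read_utf8().splitlines()


def run(file_pattern="lines.txt"):
    """Build and run a pipeline that prints one element per line."""
    # Imported here so split_lines can be used without Beam installed.
    import apache_beam as beam
    from apache_beam.io import fileio

    with beam.Pipeline() as p:
        (
            p
            | fileio.MatchFiles(file_pattern)
            | fileio.ReadMatches()
            # FlatMap flattens the returned list, so each line
            # becomes its own element in the PCollection.
            | beam.FlatMap(split_lines)
            | 'print to screen' >> beam.Map(print)
        )
```

Calling run() against the lines.txt from the question should print each line as a separate element. If all you need is line-by-line text and not the ReadableFile itself, beam.io.ReadFromText("lines.txt") is probably the simpler choice.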