IOError: No files found based on the file pattern

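I am trying to run an example I found in the Python SDK, but it fails with the stack trace shown below. Note: the first pipeline does create the "./names" file, but the second pipeline does not seem to be able to read from it.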

No handlers could be found for logger "oauth2client.contrib.multistore_file"
Traceback (most recent call last):
  File "example.py", line 17, in <module>
    | 'save' >> beam.io.WriteToText(greetings_file))
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/textio.py", line 391, in __init__
    skip_header_lines=skip_header_lines)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/textio.py", line 88, in __init__
    validate=validate)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/filebasedsource.py", line 97, in __init__
    self._validate()
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/filebasedsource.py", line 173, in _validate
    'No files found based on the file pattern %s' % self._pattern)
IOError: No files found based on the file pattern ./names

The example code is as follows:

import apache_beam as beam
def add_greeting(name, messages):
    for msg in messages:
        yield '%s %s' % (msg, name)

names_file = './names'
greetings_file = './greetings'

p = beam.Pipeline('DirectRunner')
(p | 'add names' >> beam.Create(['Ann', 'Joe'])
   | 'save' >> beam.io.WriteToText(names_file))
p.run()

(p
 | 'load names' >> beam.io.ReadFromText(names_file)
 | 'add greetings' >> beam.FlatMap(add_greeting, ['Hello', 'Hola'])
 | 'save' >> beam.io.WriteToText(greetings_file))
p.run()

Environment: I am running this in Google Cloud Shell, where the SDK is installed:

$ pip list --local --format=columns | grep dataflow
google-cloud-dataflow              0.6.0 

When a pipeline runs, the runners in Beam do not wait for it to finish, so you should add a call to wait_until_finish() after your call to run().

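In addition, Beam pipelines have deferred execution: each new step you define is added to the pipeline's graph, and every step in that graph runs each time you run the pipeline. In short, if you want pipelines that run different steps, you need to create a new Pipeline object for each one.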
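As a rough illustration of why reusing one pipeline object reruns earlier steps, here is a toy model in plain Python. ToyPipeline and its add() and run() methods are made up for this sketch and are not part of Beam's API; the sketch only mimics the idea that steps accumulate on one object and all of them execute on every run().

```python
class ToyPipeline(object):
    """Hypothetical stand-in for a deferred-execution pipeline (not Beam)."""

    def __init__(self):
        self.steps = []  # steps are only recorded here, nothing runs yet

    def add(self, name, fn):
        self.steps.append((name, fn))

    def run(self):
        # Every recorded step executes, including steps added before an earlier run().
        executed = []
        for name, fn in self.steps:
            fn()
            executed.append(name)
        return executed


log = []

p = ToyPipeline()
p.add('write names', lambda: log.append('names written'))
first_run = p.run()

# Reusing the same object: the old step is still in the graph and runs again.
p.add('write greetings', lambda: log.append('greetings written'))
second_run = p.run()

# A fresh object only contains the steps added to it.
fresh = ToyPipeline()
fresh.add('write greetings', lambda: log.append('greetings written'))
fresh_run = fresh.run()
```

In this sketch second_run contains both steps while fresh_run contains only the new one, which is the reason for building a second Pipeline object.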

This should work:

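# First pipeline: write the names and block until it has finished.
p = beam.Pipeline('DirectRunner')
(p | 'add names' >> beam.Create(['Ann', 'Joe'])
   | 'save' >> beam.io.WriteToText('./names'))
p.run().wait_until_finish()

# Second pipeline: a new Pipeline object, so the first pipeline's steps are not rerun.
# add_greeting and greetings_file are defined as in the question.
# WriteToText shards its output (e.g. names-00000-of-00001), hence the './names*' pattern.
p = beam.Pipeline('DirectRunner')
(p
 | 'load names' >> beam.io.ReadFromText('./names*')
 | 'add greetings' >> beam.FlatMap(add_greeting, ['Hello', 'Hola'])
 | 'save' >> beam.io.WriteToText(greetings_file))
p.run().wait_until_finish()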
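Note that the read pattern changed from './names' to './names*'. With its default settings, WriteToText appends a shard suffix to the prefix you give it, so the output is typically a file such as names-00000-of-00001 rather than a file called names. The sketch below uses Python's glob module as a stand-in for Beam's own pattern matching (an assumption made purely for illustration) and a hand-made file in a temporary directory in place of real pipeline output.

```python
import glob
import os
import tempfile

# Hypothetical working directory standing in for the pipeline's output location.
workdir = tempfile.mkdtemp()

# Mimic a single shard as WriteToText would name it by default.
shard = os.path.join(workdir, 'names-00000-of-00001')
with open(shard, 'w') as f:
    f.write('Ann\nJoe\n')

# The exact prefix matches nothing, because no file is literally called 'names'.
exact_matches = glob.glob(os.path.join(workdir, 'names'))

# The wildcard pattern picks up the shard.
glob_matches = glob.glob(os.path.join(workdir, 'names*'))
```

Here exact_matches is empty while glob_matches contains the shard, which is consistent with the "No files found based on the file pattern ./names" error in the question.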