Google Cloud Dataflow job throws alert after a few hours

I am running a Dataflow streaming job using the 2.11.0 release. After a few hours I get the following authentication error:

  File "streaming_twitter.py", line 188, in <lambda>
  File "streaming_twitter.py", line 102, in estimate
  File "streaming_twitter.py", line 84, in estimate_aiplatform
  File "streaming_twitter.py", line 42, in get_service
  File "/usr/local/lib/python2.7/dist-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery.py", line 227, in build
    credentials=credentials)
  File "/usr/local/lib/python2.7/dist-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery.py", line 363, in build_from_document
    credentials = _auth.default_credentials()
  File "/usr/local/lib/python2.7/dist-packages/googleapiclient/_auth.py", line 42, in default_credentials
    credentials, _ = google.auth.default()
  File "/usr/local/lib/python2.7/dist-packages/google/auth/_default.py", line 306, in default
    raise exceptions.DefaultCredentialsError(_HELP_MESSAGE)
DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application.

This Dataflow job makes API requests to AI Platform Prediction, and it looks like the authentication token is expiring.

Code snippet:

def get_service():
    # If it hasn't been instantiated yet: do it now
    return discovery.build('ml', 'v1',
                           discoveryServiceUrl=DISCOVERY_SERVICE,
                           cache_discovery=True)

I tried adding the following lines to the service function:

    os.environ[
        "GOOGLE_APPLICATION_CREDENTIALS"] = "/tmp/key.json"

But I get:

DefaultCredentialsError: File "/tmp/key.json" was not found. [while running 'generatedPtransform-930']

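I assume this is because the file does not exist on the Dataflow machines. Another option is to pass the developerKey parameter to the build method, but AI Platform Prediction does not seem to support it. I get this error: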

Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project."> [while running 'generatedPtransform-22624']

I would like to understand how to fix this and what the best practice is.

Any suggestions?

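Setting os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/tmp/key.json' only works locally with the DirectRunner. Once the pipeline is deployed to a distributed runner such as Dataflow, each worker will be unable to find the local file /tmp/key.json.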
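To illustrate the point, here is a minimal diagnostic sketch. The helper name credentials_file_status is hypothetical and not part of any Google library. It only reports whether the path stored in GOOGLE_APPLICATION_CREDENTIALS exists on the machine where the code is running, which is the condition that fails on a Dataflow worker.

```python
import os


def credentials_file_status(env=None):
    """Report whether GOOGLE_APPLICATION_CREDENTIALS names a file on this machine.

    Hypothetical helper for illustration only. Returns "unset", "missing",
    or "found".
    """
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "unset"
    # The variable is only a path: the file must exist on the machine that
    # reads it, and a key file on your laptop is not copied to the workers.
    return "found" if os.path.isfile(path) else "missing"


print(credentials_file_status())
```

Logging this value from inside a transform would presumably show "missing" on the workers even though it shows "found" on the machine that launched the job.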

If you want each worker to use a specific service account, you can tell Beam which service account the workers should run as.

First, grant the roles/dataflow.worker role to the service account you want your workers to use. There is no need to download a service account key file :)

Then, if you let PipelineOptions parse your command-line arguments, you can simply use the service_account_email option and specify it when running your pipeline, like --service_account_email your-email@your-project.iam.gserviceaccount.com.

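The service account that your GOOGLE_APPLICATION_CREDENTIALS points to is only used to launch the job, but each worker uses the service account specified by service_account_email. If service_account_email is not passed, it defaults to the email from your GOOGLE_APPLICATION_CREDENTIALS file.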