Google Cloud Dataflow job throws authentication error after a few hours
I'm running a streaming Dataflow job with SDK version 2.11.0. After a few hours I get the following authentication error:
File "streaming_twitter.py", line 188, in <lambda>
File "streaming_twitter.py", line 102, in estimate
File "streaming_twitter.py", line 84, in estimate_aiplatform
File "streaming_twitter.py", line 42, in get_service
File "/usr/local/lib/python2.7/dist-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery.py", line 227, in build
    credentials=credentials)
File "/usr/local/lib/python2.7/dist-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery.py", line 363, in build_from_document
    credentials = _auth.default_credentials()
File "/usr/local/lib/python2.7/dist-packages/googleapiclient/_auth.py", line 42, in default_credentials
    credentials, _ = google.auth.default()
File "/usr/local/lib/python2.7/dist-packages/google/auth/_default.py", line 306, in default
    raise exceptions.DefaultCredentialsError(_HELP_MESSAGE)
DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application.
This Dataflow job makes API requests to AI Platform Prediction, and it looks like the authentication token is expiring.
Code snippet:
def get_service():
    # If it hasn't been instantiated yet: do it now
    return discovery.build('ml', 'v1',
                           discoveryServiceUrl=DISCOVERY_SERVICE,
                           cache_discovery=True)
I tried adding the following line to the service function:
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/tmp/key.json"
But then I get:
DefaultCredentialsError: File "/tmp/key.json" was not found. [while running 'generatedPtransform-930']
I assume that's because the file doesn't exist on the Dataflow worker machines.
Another option is to use the developerKey parameter of the build method, but AI Platform Prediction doesn't seem to support it; I get the error:
Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project."> [while running 'generatedPtransform-22624']
I'd like to understand how to fix this and what the best practice is. Any suggestions?
Setting os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/tmp/key.json' only works with the local DirectRunner. Once you deploy to a distributed runner like Dataflow, each worker cannot find the local file /tmp/key.json.
If you want each worker to use a specific service account, you can tell Beam which service account the workers should identify as.
First, grant the roles/dataflow.worker role to the service account you want your workers to use. There is no need to download a service account key file :)
Then, if you let PipelineOptions parse your command-line arguments, you can simply use the service_account_email option and specify it like --service_account_email your-email@your-project.iam.gserviceaccount.com when running your pipeline.
The service account your GOOGLE_APPLICATION_CREDENTIALS points to is only used to launch the job; each worker then uses the service account specified by service_account_email. If service_account_email is not passed, it defaults to the email from your GOOGLE_APPLICATION_CREDENTIALS file.
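As a rough sketch of those two steps (the project ID, service account address, and script name below are placeholders, not values from this question):

```shell
# Step 1: grant the Dataflow worker role to the service account the
# workers should run as. PROJECT_ID and the account email are placeholders.
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:your-email@your-project.iam.gserviceaccount.com" \
    --role="roles/dataflow.worker"

# Step 2: launch the pipeline, telling Beam which service account
# the workers should use via the service_account_email option.
python streaming_twitter.py \
    --runner DataflowRunner \
    --project PROJECT_ID \
    --service_account_email your-email@your-project.iam.gserviceaccount.com
```

Because the workers obtain tokens for that account from the metadata server, no key file ever needs to be shipped to them.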