How to know when Dataproc initialization actions are done
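I need to run a Dataproc cluster with both the BigQuery and Cloud Storage connectors installed.
I use a variant of this script (because I don't have access to the commonly used bucket). Everything works, but when I run a job once the cluster is up and running, it always fails with a Task was not acquired error.
I can fix this by simply restarting the Dataproc agent on every node, but I really need it to work properly so that I can run a job right after my cluster is created. It seems that this part of the script is not working properly: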
# Restarts Dataproc Agent after successful initialization
# WARNING: this function relies on undocumented and not officially supported Dataproc Agent
# "sentinel" files to determine successful Agent initialization and is not guaranteed
# to work in the future. Use at your own risk!
restart_dataproc_agent() {
  # Because Dataproc Agent should be restarted after initialization, we need to wait until
  # it creates a sentinel file that signals initialization completion (success or failure)
  while [[ ! -f /var/lib/google/dataproc/has_run_before ]]; do
    sleep 1
  done
  # If Dataproc Agent didn't create a sentinel file that signals initialization
  # failure then it means that initialization succeeded and it should be restarted
  if [[ ! -f /var/lib/google/dataproc/has_failed_before ]]; then
    service google-dataproc-agent restart
  fi
}
export -f restart_dataproc_agent
# Schedule asynchronous Dataproc Agent restart so it will use updated connectors.
# It cannot be restarted synchronously because Dataproc Agent should be restarted
# after its initialization, including init actions execution, has been completed.
bash -c restart_dataproc_agent & disown
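My questions are:
- How can I know that the initialization actions have completed?
- Do I have to restart the Dataproc agent on newly created cluster nodes, and if so, how do I do it properly?
Edit:
Here is the command I use to create the cluster (with image version 1.3):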
gcloud dataproc --region europe-west1 \
clusters create my-cluster \
--bucket my-bucket \
--subnet default \
--zone europe-west1-b \
--master-machine-type n1-standard-1 \
--master-boot-disk-size 50 \
--num-workers 2 \
--worker-machine-type n1-standard-2 \
--worker-boot-disk-size 100 \
--image-version 1.3 \
--scopes 'https://www.googleapis.com/auth/cloud-platform' \
--project my-project \
--initialization-actions gs://dataproc-initialization-actions/connectors/connectors.sh \
--metadata 'gcs-connector-version=1.9.6' \
--metadata 'bigquery-connector-version=0.13.6'
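Also, please note that the connectors initialization script has since been fixed and now works fine, so I am using it now, but I still have to restart the Dataproc agent manually to be able to run a job.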
After the initialization actions succeed, the Dataproc agent logs the message Custom initialization actions finished. in the /var/log/google-dataproc-agent.0.log file.
No, you do not need to restart the Dataproc agent manually.
This issue is caused by the Dataproc agent service restart in the connectors initialization action and should be resolved by this PR.
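To act on the log message mentioned above from a script running on a node, something like the following sketch could be used. It assumes the log path and message text are exactly as quoted in this answer; the function name wait_for_init_actions and the timeout argument are hypothetical and only for illustration.

```shell
#!/usr/bin/env bash
# Minimal sketch: block until the Dataproc agent log contains the
# "init actions finished" message, or give up after a timeout.
# Assumption: log path and message text are as quoted in the answer.
# The function name and its arguments are hypothetical.

wait_for_init_actions() {
  local log_file="${1:-/var/log/google-dataproc-agent.0.log}"
  local timeout_secs="${2:-600}"
  local waited=0

  # grep -F matches the message as a fixed string, -q suppresses output.
  until grep -qF 'Custom initialization actions finished.' "$log_file" 2>/dev/null; do
    if (( waited >= timeout_secs )); then
      echo "Timed out waiting for initialization actions" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

Calling wait_for_init_actions with no arguments checks the default log path once per second for up to ten minutes and returns a non-zero status on timeout.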
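As for when the initialization actions are done, you can check the cluster's status.state: if it is CREATING, the initialization actions are still executing; if it is RUNNING, they have finished.
See here.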
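The status.state check can be scripted by polling gcloud dataproc clusters describe. This is only a sketch: the function name wait_for_cluster_running, the timeout, and the polling interval are hypothetical, and it assumes gcloud is installed and authenticated for the project.

```shell
#!/usr/bin/env bash
# Minimal sketch: poll the cluster's status.state until it is RUNNING.
# Assumptions: gcloud is on PATH and authenticated; the function name,
# timeout and interval defaults are hypothetical.

wait_for_cluster_running() {
  local cluster="$1"
  local region="$2"
  local timeout_secs="${3:-1800}"
  local interval_secs="${4:-10}"
  local waited=0
  local state

  while true; do
    # value(status.state) prints only the state, e.g. CREATING or RUNNING.
    state="$(gcloud dataproc clusters describe "$cluster" \
      --region "$region" --format='value(status.state)')"
    case "$state" in
      RUNNING)
        return 0
        ;;
      ERROR)
        echo "Cluster $cluster entered the ERROR state" >&2
        return 1
        ;;
    esac
    if (( waited >= timeout_secs )); then
      echo "Timed out; last state: $state" >&2
      return 1
    fi
    sleep "$interval_secs"
    waited=$((waited + interval_secs))
  done
}
```

With the cluster from the question, this would be called as wait_for_cluster_running my-cluster europe-west1 before submitting a job.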