How to know when Dataproc initialization actions are done

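I need to run a Dataproc cluster with both the BigQuery and Cloud Storage connectors installed.

I use a variant of this script (because I don't have access to the bucket that is normally used), and everything works fine, but when I run a job once the cluster is up and running, it always fails with a "Task was not acquired" error.

I can fix this by simply restarting the Dataproc agent on every node, but I really need this to work properly so that I can run a job right after my cluster is created. It seems that this part of the script is not working correctly: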

# Restarts Dataproc Agent after successful initialization
# WARNING: this function relies on undocumented and not officially supported Dataproc Agent
# "sentinel" files to determine successful Agent initialization and not guaranteed
# to work in the future. Use at your own risk!
restart_dataproc_agent() {
  # Because Dataproc Agent should be restarted after initialization, we need to wait until
  # it creates a sentinel file that signals initialization completion (success or failure)
  while [[ ! -f /var/lib/google/dataproc/has_run_before ]]; do
    sleep 1
  done
  # If Dataproc Agent didn't create a sentinel file that signals initialization
  # failure then it means that initialization succeeded and it should be restarted
  if [[ ! -f /var/lib/google/dataproc/has_failed_before ]]; then
    service google-dataproc-agent restart
  fi
}
export -f restart_dataproc_agent

# Schedule asynchronous Dataproc Agent restart so it will use updated connectors.
# It could not be restarted synchronously because Dataproc Agent should be restarted
# after its initialization, including init actions execution, has been completed.
bash -c restart_dataproc_agent & disown

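My questions are:

  1. How do I know when the initialization actions have completed?
  2. Do I have to restart the Dataproc agent on the nodes of a newly created cluster, and if so, how do I do it properly?

EDIT: Here is the command I use to create the cluster (with image version 1.3):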

gcloud dataproc --region europe-west1 \
  clusters create my-cluster \
  --bucket my-bucket \
  --subnet default \
  --zone europe-west1-b \
  --master-machine-type n1-standard-1 \
  --master-boot-disk-size 50 \
  --num-workers 2 \
  --worker-machine-type n1-standard-2 \
  --worker-boot-disk-size 100 \
  --image-version 1.3 \
  --scopes 'https://www.googleapis.com/auth/cloud-platform' \
  --project my-project \
  --initialization-actions gs://dataproc-initialization-actions/connectors/connectors.sh \
  --metadata 'gcs-connector-version=1.9.6' \
  --metadata 'bigquery-connector-version=0.13.6'

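Also, note that the connectors initialization script has since been fixed and now works fine, so I am using it now, but I still have to restart the Dataproc agent manually before I can run a job.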

  1. After the initialization actions succeed, the Dataproc agent logs the message "Custom initialization actions finished." to the /var/log/google-dataproc-agent.0.log file.

  2. No, you don't need to restart the Dataproc agent manually.

This issue is caused by the Dataproc agent service restart in the connectors initialization action and should be resolved by this PR.
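To wait for that log message from a script running on a node, something along these lines might work. It is only a minimal sketch: wait_for_init_actions is a hypothetical helper name, the timeout parameter is my own addition, and it assumes the message text and log path are exactly the ones quoted above.

```shell
#!/usr/bin/env bash
# Sketch: block until the Dataproc agent log reports that init actions finished.
# wait_for_init_actions is a hypothetical helper, not something Dataproc provides.
wait_for_init_actions() {
  # Assumed defaults: the log path quoted in the answer and a 10 minute timeout.
  local log_file="${1:-/var/log/google-dataproc-agent.0.log}"
  local timeout_secs="${2:-600}"
  local waited=0

  # grep -F treats the message as a literal string; -q suppresses output.
  until grep -qF 'Custom initialization actions finished.' "${log_file}" 2>/dev/null; do
    if [ "${waited}" -ge "${timeout_secs}" ]; then
      echo "Timed out waiting for initialization actions" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

Calling wait_for_init_actions with no arguments uses the defaults; it returns 0 once the message appears and 1 if the timeout elapses first.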

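As for when the initialization actions are done, you can check the cluster's status.state: if it is CREATING, the initialization actions are still executing; if it is RUNNING, they have finished. Check here.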
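From outside the cluster, the state check could be polled with gcloud, roughly as sketched below. The helper name wait_for_cluster_running and the polling interval are assumptions of mine; the cluster name and region in the usage note are the ones from the question. This is mostly useful when the cluster is created asynchronously (for example with --async), since otherwise gcloud already waits for creation to finish.

```shell
#!/usr/bin/env bash
# Sketch: poll status.state until the cluster leaves CREATING.
# wait_for_cluster_running is a hypothetical helper, not part of gcloud.
wait_for_cluster_running() {
  local cluster="$1"
  local region="$2"
  local interval_secs="${3:-10}"  # assumed polling interval
  local state

  while true; do
    # value(status.state) prints only the state field, e.g. CREATING or RUNNING.
    state="$(gcloud dataproc clusters describe "${cluster}" \
      --region "${region}" --format='value(status.state)')" || state="UNKNOWN"
    case "${state}" in
      RUNNING) return 0 ;;                  # init actions have finished
      CREATING) sleep "${interval_secs}" ;; # init actions may still be running
      *) echo "Unexpected cluster state: ${state}" >&2; return 1 ;;
    esac
  done
}
```

For the cluster in the question this would be called as wait_for_cluster_running my-cluster europe-west1, followed by the job submission. Any state other than CREATING or RUNNING (for example ERROR) is treated as a failure rather than retried.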