How to access Dataproc cluster metadata?
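After creating a cluster, I am trying to retrieve the URLs of my add-on components without using the GCP dashboard. I am using the Dataproc Python API, more specifically the get_cluster() function.
The function returns a lot of data, but I cannot find the Jupyter gateway URL or any other metadata in it.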
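from google.cloud import dataproc_v1

# Placeholders: fill in your own project ID and cluster name.
project_id, cluster_name = '', ''
region = 'europe-west4'

# Dataproc is regional, so the client has to target the regional endpoint.
client = dataproc_v1.ClusterControllerClient(
    client_options={
        'api_endpoint': '{}-dataproc.googleapis.com:443'.format(region)
    }
)

# Keyword arguments avoid depending on the positional parameter order.
response = client.get_cluster(
    project_id=project_id, region=region, cluster_name=cluster_name
)
print(response)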
Has anyone solved this?
If you followed this doc to set up Jupyter access by enabling Component Gateway, then you can access the web interfaces as described here. The trick is that this information is included in the API response for the v1beta2 version.
Very few code changes are needed (there are no requirements other than the google-cloud-dataproc library). Just replace dataproc_v1 with dataproc_v1beta2 and read the endpoints from response.config.endpoint_config:
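from google.cloud import dataproc_v1beta2

# Placeholders: fill in your own project ID and cluster name.
project_id, cluster_name = '', ''
region = 'europe-west4'

# Dataproc is regional, so the client has to target the regional endpoint.
client = dataproc_v1beta2.ClusterControllerClient(
    client_options={
        'api_endpoint': '{}-dataproc.googleapis.com:443'.format(region)
    }
)

# Keyword arguments avoid depending on the positional parameter order.
response = client.get_cluster(
    project_id=project_id, region=region, cluster_name=cluster_name
)

# endpoint_config holds the Component Gateway URLs.
print(response.config.endpoint_config)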
In my case, I get:
http_ports {
key: "HDFS NameNode"
value: "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/hdfs/dfshealth.html"
}
http_ports {
key: "Jupyter"
value: "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/jupyter/"
}
http_ports {
key: "JupyterLab"
value: "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/jupyter/lab/"
}
http_ports {
key: "MapReduce Job History"
value: "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/jobhistory/"
}
http_ports {
key: "Spark History Server"
value: "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/sparkhistory/"
}
http_ports {
key: "Tez"
value: "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/apphistory/tez-ui/"
}
http_ports {
key: "YARN Application Timeline"
value: "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/apphistory/"
}
http_ports {
key: "YARN ResourceManager"
value: "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/yarn/"
}
enable_http_port_access: true
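To pick a single URL, such as the Jupyter one, out of this output, a small lookup can help. The following is a minimal sketch, assuming http_ports behaves like a mapping from component name to URL, as the output above suggests. find_component_url is a hypothetical helper name, not part of the google-cloud-dataproc library, and the sample dictionary only stands in for response.config.endpoint_config.http_ports.

```python
from typing import Mapping, Optional


def find_component_url(http_ports: Mapping[str, str], name: str) -> Optional[str]:
    """Return the URL whose key matches `name` case-insensitively, or None."""
    wanted = name.strip().lower()
    for key, url in http_ports.items():
        if key.strip().lower() == wanted:
            return url
    return None


# Stand-in for response.config.endpoint_config.http_ports (assumption:
# the real field can be iterated like a dict of str -> str).
sample_ports = {
    "Jupyter": "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/jupyter/",
    "YARN ResourceManager": "https://REDACTED-dot-europe-west4.dataproc.googleusercontent.com/yarn/",
}

jupyter_url = find_component_url(sample_ports, "jupyter")
print(jupyter_url)
```

With a real cluster, the same call would presumably take response.config.endpoint_config.http_ports in place of sample_ports. The case-insensitive match is a convenience choice, so "jupyter" and "Jupyter" both resolve to the same entry.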
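You need v1beta2, and the Component Gateway must be enabled in the cluster configuration:
'endpoint_config': {
    'enable_http_port_access': True
},
After that, the answer above works:
client.get_cluster(project_id=project_id, region=region, cluster_name=cluster_name)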
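For context, the endpoint_config fragment sits inside the cluster's config when the cluster is defined. Here is a minimal sketch, assuming the dictionary form of a cluster definition. build_cluster_spec is a hypothetical helper, the project and cluster names are placeholders, and only the endpoint_config part comes from the answer above; a real cluster definition would normally carry more settings.

```python
def build_cluster_spec(project_id: str, cluster_name: str) -> dict:
    """Build a minimal cluster definition with the Component Gateway enabled."""
    return {
        "project_id": project_id,
        "cluster_name": cluster_name,
        "config": {
            # This is the setting that makes the gateway URLs available.
            "endpoint_config": {"enable_http_port_access": True},
        },
    }


# Placeholder names, for illustration only.
spec = build_cluster_spec("my-project", "my-cluster")
print(spec["config"]["endpoint_config"])
```

A dictionary like this would then be passed as the cluster argument when creating the cluster with the client; check the version of google-cloud-dataproc you have installed for the exact call signature.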