How to execute a list of Hive queries stored in a GCS bucket (in my case gs://hive/hive.sql) while submitting a Hive job to a Dataproc cluster?
Here I am writing the queries inline in queryList under hiveJob.
Submitting a Hive job to the Dataproc cluster:
def submit_hive_job(dataproc, project, region, cluster_name):
    job_details = {
        'projectId': project,
        'job': {
            'placement': {
                'clusterName': cluster_name
            },
            'hiveJob': {
                'queryList': {
                    # How can I execute a .sql file here that is in a bucket?
                    'queries': [
                        "CREATE TABLE IF NOT EXISTS sai (eid int, name String, salary String, destination String)",
                        "Insert into table sai values (26,'Shiv','1500','ac')"
                    ]
                }
            }
        }
    }
    result = dataproc.projects().regions().jobs().submit(
        projectId=project,
        region=region,
        body=job_details).execute()
    job_id = result['reference']['jobId']
    print('Submitted job Id {}'.format(job_id))
    return job_id
The hive.sql file in the bucket:
create table employee (employeeid int, employeename string, salary float) row format delimited fields terminated by ',';
describe employee;
select * from employee;
I found that we can keep the .sql file in a bucket and then specify queryFileUri like below:
"hiveJob": {
"queryFileUri":"gs://queryfile/test.sql"
}