Required parameter is missing error while writing to BigQuery with google.cloud.bigquery in Python
I am using the following snippet in Python 2.7 to load newline-delimited JSON into BigQuery:
from google.cloud import bigquery
from apiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials

bigquery_client = bigquery.Client()
dataset = bigquery_client.dataset('testGAData')
table_ref = dataset.table('gaData')
table = bigquery.Table(table_ref)

with open('gaData.json', 'rb') as source_file:
    job_config = bigquery.LoadJobConfig()
    job_config.source_format = 'NEWLINE_DELIMITED_JSON'
    job = bigquery_client.load_table_from_file(
        source_file, table, job_config=job_config)
It returns the following error:
File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/google/cloud/bigquery/client.py", line 897, in load_table_from_file
raise exceptions.from_http_response(exc.response)
google.api_core.exceptions.BadRequest: 400 POST https://www.googleapis.com/upload/bigquery/v2/projects/test-project-for-experiments/jobs?uploadType=resumable: Required parameter is missing
Why am I getting this error? How can I fix it? Has anyone else run into something similar? Thanks in advance.

Edit: added the last paragraph, including the Python imports, and corrected the indentation.
Problems observed in the initial code

You are missing the schema for the table. You can either use job_config.autodetect = True or provide it explicitly with job_config.schema = [bigquery.SchemaField("FIELD NAME", "FIELD TYPE")].

Per the documentation, you should set job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON for a JSON file source.

You should pass the table_ref variable as the argument instead of the table variable in bigquery_client.load_table_from_file(source_file, table, job_config=job_config).

Link to the documentation
Working code

The code below works for me. I am using Python 3 and google-cloud-bigquery v1.5.
from google.cloud import bigquery

client = bigquery.Client()
dataset_id, table_id = "TEST_DATASET", "TEST_TABLE"
data_ref = client.dataset(dataset_id)
table_ref = data_ref.table(table_id)
file_path = "path/to/test.json"

job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
# job_config.autodetect = True
job_config.schema = [bigquery.SchemaField("Name", "STRING"),
                     bigquery.SchemaField("Age", "INTEGER")]

with open(file_path, 'rb') as source_file:
    job = client.load_table_from_file(
        source_file, table_ref, location='US', job_config=job_config)

job.result()  # wait for the load job to complete
print('Loaded {} rows into {}:{}.'.format(job.output_rows, dataset_id, table_id))
Output
>> Loaded 2 rows into TEST_DATASET:TEST_TABLE.
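For reference, the source file for this load must be newline-delimited JSON: one complete JSON object per line, with no enclosing array. A minimal sketch of producing such a file with only the standard library, matching the schema above (Name STRING, Age INTEGER) — the filename and the two sample rows here are illustrative, not from the original post:

```python
import json

# Two sample records matching the schema (Name STRING, Age INTEGER)
rows = [{"Name": "Alice", "Age": 30}, {"Name": "Bob", "Age": 25}]

# Newline-delimited JSON: serialize each record on its own line.
# A top-level JSON array would be rejected by the load job.
with open("test.json", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")

# Each line parses independently as a standalone JSON object.
with open("test.json") as f:
    lines = f.read().splitlines()
parsed = [json.loads(line) for line in lines]
```

Loading a file like this with the working code above is what produces the "Loaded 2 rows" output.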