How to load multiple AVRO files using a bq load command
I am trying to load multiple AVRO files into BigQuery by following this documentation:
https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro
According to the documentation, the command to do this is:
bq --location=US load --source_format=AVRO [DATASET].[TABLE_NAME] "gs://mybucket/00/*.avro","gs://mybucket/01/*.avro"
I wrote a script that searches for the files and builds the command like this:
bq load --source_format=AVRO --noreplace foo.bar3456 "gs://mybucket/foo/36.avro", "gs://mybucket/foo_bar/01.avro", "gs://mybucket/bar/211.avro"
But this only works when I have a single file, like this:
bq load --source_format=AVRO --noreplace foo.bar3456 "gs://mybucket/foo/36.avro"
When I try to use the command with multiple files, the error is:
Too many positional args, still have ["gs://mybucket/foo_bar/01.avro"]
This is the script I use to build the command:
def create_command_bq_load(buckets):
    # datasetname is expected to be defined at module level.
    commands = []
    for bucket in buckets:
        # Destination table has the form dataset.product_event$partition.
        command = 'bq load --source_format=AVRO --noreplace %s.%s_%s$%s' % (datasetname, bucket['product'], bucket['event'], bucket['data_partition'])
        if bucket['files']:
            command_file = ''
            for path in bucket['files']:
                # Appends ' "<uri>",' to the string on every pass.
                command_file = '%s "%s",' % (command_file, path)
            # Drop the trailing comma before storing the command.
            commands.append((command + ' ' + command_file)[:-1])
    return commands
Any help?
Solved. My mistake was the space character ' ' between the files. The correct way is:
bq load --source_format=AVRO --noreplace foo.bar3456 "gs://mybucket/foo/36.avro","gs://mybucket/foo_bar/01.avro","gs://mybucket/bar/211.avro"
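For anyone generating the command from a script, here is a minimal sketch of building the URI list without spaces. A space after a comma makes the shell split the list into separate arguments, which is what leads to the "Too many positional args" error. The function name build_bq_load_command and its parameters are hypothetical, not part of the original script; it only assembles the command string and does not run bq.

```python
def build_bq_load_command(dataset, table, uris):
    """Return a bq load command string with comma-joined source URIs."""
    if not uris:
        raise ValueError("at least one source URI is required")
    # Quote each URI on its own, then join with a bare comma (no space),
    # so the shell passes the whole list to bq as a single argument.
    sources = ",".join('"%s"' % uri for uri in uris)
    return "bq load --source_format=AVRO --noreplace %s.%s %s" % (dataset, table, sources)


command = build_bq_load_command(
    "foo",
    "bar3456",
    [
        "gs://mybucket/foo/36.avro",
        "gs://mybucket/foo_bar/01.avro",
        "gs://mybucket/bar/211.avro",
    ],
)
print(command)
```

Using str.join also avoids having to trim a trailing comma afterwards, as the original loop did with [:-1].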