unexpected keyword argument 'type' for bigquery
I am trying to follow this example:
http://ajkannan.github.io/gcloud-python/latest/bigquery-usage.html
But when I try to create a table:
import os
import subprocess
import sys
from gcloud.bigquery import SchemaField
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "toto.json"
os.environ['GCLOUD_PROJECT'] = 'titi'
from gcloud import pubsub
client = pubsub.Client('titi')
# Imports the Google Cloud client library
from google.cloud import bigquery
# Instantiates a client
bigquery_client = bigquery.Client()
# The name for the new dataset
dataset_name = 'tata'
dataset = bigquery_client.dataset(dataset_name)
table = dataset.table(name='aspire_page')
table.schema = [
    SchemaField(name= 'id', type= 'int', mode= 'nullable'),
    SchemaField(name= 'zip', type= 'string', mode= 'nullable'),
    SchemaField(name= 'html', type= 'string', mode= 'nullable'),
    SchemaField(name= 'url', type= 'string', mode= 'nullable'),
    SchemaField(name= 'categorie', type= 'string', mode= 'nullable'),
    SchemaField(name= 'date', type= 'string', mode= 'nullable'),
    SchemaField(name='name', type= 'string', mode= 'nullable'),
]
table.create()
I get:
TypeError Traceback (most recent call last)
<ipython-input-10-30edba459053> in <module>()
23
24 table.schema = [
---> 25 SchemaField(name= 'id', type= 'int', mode= 'nullable'),
26 SchemaField(name= 'zip', type= 'string', mode= 'nullable'),
27 SchemaField(name= 'html', type= 'string', mode= 'nullable'),
TypeError: __init__() got an unexpected keyword argument 'type'
And I don't understand why SchemaField needs a type to be initialized...
If anyone has an idea
Thanks and regards
Edit:
Even @andre622's suggestion does not work:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-1-f177aa490fbb> in <module>()
29 SchemaField('categorie', 'STRING', mode= 'nullable'),
30 SchemaField('date', 'STRING', mode= 'nullable'),
---> 31 SchemaField('name', 'STRING', mode= 'nullable'),
32 ]
33
/usr/local/lib/python3.5/dist-packages/google/cloud/bigquery/table.py in schema(self, value)
113 """
114 if not all(isinstance(field, SchemaField) for field in value):
--> 115 raise ValueError('Schema items must be fields')
116 self._schema = tuple(value)
117
ValueError: Schema items must be fields
Even with Nick's suggestion:
import os
import subprocess
import sys
from gcloud.bigquery import SchemaField
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "toto.json"
os.environ['GCLOUD_PROJECT'] = 'titi'
from gcloud import pubsub
client = pubsub.Client('titi')
# Imports the Google Cloud client library
from google.cloud import bigquery
# Instantiates a client
bigquery_client = bigquery.Client()
# The name for the new dataset
dataset_name = 'choual'
# Prepares the new dataset
dataset = bigquery_client.dataset(dataset_name)
table = dataset.table(name='aspire_page')
table.schema = [
    SchemaField('id', 'INTEGER'),
    SchemaField('zip', 'STRING'),
    SchemaField('html', 'STRING'),
    SchemaField('url', 'STRING'),
    SchemaField('categorie', 'STRING'),
    SchemaField('date', 'STRING'),
    SchemaField('name', 'STRING')
]
table.create()
I get this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-191573ca7711> in <module>()
29 SchemaField('categorie', 'STRING'),
30 SchemaField('date', 'STRING'),
---> 31 SchemaField('name', 'STRING')
32 ]
33
/usr/local/lib/python3.5/dist-packages/google/cloud/bigquery/table.py in schema(self, value)
113 """
114 if not all(isinstance(field, SchemaField) for field in value):
--> 115 raise ValueError('Schema items must be fields')
116 self._schema = tuple(value)
117
ValueError: Schema items must be fields
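You don't need to supply keywords for the first two arguments you pass to SchemaField; the name and the field type can be given positionally. Also, your data type definitions should use the type names BigQuery accepts, for example 'INTEGER' and 'STRING' rather than 'int' and 'string'. Your schema should be defined as: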
table.schema = [
    SchemaField('id', 'INTEGER', mode='nullable'),
    SchemaField('zip', 'STRING', mode='nullable'),
    SchemaField('html', 'STRING', mode='nullable'),
    SchemaField('url', 'STRING', mode='nullable'),
    SchemaField('categorie', 'STRING', mode='nullable'),
    SchemaField('date', 'STRING', mode='nullable'),
    SchemaField('name', 'STRING', mode='nullable'),
]
Taken from the GitHub source, SchemaField does not take type, it takes field_type, which is what caused your error before @andre622's suggestion:
(Note that I did not write the following code. All code belongs to Google Inc. under the Apache 2 license.)
class SchemaField(object):
    """Describe a single field within a table schema.

    :type name: str
    :param name: the name of the field.

    :type field_type: str
    :param field_type: the type of the field (one of 'STRING', 'INTEGER',
                       'FLOAT', 'BOOLEAN', 'TIMESTAMP' or 'RECORD').

    :type mode: str
    :param mode: the type of the field (one of 'NULLABLE', 'REQUIRED',
                 or 'REPEATED').

    :type description: str
    :param description: optional description for the field.

    :type fields: list of :class:`SchemaField`, or None
    :param fields: subfields (requires ``field_type`` of 'RECORD').
    """
    def __init__(self, name, field_type, mode='NULLABLE', description=None,
                 fields=None):
        self.name = name
        self.field_type = field_type
        self.mode = mode
        self.description = description
        self.fields = fields
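To see the effect of that signature in isolation, here is a minimal sketch. It uses a stand-in class that copies only the constructor signature quoted above; StandInSchemaField and try_build are made-up names for illustration, not part of any Google library.

```python
class StandInSchemaField(object):
    """Stand-in (hypothetical) with the constructor signature quoted above."""

    def __init__(self, name, field_type, mode='NULLABLE', description=None,
                 fields=None):
        self.name = name
        self.field_type = field_type
        self.mode = mode
        self.description = description
        self.fields = fields


def try_build(**kwargs):
    """Return the built field, or the TypeError message if construction fails."""
    try:
        return StandInSchemaField(**kwargs)
    except TypeError as exc:
        return str(exc)


# 'type' is not a parameter name, so this call is rejected with a TypeError.
bad = try_build(name='id', type='INTEGER')

# 'field_type' is the real parameter name, so this call succeeds.
good = try_build(name='id', field_type='INTEGER')

# The first two arguments can also be passed positionally.
positional = StandInSchemaField('id', 'INTEGER')
```

Here bad holds the error message as a string, while good and positional are ordinary field objects whose mode falls back to the default.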
Since you are using the default mode, you should be able to use:
table.schema = [
    SchemaField('id', 'INTEGER'),
    SchemaField('zip', 'STRING'),
    SchemaField('html', 'STRING'),
    SchemaField('url', 'STRING'),
    SchemaField('categorie', 'STRING'),
    SchemaField('date', 'STRING'),
    SchemaField('name', 'STRING')
]
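As for why it needs a type: how else would it know what kind of data you want to store in that field? In a DBMS, a declared type allows space to be allocated correctly for each field, since a row will then need at most a specific number of bytes. That makes random access possible: knowing the location of the first row and the size of each row is enough to find any row.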
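The fixed-size-row idea can be sketched with Python's struct module. This only illustrates the general principle described above, not how BigQuery stores data internally; the two-column layout (an 8-byte integer id and a 16-byte name) and the helper names are assumptions made up for the example.

```python
import struct

# Hypothetical fixed-width row layout: 8-byte signed integer, then 16 bytes of text.
ROW = struct.Struct('<q16s')


def pack_rows(rows):
    """Pack (id, name) pairs into one contiguous bytes buffer."""
    return b''.join(
        ROW.pack(row_id, name.encode('utf-8')) for row_id, name in rows
    )


def read_row(buffer, index):
    """Jump straight to row `index`: offset = index * row size (first row at 0)."""
    row_id, raw_name = ROW.unpack_from(buffer, index * ROW.size)
    # struct pads short strings with null bytes, so strip them off again.
    return row_id, raw_name.rstrip(b'\x00').decode('utf-8')


table_bytes = pack_rows([(1, 'alpha'), (2, 'beta'), (3, 'gamma')])

# Read the third row without scanning the first two.
third = read_row(table_bytes, 2)
```

Because every row occupies exactly ROW.size bytes, read_row never has to look at the preceding rows. Without declared types, the row size would not be known in advance.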
Edit:
Can you try:
table = dataset.table('aspire_page', [
    SchemaField('id', 'INTEGER'),
    SchemaField('zip', 'STRING'),
    SchemaField('html', 'STRING'),
    SchemaField('url', 'STRING'),
    SchemaField('categorie', 'STRING'),
    SchemaField('date', 'STRING'),
    SchemaField('name', 'STRING')
])
You could also try using bigquery.SchemaField instead of SchemaField. Having imported from both gcloud.bigquery and google.cloud.bigquery, you may be running into a name clash.
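A name clash of this kind would explain the "Schema items must be fields" error, and it can be reproduced without either library. The sketch below assumes the two packages ship distinct classes that merely share the name SchemaField. The classes, the set_schema function, and clash_message are stand-ins written for illustration; only the isinstance check and the error text are taken from the traceback in the question.

```python
def make_schema_field_class():
    """Build a fresh class named SchemaField, as a separate package would."""
    class SchemaField(object):
        def __init__(self, name, field_type, mode='NULLABLE'):
            self.name = name
            self.field_type = field_type
            self.mode = mode
    return SchemaField


# Stand-ins for gcloud.bigquery.SchemaField and google.cloud.bigquery.SchemaField.
OldSchemaField = make_schema_field_class()
NewSchemaField = make_schema_field_class()


def set_schema(value, expected_class):
    """Mimic the validation shown in the table.py traceback."""
    if not all(isinstance(field, expected_class) for field in value):
        raise ValueError('Schema items must be fields')
    return tuple(value)


def clash_message():
    """Validate a field built from one class against the other class."""
    try:
        set_schema([OldSchemaField('id', 'INTEGER')], NewSchemaField)
    except ValueError as exc:
        return str(exc)
    return None


# Both classes report the same name, yet they are different types.
same_name = OldSchemaField.__name__ == NewSchemaField.__name__
error = clash_message()

# Fields built from the class the table expects pass the check.
accepted = set_schema([NewSchemaField('id', 'INTEGER')], NewSchemaField)
```

If this is what is happening, importing SchemaField from the same package as the client (google.cloud.bigquery in the question's code) should make the isinstance check pass.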