Error copying a dataframe into a temp table where not all values are set

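This is a continuation of an earlier situation:

I am trying to insert a dataframe into a temp table that was created during a session I opened with the Python connector, and the insert fails when some of the table's values are not yet set in the dataframe. How can I add a column of blank, NaN, or NULL values that will be set in the table later?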

conn.cursor().execute("create or replace temp table x as")

>>> conn.cursor().execute("USE DATABASE temp_db;")

<snowflake.connector.cursor.SnowflakeCursor object at 0x10c78b048>

>>> conn.cursor().execute("create or replace temp table x(id number, first_name varchar, last_name varchar, email varchar, null_feild boolean, blank_feild varchar, letter_grade varchar(3));")

<snowflake.connector.cursor.SnowflakeCursor object at 0x10acebc88>
>>> df.to_sql('x', con=conn, index=False)
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/generic.py:2712: UserWarning: The spaces in these column names will not be changed. In pandas versions < 0.14, spaces were converted to underscores.
  method=method,
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/sql.py", line 1595, in execute
    cur.execute(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/snowflake/connector/cursor.py", line 490, in execute
    query = command % processed_params
TypeError: not all arguments converted during string formatting

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/generic.py", line 2712, in to_sql
    method=method,
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/sql.py", line 518, in to_sql
    method=method,
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/sql.py", line 1749, in to_sql
    table.create()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/sql.py", line 641, in create
    if self.exists():
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/sql.py", line 628, in exists
    return self.pd_sql.has_table(self.name, self.schema)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/sql.py", line 1762, in has_table
    return len(self.execute(query, [name]).fetchall()) > 0
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/sql.py", line 1610, in execute
    raise_with_traceback(ex)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/compat/__init__.py", line 47, in raise_with_traceback
    raise exc.with_traceback(traceback)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/sql.py", line 1595, in execute
    cur.execute(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/snowflake/connector/cursor.py", line 490, in execute
    query = command % processed_params
pandas.io.sql.DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': not all arguments converted during string formatting
>>> 

The dataframe is below: 

>>> df['letter_grade'] = np.nan
>>> df.head()
   id first_name last_name  ... null field  blank_ield  letter_grade
0   1      Paule    Tohill  ...      False         NaN           NaN
1   2       Rebe   Slyford  ...       True         NaN           NaN
2   3   Angelita    Antoni  ...      False         NaN           NaN
3   4      Giffy      Dehm  ...      False         NaN           NaN
4   5        Rob    Beadle  ...      False         NaN           NaN

[5 rows x 7 columns]
>>> df.to_sql('x', con=conn, index=False)
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/sql.py", line 1595, in execute
    cur.execute(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/snowflake/connector/cursor.py", line 490, in execute
    query = command % processed_params
TypeError: not all arguments converted during string formatting

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/generic.py", line 2712, in to_sql
    method=method,
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/sql.py", line 518, in to_sql
    method=method,
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/sql.py", line 1749, in to_sql
    table.create()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/sql.py", line 641, in create
    if self.exists():
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/sql.py", line 628, in exists
    return self.pd_sql.has_table(self.name, self.schema)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/sql.py", line 1762, in has_table
    return len(self.execute(query, [name]).fetchall()) > 0
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/sql.py", line 1610, in execute
    raise_with_traceback(ex)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/compat/__init__.py", line 47, in raise_with_traceback
    raise exc.with_traceback(traceback)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/sql.py", line 1595, in execute
    cur.execute(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/snowflake/connector/cursor.py", line 490, in execute
    query = command % processed_params
pandas.io.sql.DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': not all arguments converted during string formatting
>>> 

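Obviously I don't want this to stay a temp database, and I will alter the table after the first test scores come in. I'm just not sure why the connection rejects this, given the table definition above.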

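The error thrown here occurs before the actual values (such as NaN/None) are ever evaluated. Before performing the insert, Pandas checks whether the table exists or needs to be created, and that check is the part that fails according to the traceback (it contains the calls to exists, has_table, and so on).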
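As a rough illustration of that existence check, here is a minimal sketch that runs the same query shown in the traceback against an in-memory SQLite database. The `has_table` helper below is a hypothetical stand-in written for this example, not Pandas' own code, and the two-column table is a simplified assumption.

```python
import sqlite3

# In-memory SQLite database: the one backend where this check is valid.
conn = sqlite3.connect(":memory:")
conn.execute("create table x (id integer, first_name text)")


def has_table(connection, name):
    """Illustrative stand-in for the check seen in the traceback."""
    query = "SELECT name FROM sqlite_master WHERE type='table' AND name=?;"
    return len(connection.execute(query, [name]).fetchall()) > 0


table_exists = has_table(conn, "x")    # table created above
missing_exists = has_table(conn, "y")  # table never created
print(table_exists, missing_exists)
```

The query only makes sense on SQLite because `sqlite_master` is SQLite's own catalog table, so the check cannot succeed when the same statement is sent to a different database.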

To use Pandas' to_sql function with a Snowflake DB, make sure you pass it an actual Snowflake SQLAlchemy engine object, not a generic connection object.

For connection objects passed to to_sql that are not SQLAlchemy engines, Pandas only supports SQLite3 dialects. This is visible in the error, since the sqlite_master table is valid only for SQLite3 databases, not for Snowflake:

pandas.io.sql.DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': not all arguments converted during string formatting
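The "not all arguments converted during string formatting" wording can be reproduced in plain Python. The sketch below assumes, based only on the traceback line `query = command % processed_params`, that the connector applies `%`-style formatting to the statement; the variable names are illustrative.

```python
# The SQLite-style query uses "?" (qmark) placeholders.
query = "SELECT name FROM sqlite_master WHERE type='table' AND name=?;"
params = ("x",)

try:
    # Assumption: mirrors `query = command % processed_params` in the traceback.
    rendered = query % params
    error_message = None
except TypeError as exc:
    # No % placeholder exists to consume the argument, so formatting fails.
    rendered = None
    error_message = str(exc)

print(error_message)

# With a %s placeholder the same operation consumes the argument and succeeds.
pyformat_query = "SELECT name FROM sqlite_master WHERE type='table' AND name=%s;"
pyformat_rendered = pyformat_query % params
print(pyformat_rendered)
```

So the failure comes from a placeholder-style mismatch on top of the SQLite-only catalog query, which is why it appears before any dataframe values are touched.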

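Follow this Snowflake documentation guide to install the SQLAlchemy engine for Snowflake DB, then rework the part of your code that creates the connection so that it builds a SQLAlchemy engine object instead. The guide's Verifying Your Installation section has a code example that uses the snowflake:// URI support: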

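from sqlalchemy import create_engine

# Fill in your own credentials; the placeholders are from the guide.
engine = create_engine(
  'snowflake://{user}:{password}@{account}/'.format(
    user='<your_user_login_name>',
    password='<your_password>',
    account='<your_account_name>',
  )
)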

Note 1: The standard Snowflake Python connector installation does not come with SQLAlchemy support; it needs to be installed as an add-on to make use of it.

Note 2: Recent versions of Pandas support inserting NaN and NULL values into databases, as covered in another question: Python Pandas write to sql with NaN values
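If the values ever need to be converted by hand, one possible approach is to replace NaN with None before binding, since DB-API drivers generally send None as SQL NULL. The sketch below is only an illustration under stated assumptions: it uses an in-memory SQLite database as a stand-in for Snowflake, a simplified three-column version of table x, and a hypothetical `nan_to_none` helper.

```python
import math
import sqlite3

# Two sample rows whose letter_grade has not been set yet (NaN).
rows = [
    (1, "Paule", float("nan")),
    (2, "Rebe", float("nan")),
]


def nan_to_none(value):
    """Map float NaN to None so the driver binds it as SQL NULL."""
    return None if isinstance(value, float) and math.isnan(value) else value


clean_rows = [tuple(nan_to_none(v) for v in row) for row in rows]

# SQLite stands in for the real target database in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("create table x (id integer, first_name text, letter_grade text)")
conn.executemany("insert into x values (?, ?, ?)", clean_rows)

# Count the rows whose letter_grade was stored as NULL.
null_count = conn.execute(
    "select count(*) from x where letter_grade is null"
).fetchone()[0]
print(null_count)
```

The NULL column can then be filled in later with an ordinary UPDATE once the grades are known.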