Psycopg2 can't write numpy nans to postgresql table: invalid input syntax for type double precision: ""
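I have a small Python script that builds a DataFrame containing one (or more) NaNs and then writes it to a Postgres database with the psycopg2 module, using the copy_from function. Here it is: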
import logging
import traceback
from io import StringIO

import numpy as np
import pandas as pd
import psycopg2
from psycopg2 import sql

table_name = "test"
df = pd.DataFrame([[1.0, 2.0], [3.0, np.nan]], columns=["VALUE0", "VALUE1"], index=pd.date_range("2000-01-01", "2000-01-02"))
database = "xxxx"
user = "xxxxxxx"
password = "xxxxxx"
host = "127.0.0.1"
port = "xxxxx"

# Adapter meant to turn NaN into SQL NULL (NaN is the only float that is not equal to itself)
def nan_to_null(f,
                _NULL=psycopg2.extensions.AsIs('NULL'),
                _Float=psycopg2.extensions.Float):
    if f != f:
        return _NULL
    else:
        return _Float(f)

# np.float was only an alias of the built-in float (removed in NumPy 1.24), so float covers it
psycopg2.extensions.register_adapter(float, nan_to_null)
psycopg2.extensions.register_adapter(np.float64, nan_to_null)

with psycopg2.connect(database=database,
                      user=user,
                      password=password,
                      host=host,
                      port=port) as conn:
    try:
        with conn.cursor() as cur:
            cmd = "CREATE TABLE {} (TIMESTAMP timestamp PRIMARY KEY NOT NULL, VALUE0 FLOAT, VALUE1 FLOAT)"
            cur.execute(sql.SQL(cmd).format(sql.Identifier(table_name)))
            # Dump the DataFrame as CSV text and bulk-load it with COPY
            buffer = StringIO()
            df.to_csv(buffer, index_label='TIMESTAMP', header=False)
            buffer.seek(0)
            cur.copy_from(buffer, table_name, sep=",")
        conn.commit()
    except Exception:
        conn.rollback()
        logging.error(traceback.format_exc())
        raise
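The problem is that psycopg2 does not convert the NaN into a Postgres NULL, even though I use this trick (the nan_to_null function).
I can't get it to work; it raises the following exception: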
psycopg2.errors.InvalidTextRepresentation: invalid input syntax for type double precision: ""
CONTEXT: COPY test, line 2, column value1: ""
I'm using Python 3.8 with Anaconda 3, psycopg2 v2.8.5 and Postgres v12.3 on Windows 10.
Thanks!
It looks like you are inserting empty strings instead of NULL values. You can easily reproduce the error with the following SQL:
CREATE TABLE test(
x FLOAT
);
INSERT INTO test(x) VALUES ('');
-- ERROR: invalid input syntax for type double precision: "" Position: 29
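The empty string comes from to_csv: its na_rep parameter defaults to "", so a missing value is written as an empty trailing field. The sketch below (no database needed) renders the question's DataFrame the same way it is fed to copy_from, once with the default and once with na_rep="NaN"; csv_lines is a hypothetical helper introduced only for illustration.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Same DataFrame as in the question
df = pd.DataFrame([[1.0, 2.0], [3.0, np.nan]], columns=["VALUE0", "VALUE1"],
                  index=pd.date_range("2000-01-01", "2000-01-02"))


def csv_lines(frame, **kwargs):
    """Hypothetical helper: render frame as the question does, one string per CSV row."""
    buffer = StringIO()
    frame.to_csv(buffer, index_label="TIMESTAMP", header=False, **kwargs)
    # splitlines() also copes with "\r\n" terminators on Windows
    return buffer.getvalue().splitlines()


default_lines = csv_lines(df)             # NaN becomes an empty field (na_rep="")
nan_lines = csv_lines(df, na_rep="NaN")   # NaN becomes the literal NaN, which COPY accepts
print(default_lines)
print(nan_lines)
```

The second row of default_lines ends with a comma and nothing after it, which is the empty value COPY complains about for column value1.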
On the other hand, NaN can safely be inserted into PostgreSQL:
INSERT INTO test(x) VALUES ('NaN');
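Note that PostgreSQL's floating-point support differs slightly from the IEEE 754 standard, because PostgreSQL needs all values to be sortable in order to build indexes consistently. Therefore, in PostgreSQL, NaN compares equal to itself and greater than every other (non-NaN) value.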
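To make the difference concrete, here is a small sketch of a Python sort key that mimics the ordering described above; pg_float_sort_key is a hypothetical name, and this only emulates the ordering rule, it is not how PostgreSQL implements it. Plain Python floats follow IEEE 754, where nan != nan, which is also why the `f != f` test in nan_to_null detects NaN.

```python
import math


def pg_float_sort_key(x: float):
    """Hypothetical sort key: NaN sorts after every other value (including +inf)
    and all NaNs get the same key, i.e. they are treated as equal."""
    is_nan = math.isnan(x)
    return (is_nan, 0.0 if is_nan else x)


values = [3.0, float("nan"), -math.inf, 1.5, math.inf, float("nan")]
ordered = sorted(values, key=pg_float_sort_key)
print(ordered)  # → [-inf, 1.5, 3.0, inf, nan, nan]

nan = float("nan")
print(nan == nan)                                        # IEEE 754 semantics: False
print(pg_float_sort_key(nan) == pg_float_sort_key(nan))  # PostgreSQL-style equality: True
```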
The problem is your use of copy_from. From the docs:
Currently no adaptation is provided between Python and PostgreSQL types on COPY: ...
So your adapter never comes into play.
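Adapters only run on values passed as query parameters (for example through execute or executemany), never on COPY data. If real NULLs are wanted rather than NaN, one option is to replace NaN with None, which psycopg2 already adapts to NULL, and insert with parameters. A minimal sketch, assuming rows are plain tuples; nan_to_none is a hypothetical helper, and the database call is left as a comment because it needs a live connection. Since np.float64 subclasses float, the isinstance check also catches NumPy values.

```python
import math
from datetime import date


def nan_to_none(rows):
    """Hypothetical helper: replace float NaN with None in every row.
    psycopg2 turns None into SQL NULL when it is passed as a query parameter."""
    return [
        tuple(None if isinstance(v, float) and math.isnan(v) else v for v in row)
        for row in rows
    ]


rows = [(date(2000, 1, 1), 1.0, 2.0), (date(2000, 1, 2), 3.0, float("nan"))]
params = nan_to_none(rows)
print(params)

# With a live cursor (not run here), the values then go through normal adaptation:
# cur.executemany("INSERT INTO test VALUES (%s, %s, %s)", params)
```

Row-by-row inserts are typically much slower than COPY for large frames, so this trades speed for going through the adapter machinery.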
Update, a possible solution:
Pandas Changing the format of NaN values when saving to CSV
See @cs95's answer.
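Thanks to the answers from Adrian Klaver and jlandercy, the solution is simple: manually replace np.nan with 'NaN' using the following line, which takes the place of the nan_to_null function:
df.replace(np.nan, "NaN", inplace=True)
It works fine. Thanks, everyone!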
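Here is the same code, updated with Adrian Klaver's solution.
The changed line is:
df.to_csv(buffer, index_label='TIMESTAMP', header=False, na_rep='NaN')
We added na_rep='NaN' to the to_csv call, so no extra line is needed to replace the NaNs. Replacing them with 'NULL' does not work: in COPY's text format only \N (by default) marks a NULL, so the string NULL is rejected as invalid double precision input.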
import logging
import traceback
from io import StringIO

import numpy as np
import pandas as pd
import psycopg2
from psycopg2 import sql

if __name__ == '__main__':
    table_name = "test"
    df = pd.DataFrame([[1.0, 2.0], [3.0, np.nan]], columns=["VALUE0", "VALUE1"], index=pd.date_range("2000-01-01", "2000-01-02"))
    database = "xxxxxx"
    user = "xxxxx"
    password = "xxxxxx"
    host = "127.0.0.1"
    port = "xxxxxx"
    with psycopg2.connect(database=database,
                          user=user,
                          password=password,
                          host=host,
                          port=port) as conn:
        try:
            with conn.cursor() as cur:
                # Create a new table "test"
                cmd = "CREATE TABLE {} (TIMESTAMP timestamp PRIMARY KEY NOT NULL, VALUE0 FLOAT, VALUE1 FLOAT);"
                cur.execute(sql.SQL(cmd).format(sql.Identifier(table_name)))
                # Write the content; na_rep='NaN' emits a literal that COPY accepts for FLOAT columns
                buffer = StringIO()
                df.to_csv(buffer, index_label='TIMESTAMP', header=False, na_rep='NaN')
                buffer.seek(0)
                cur.copy_from(buffer, table_name, sep=",")
                # Read the table content back
                cmd = "SELECT * FROM {};"
                cur.execute(sql.SQL(cmd).format(sql.Identifier(table_name)))
                test_data = pd.DataFrame(cur.fetchall())
                print(test_data)
                print(type(test_data.loc[1, 2]))
                # Delete the test table
                cmd = "DROP TABLE {};"
                cur.execute(sql.SQL(cmd).format(sql.Identifier(table_name)))
            conn.commit()
        except Exception:
            conn.rollback()
            logging.error(traceback.format_exc())
            raise
The printout shows that the NaN is correctly interpreted and stored in the database.
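If actual NULLs are preferred over NaN values, the same COPY approach can work by writing COPY's own NULL marker instead of 'NaN'. In text format, \N is the default NULL string (copy_from's null parameter defaults to "\\N"). A minimal sketch, assuming the question's DataFrame; the copy_from call is left as a comment because it needs a live connection.

```python
from io import StringIO

import numpy as np
import pandas as pd

df = pd.DataFrame([[1.0, 2.0], [3.0, np.nan]], columns=["VALUE0", "VALUE1"],
                  index=pd.date_range("2000-01-01", "2000-01-02"))

buffer = StringIO()
# Write missing values as \N, which COPY's text format reads as NULL by default
df.to_csv(buffer, index_label="TIMESTAMP", header=False, na_rep=r"\N")
print(buffer.getvalue())
buffer.seek(0)

# With a live cursor (not run here), VALUE1 of the second row would be stored as NULL:
# cur.copy_from(buffer, table_name, sep=",")
# Equivalent alternative: keep the default na_rep="" and call
# cur.copy_from(buffer, table_name, sep=",", null="")
```

Which one to choose depends on the data model: NaN keeps "not a number" as a float value, while NULL means "no value" and behaves differently in aggregates and IS NULL filters.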