python float64 type conversion issue with pandas

Question

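I need to convert an 18-digit float64 pandas column to an integer or a string so that it is readable without exponential notation. So far I have not been successful.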

df=pd.DataFrame(data={'col1':[915235514180670190,915235514180670208]},dtype='float64')
print(df)
       col1
0  9.152355e+17
1  9.152355e+17

Then I tried converting it to int64, but the last 3 digits come out wrong.

df.col1.astype('int64')
0    915235514180670208
1    915235514180670208
Name: col1, dtype: int64

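But as you can see, the value in row 0 is wrong, and I don't know why. I read in the documentation that int64 should be able to hold an 18-digit number.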

 int64  Integer (-9223372036854775808 to 9223372036854775807)

Any idea what I am doing wrong? How can I achieve what I need?

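Adding more information following Eric Postpischil's comment. If float64 cannot hold 18 digits, I may be in trouble. The thing is that I get this data from a database through a pandas read_sql call, which automatically casts the column to float64. I don't see an option for specifying data types in pandas read_sql().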

Does anyone have thoughts on what I can do to overcome this?

Answer 1

The problem is that float64 has a 53-bit significand (mantissa), which can represent only 15 or 16 decimal digits.

This means that an 18-digit float64 pandas column is an illusion. There is no need to go into pandas or even numpy types to see it:

>>> n = 915235514180670190
>>> d = float(n)
>>> print(n, d, int(d))
915235514180670190 9.152355141806702e+17 915235514180670208
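
To make the size of the rounding step concrete, here is a small sketch using only the standard library. It assumes Python 3.9 or later for math.ulp; the variable names are just for illustration.

```python
import math

n = 915235514180670190
d = float(n)

# Above 2**53, not every integer has its own float64 value.
print(float(2**53) == float(2**53 + 1))  # True

# Distance from d to the next larger float64 at this magnitude.
print(math.ulp(d))  # 128.0

# n is rounded to the nearest representable value, a multiple of 128.
print(int(d))      # 915235514180670208
print(int(d) - n)  # 18
```

At this magnitude neighbouring float64 values are 128 apart, so both values from the question land on the same float.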

Answer 2

I solved the problem. I want to share the solution because it may help others.

    # Prepare SQL that extracts all rows, plus col1 cast to an 18-character string.
    # The alias is lowercase because PostgreSQL folds unquoted identifiers to lowercase.
    sql = 'SELECT *, CAST(col1 AS CHAR(18)) AS dummy_col FROM table1;'

    # Get the data from Postgres.
    df = pd.read_sql(sql, self.conn)

    # Convert the dummy column from string to int64 (no float64 step in between).
    df['dummy_col'] = df['dummy_col'].astype('int64')

    # Replace the original col1 with the int64-converted column, then drop the dummy.
    df['col1'] = df['dummy_col']
    df.drop('dummy_col', axis=1, inplace=True)
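
For anyone who wants to try the cast-to-text idea without a Postgres server, here is a minimal sketch against an in-memory SQLite database. This is an assumption made purely for the demo: SQLite would return an INTEGER column as int64 anyway, so the CAST only mimics the approach above, and the table and values are made up to match the question.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for the real one (demo assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (col1 INTEGER)")
conn.executemany(
    "INSERT INTO table1 (col1) VALUES (?)",
    [(915235514180670190,), (915235514180670208,)],
)
conn.commit()

# Fetch the column as text so it never passes through float64.
df = pd.read_sql(
    "SELECT CAST(col1 AS TEXT) AS col1 FROM table1 ORDER BY rowid", conn
)

# Parse the digit strings straight into int64.
df["col1"] = df["col1"].astype("int64")

print(df["col1"].tolist())
print(df["col1"].dtype)
```

Both values keep all 18 digits because the strings are parsed directly as integers.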

Answer 3

read_sql in pandas has a coerce_float parameter that may help. It is on by default and is documented as:

Attempts to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets.

Setting this to False helps, for example with the following schema/data:

import psycopg2

con = psycopg2.connect()

with con, con.cursor() as cur:
    cur.execute("CREATE TABLE foo ( id SERIAL PRIMARY KEY, num DECIMAL(30,0) )")
    cur.execute("INSERT INTO foo (num) VALUES (123456789012345678901234567890)")

I can then run:

print(pd.read_sql("SELECT * FROM foo", con))

print(pd.read_sql("SELECT * FROM foo", con, coerce_float=False))

This gives me the following output:

   id           num
0   1  1.234568e+29

   id                             num
0   1  123456789012345678901234567890

The second call preserves the precision of the value I inserted.
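
With coerce_float=False the column holds decimal.Decimal objects rather than a numeric dtype, so a follow-up conversion is usually needed. The sketch below builds a small frame by hand to stand in for the read_sql result (an assumption, so that it runs without a database) and uses the 18-digit values from the question.

```python
from decimal import Decimal

import pandas as pd

# Hand-built stand-in for what read_sql(..., coerce_float=False) might return.
df = pd.DataFrame(
    {"col1": [Decimal("915235514180670190"), Decimal("915235514180670208")]}
)

# Decimal -> Python int is exact; 18-digit values fit in int64.
as_int = df["col1"].map(int).astype("int64")

# Alternatively keep the digits as strings, which also avoids exponent notation.
as_str = df["col1"].map(str)

print(as_int.tolist())
print(as_str.tolist())
```

A value like the DECIMAL(30,0) one above does not fit in int64, so for such columns it is probably safer to stay with Decimal objects or strings.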

You haven't given many details about the database you're using, but hopefully the above helps someone!
