Float and integer comparison in a pandas DataFrame

Question

I am aware of the technical limitations of comparing floats, but consider the following example:

import pandas as pd
import numpy as np

df = pd.DataFrame({'col1': [1.12060000],
                   'col2': [1.12065000]})
df
Out[155]: 
        col1       col2
0 1.12060000 1.12065000

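As you can see, col2 and col1 are exactly 0.00005 apart. Now I want to test that. I understand that the following returns the wrong result because I am using decimal fractions: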

(df.col2 - df.col1) < 0.00005
Out[156]: 
0    True
dtype: bool

However, what puzzles me more is the following result:

(100000*df.col2 - 100000*df.col1) < 5
Out[157]: 
0    True
dtype: bool

while

(1000000*df.col2 - 1000000*df.col1) < 50
Out[158]: 
0    False
dtype: bool

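Why does the comparison against 5 give the wrong answer, while only the last one works? I thought that scaling up to integers would solve the problems of comparing floats?

Thanks!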

Answer 1

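Floating-point precision is the problem here. These numbers look natural in base 10, but your computer stores them in base 2, which leads to oddities such as 0.1 + 0.2 = 0.30000000000000004. In your example: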

>>> 1.12060000*100000, 1.12065000*100000
(112060.0, 112064.99999999999)
>>> 1.12060000*1000000, 1.12065000*1000000
(1120600.0, 1120650.0)

That is why the first difference is less than 5: it comes out at roughly 4.99999999999 instead of 5.
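
To make this visible, one can inspect what is actually stored using the standard-library decimal module. The following is only a minimal sketch; the variable names are illustrative and do not come from the original answer.

```python
from decimal import Decimal

# Decimal(float) shows the exact value of the binary double, with no rounding.
stored = Decimal(1.12065)
print(stored)                        # the stored value is not exactly 1.12065
print(stored == Decimal("1.12065"))  # False

# Multiplying by an int still yields a float; the product is rounded to the
# nearest representable double.
a = 1.12060 * 100000
b = 1.12065 * 100000
print(type(b).__name__)  # float
print(b < 112065)        # True: the product lands just below 112065
print(b - a < 5)         # True, the surprising result from the question

# With the larger factor, both products happen to round to whole numbers.
c = 1.12060 * 1000000
d = 1.12065 * 1000000
print(d - c < 50)        # False
```

Whether a given scale factor happens to produce an exact result depends on how the individual products round, so it should not be relied upon.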

yes, but the usual "cure" has always been to convert to integers (here by multiplying by 10000).

Ah! But you didn't convert to integers, you only made larger floats! The "cure" here is to call round(float) or, in the case of a DataFrame, df.round():

>>> round(1.12060000*100000), round(1.12065000*100000)
(112060, 112065)
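
For the DataFrame from the question, the same idea might look like the sketch below. The cast to int64 after rounding and the names scaled and diff are assumptions added for illustration; they are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1.12060000],
                   'col2': [1.12065000]})

# Scale, round to the nearest whole number, then cast to a real integer dtype.
scaled = (df * 100000).round().astype('int64')

diff = scaled.col2 - scaled.col1
print(diff)       # the difference is now the integer 5
print(diff < 5)   # False for row 0
```

The rounding step is what removes the representation error; multiplying alone leaves the values as floats.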

Tags: python, floating-point, precision, pandas