Unexpected 32-bit integer overflow in pandas/numpy int64 (python 3.6)

Question

Let me start with the example code:

import numpy
from pandas import DataFrame

a = DataFrame({"nums": [2233, -23160, -43608]})

a.nums = numpy.int64(a.nums)

print(a.nums ** 2)
print((a.nums ** 2).sum())

On my local machine and on other developers' machines, this works as expected and prints:

0       4986289
1     536385600
2    1901657664
Name: nums, dtype: int64
2443029553

However, on our production server, we get:

0       4986289
1     536385600
2    1901657664
Name: nums, dtype: int64
-1851937743

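This is a 32-bit integer overflow, even though the dtype is int64. Note that the element-wise squares are correct; only the sum is wrong.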
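As a quick sanity check of that diagnosis, the following sketch uses only plain Python integers (which cannot overflow) to show that the bad value is exactly the correct sum reduced to a signed 32-bit range. The helper name `wrap_to_int32` is made up for this illustration and is not part of numpy or pandas.

```python
def wrap_to_int32(value):
    """Interpret the low 32 bits of `value` as a signed two's-complement integer."""
    return (value + 2**31) % 2**32 - 2**31


# Same inputs as in the question, squared with arbitrary-precision Python ints.
squares = [n * n for n in (2233, -23160, -43608)]
total = sum(squares)

print(squares)               # [4986289, 536385600, 1901657664]
print(total)                 # 2443029553
print(wrap_to_int32(total))  # -1851937743
```

Each individual square is below 2**31, which is consistent with the per-row output being correct on both machines while the sum is not.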

The production server uses the same versions of python, numpy, pandas, etc. It runs 64-bit Windows Server 2012, and everything reports 64-bit (e.g. python --version, sys.maxsize, platform.architecture).

What could be causing this?

Answer 1

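This is a bug in the bottleneck library, which Pandas uses if it is installed. In some circumstances, bottleneck.nansum incorrectly exhibits 32-bit overflow behavior when called on 64-bit input.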
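Since the explanation depends on whether bottleneck is present, one way to test it is to check for the package and then repeat the sum with pandas told not to use it. This is only a sketch: it assumes the installed pandas exposes the `compute.use_bottleneck` option, the helper `has_module` is invented for this example, and the pandas part is guarded so the snippet still runs where pandas is missing.

```python
import importlib.util


def has_module(name):
    """Return True if `name` can be imported in this environment."""
    return importlib.util.find_spec(name) is not None


nums = [2233, -23160, -43608]

# Reference value computed with Python ints, which never overflow.
expected = sum(n * n for n in nums)

print("bottleneck installed:", has_module("bottleneck"))

if has_module("pandas"):
    import pandas as pd

    # Assumption: this pandas version has the "compute.use_bottleneck" option.
    pd.set_option("compute.use_bottleneck", False)

    squares = pd.Series(nums, dtype="int64") ** 2
    total = int(squares.sum())
    print("sum without bottleneck:", total, "matches:", total == expected)
```

If bottleneck turns out to be installed only on the production server, that would account for the difference between machines despite identical python, numpy, and pandas versions.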

I believe this is due to bottleneck using PyInt_FromLong even when long is 32-bit. I'm not sure why that even compiles, actually. There's an issue report on the bottleneck issue tracker, not yet fixed, as well as an issue report on the Pandas issue tracker, where they tried to compensate for Bottleneck's issue (but I think they turned off Bottleneck in the cases where it works, rather than in the cases where it doesn't).
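The "long is 32-bit" part is what makes this platform-specific: on 64-bit Windows the C `long` type is 32 bits wide, while on most 64-bit Linux and macOS builds it is 64 bits. A small sketch using the standard-library `ctypes` module shows what a given interpreter's platform uses; it inspects only the C type sizes and does not call into bottleneck.

```python
import ctypes
import sys

# Width of the C `long` type on this platform, in bits.
long_bits = ctypes.sizeof(ctypes.c_long) * 8

# Width of a pointer, which is what "64-bit Python" usually refers to.
pointer_bits = ctypes.sizeof(ctypes.c_void_p) * 8

print("platform:", sys.platform)
print("C long bits:", long_bits)
print("pointer bits:", pointer_bits)
print("sys.maxsize > 2**32:", sys.maxsize > 2**32)
```

A machine can therefore report 64-bit everywhere (pointer size, sys.maxsize) and still truncate any value that passes through a C `long`.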

python

numpy

integer-overflow

python-3.x

pandas