Why does vectorization fail with larger numbers when map and apply work?

Question

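I'm trying to better understand the differences between map, apply, and vectorization, and I've just hit something I don't understand: for small numbers, all three approaches give the same result, but for large numbers vectorization seems to fail. Here is what I mean: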

# get a simple dataframe set up
import numpy as np
import pandas as pd
x = range(10)
y = range(10,20)
df = pd.DataFrame(data = zip(x,y), columns = ['x','y']) 

# define a simple function to test map, apply, and vectorization with
def simple_power(num1, num2):
    return num1 ** num2

# use map, apply, and vectorization to apply the function to every row in the dataframe
df['map power'] = list(map(simple_power, df['x'], df['y']))
df['apply power'] = df.apply(lambda row: simple_power(row['x'], row['y']), axis=1)
# column name matches the 'vectorized power' header shown in the output below
df['vectorized power'] = simple_power(df['x'], df['y'])

Everything works as expected:

in: df.head()
out:    x   y   map power   apply power     vectorized power
0       0   10  0           0               0
1       1   11  1           1               1
2       2   12  4096        4096            4096
3       3   13  1594323     1594323         1594323
4       4   14  268435456   268435456       268435456

Here is where it gets confusing: if I replace x and y with larger ranges, map and apply still work, but vectorization fails:

# set up dataframe with larger numbers to raise to powers
x = range(100)
y = range(100,200)
df = pd.DataFrame(data = zip(x,y), columns = ['x','y'])

Then, if I rerun map, apply, and vectorization, the vectorized output comes out wrong:

in: df.head()
out:

Map and apply agree with each other, but vectorization gives a nonsensical result.

Can anyone tell me what is going on here? Thanks!

Answer 1

https://github.com/numpy/numpy/issues/8987 and https://github.com/numpy/numpy/issues/10964 describe your problem.

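When your function applies ** to whole columns, you are implicitly using numpy.power on fixed-width integer arrays, and when those integers overflow the result wraps around without raising an error. Python's own int type has arbitrary precision, which is why calling the function one pair of values at a time can still give the exact answer.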
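A minimal sketch of the silent wraparound, assuming 64-bit integers (the dtype is set explicitly, and the variable names are illustrative rather than taken from the question):

```python
import numpy as np

# Two base/exponent pairs: one that fits in int64 and one that does not.
x = np.array([2, 10], dtype=np.int64)
y = np.array([10, 110], dtype=np.int64)

# Elementwise power on fixed-width integers: overflow wraps around silently.
wrapped = x ** y

# The same pairs computed with Python ints, which have arbitrary precision.
exact = [a ** b for a, b in zip(x.tolist(), y.tolist())]

print(wrapped[0], exact[0])            # both are 1024: no overflow here
print(int(wrapped[1]) == exact[1])     # False: 10**110 does not fit in int64
```

The first pair agrees in both computations; only the second, whose true value needs far more than 64 bits, differs.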

This is a known bug, and it should get fixed.
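In the meantime, one possible workaround is to keep the column-wise syntax but store the values as Python ints by casting the columns to object dtype. This is only a sketch under that assumption, not something the linked issues prescribe, and the name exact_power is made up for illustration. It gives up NumPy's fast integer path, so it is slower than a true vectorized operation:

```python
import pandas as pd

# Same larger ranges as in the question.
df = pd.DataFrame({'x': range(100), 'y': range(100, 200)})

# Object dtype holds Python ints, so ** is evaluated element by element
# with arbitrary precision instead of 64-bit wraparound arithmetic.
exact_power = df['x'].astype(object) ** df['y'].astype(object)

print(exact_power.dtype)                  # object
print(exact_power.iloc[10] == 10 ** 110)  # True
```

If approximate results are acceptable, casting to float64 extends the range at the cost of exactness, although the largest values in this example would still exceed what float64 can represent.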

