Why does Map work but Apply raises ValueError

Question

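I'm trying to get used to the various ways of working with Pandas, and I'm trying to understand why Map, Apply and Vectorization all work with functions that return non-boolean values, while Apply and Vectorization sometimes fail when the applied function returns a boolean. This question focuses on Apply.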

Specifically, I wrote a very simple piece of code to illustrate the problem:

import numpy as np
import pandas as pd

# make dataframe
x = range(1000)
df = pd.DataFrame(data = x, columns = ['Number']) 

# simple function to test if a number is a prime number
def is_prime(num):
    if num < 2:
        return False
    elif num == 2: 
        return True
    else: 
        for i in range(2,num):
            if num % i == 0:
                return False
    return True

# test if every number in the dataframe is prime using Map
df['map prime'] = list(map(is_prime, df['Number']))
df.head()

This gives the output I expect.

So here's where I no longer understand what's going on: when I try to use apply, I get a ValueError.

in: df['apply prime'] = df.apply(func = is_prime, args = df['Number'], axis=1)
out: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

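What am I missing?

Thanks!

P.S. I know there are more efficient ways to test for primes. I deliberately wrote an inefficient function so that I could measure how much faster apply and vectorization actually are than map, but then I ran into this problem. Thanks.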

Answer 1

So here's where I no longer understand what's going on: when I try to use apply, I get a ValueError.

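With df.apply(..., axis=1), each row is passed to the function as a pd.Series(...), not as a scalar, so a test such as num < 2 inside is_prime produces a Series, and the truth value of a Series is ambiguous.

Applying the function to the column instead, i.e. df['apply prime'] = df['Number'].apply(func = is_prime), should work.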
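To make the difference concrete, here is a minimal sketch (not the asker's exact session) using a smaller frame of 50 numbers and the same logic as the question's is_prime. It shows that a row-wise apply hands the function a Series, that this reproduces a ValueError, and two ways to get the same result as map. The variable names are chosen here for illustration only.

```python
import pandas as pd

def is_prime(num):
    """Deliberately naive primality test, same logic as in the question."""
    if num < 2:
        return False
    for i in range(2, num):
        if num % i == 0:
            return False
    return True

df = pd.DataFrame({"Number": range(50)})

# With axis=1 the function receives each row as a Series, not a scalar.
row_types = df.apply(lambda row: type(row).__name__, axis=1)
print(row_types.iloc[0])

# Passing the whole row to is_prime makes `if num < 2` test a Series.
try:
    df.apply(is_prime, axis=1)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Fix 1: apply to the column, so the function sees one scalar at a time.
via_series_apply = df["Number"].apply(is_prime)

# Fix 2: keep axis=1, but pull the scalar out of the row explicitly.
via_row_apply = df.apply(lambda row: is_prime(row["Number"]), axis=1)

# Reference result from the question's map-based approach.
via_map = list(map(is_prime, df["Number"]))
print(via_series_apply.tolist() == via_map, via_row_apply.tolist() == via_map)
```

Both fixes call is_prime once per value; the only difference is whether pandas builds a Series for each row first.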

Given that apply is ostensibly faster than map, and vectorization faster still.

Also, pd.DataFrame.apply(...) does not use any kind of vectorization; it is just a simple C for loop (e.g. Cython), so I believe map(...) should be asymptotically faster.
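Rather than take the relative speeds on faith, they can be measured. The following is a rough sketch using the standard timeit module; the helper names (with_map, with_series_apply, with_row_apply) are made up for this example, and the numbers will vary with the machine and the pandas version, so no particular ordering is claimed here.

```python
import timeit
import pandas as pd

def is_prime(num):
    """Deliberately naive primality test, same logic as in the question."""
    if num < 2:
        return False
    for i in range(2, num):
        if num % i == 0:
            return False
    return True

df = pd.DataFrame({"Number": range(1000)})

def with_map():
    # Built-in map over the column values.
    return list(map(is_prime, df["Number"]))

def with_series_apply():
    # Element-wise apply on a single column.
    return df["Number"].apply(is_prime).tolist()

def with_row_apply():
    # Row-wise apply: each row is wrapped in a Series first.
    return df.apply(lambda row: is_prime(row["Number"]), axis=1).tolist()

# Time three runs of each variant; all three compute the same list.
timings = {
    f.__name__: timeit.timeit(f, number=3)
    for f in (with_map, with_series_apply, with_row_apply)
}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.3f} s for 3 runs")
```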

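Update

It may help to spell out that the .apply(...) method passes the values along the given axis=x to the function, and returns Y, which can be of any data type, or a pd.DataFrame if the function returns multiple keys.

Suppose df.shape = (1000, 4). If we apply along axis=1, i.e. across df.shape[1], your function will be called 1000 times, and on each call it receives a pd.Series of shape (4,). You can use the keys inside the function itself, or pass the keys in as arguments with pd.DataFrame.apply(..., args=[...]).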

import numpy as np
import pandas as pd

x = np.random.randn(1000, 4)
df = pd.DataFrame(data=x, columns=['a', 'b', 'c', 'd'])

print(df.shape)

df.head()

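def func(x, key1, key2):
    # x is one row of df, passed as a pd.Series of shape (4,)
    # print(x.shape)
    # x[key1] and x[key2] are scalars, so this comparison is unambiguous
    return bool(x[key1] > x[key2])

df.apply(func, axis=1, args=('a', 'b'))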
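As a cross-check of the description above, this sketch records the shape of every row that func receives and compares the row-wise result with the plain column comparison df['a'] > df['b'], which is the vectorized form of the same test. The calls list and the fixed random seed are additions made for this illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
df = pd.DataFrame(rng.standard_normal((1000, 4)), columns=["a", "b", "c", "d"])

calls = []  # shape of every row handed to func

def func(row, key1, key2):
    calls.append(row.shape)
    # row[key1] and row[key2] are scalars, so the comparison is a single bool.
    return bool(row[key1] > row[key2])

# Row-wise: func is invoked per row, with the extra keys passed through args.
row_wise = df.apply(func, axis=1, args=("a", "b"))

# Vectorized: compare the two columns directly, no Python-level loop.
vectorized = df["a"] > df["b"]

print(len(calls), calls[0])
print((row_wise == vectorized).all())
```

When the test can be written directly on columns like this, the vectorized form avoids building a Series for each row.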
