How to create a new column based on another column with missing values in Pandas?

Question

My problem is described in the video at the link below:

https://www.youtube.com/watch?v=nk5tBosK0iU

I don't understand why my condition (df[condition]), where condition is a variable, doesn't work for NaN values.

Answer 1

You need to check for a scalar NaN in a custom function with pandas.isnull, but a faster solution is a double numpy.where:

import pandas as pd
import numpy as np

df = pd.DataFrame({'CloseDelta':[np.nan,-0.5,0.5],
                   'B':[0,1,0]})

print (df)
  B  CloseDelta
0  0         NaN
1  1        -0.5
2  0         0.5

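# Row-wise version: pd.isnull catches the scalar NaN before any comparison
def f(x):
    if pd.isnull(x):
        return 0
    elif x < 0:
        return -1
    else:
        return 1

# Vectorized version: the outer np.where handles NaN, the inner one splits negative / non-negative
df['new'] = np.where(df.CloseDelta.isnull(), 0, np.where(df.CloseDelta < 0, -1, 1))

df['new1'] = df.CloseDelta.apply(f)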

print (df)
   B  CloseDelta  new  new1
0  0         NaN    0     0
1  1        -0.5   -1    -1
2  0         0.5    1     1
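Why the plain condition misses NaN rows: any ordered comparison with NaN evaluates to False, so a mask like `df.CloseDelta < 0` marks the NaN row as False and it silently falls into the "else" branch. A minimal sketch using the same CloseDelta values (the standalone Series `s` is just for illustration):

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, -0.5, 0.5], name='CloseDelta')

# Comparisons with NaN are always False, so the NaN row is in neither mask
print((s < 0).tolist())   # [False, True, False]
print((s >= 0).tolist())  # [False, False, True]

# Without an explicit isnull() branch, NaN falls through to the "else" value 1
naive = np.where(s < 0, -1, 1)
print(naive.tolist())     # [1, -1, 1]

# Checking isnull() first gives NaN its own value
fixed = np.where(s.isnull(), 0, np.where(s < 0, -1, 1))
print(fixed.tolist())     # [0, -1, 1]
```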

Timings:

#[300000 rows x 3 columns]
df = pd.concat([df]*100000).reset_index(drop=True)

In [28]: %timeit np.where(df.CloseDelta.isnull(), 0, np.where(df.CloseDelta<0, -1, 1))
100 loops, best of 3: 1.99 ms per loop

In [29]: %timeit df.CloseDelta.apply(f)
1 loop, best of 3: 245 ms per loop
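As an alternative not taken from the answer above, numpy.select expresses the same three-way mapping as a list of conditions checked in order, which can read more clearly than nested np.where once there are more cases. A sketch assuming the same CloseDelta column; the column name `new2` is hypothetical and no timings were measured for it:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'CloseDelta': [np.nan, -0.5, 0.5], 'B': [0, 1, 0]})

# Conditions are evaluated in order; the first True one wins for each row
conditions = [
    df.CloseDelta.isnull(),  # missing values -> 0
    df.CloseDelta < 0,       # negative deltas -> -1
]
choices = [0, -1]

# Rows matching no condition (non-negative deltas) get the default, 1
df['new2'] = np.select(conditions, choices, default=1)
print(df)
```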

