How to create a new column based on another column with missing values in pandas?
My problem is described in the video linked below:
https://www.youtube.com/watch?v=nk5tBosK0iU
I don't know why I can't get the condition (df[condition]), where condition is a variable, to work for NaN values.
In a custom function you need to check for a scalar NaN with pandas.isnull, but a faster solution is a nested (double) numpy.where:
import pandas as pd
import numpy as np
df = pd.DataFrame({'CloseDelta': [np.nan, -0.5, 0.5],
                   'B': [0, 1, 0]})
print(df)
   B  CloseDelta
0  0         NaN
1  1        -0.5
2  0         0.5
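def f(x):
    # x is a single scalar from the column; NaN has to be tested explicitly
    if pd.isnull(x):
        return 0
    elif x < 0:
        return -1
    else:
        return 1

# vectorized: the outer where handles NaN, the inner where handles the sign
df['new'] = np.where(df.CloseDelta.isnull(), 0, np.where(df.CloseDelta < 0, -1, 1))
# row by row: calls f once per value
df['new1'] = df.CloseDelta.apply(f)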
print(df)
   B  CloseDelta  new  new1
0  0         NaN    0     0
1  1        -0.5   -1    -1
2  0         0.5    1     1
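As for why a plain comparison in df[condition] misses the NaN rows: NaN never compares equal to anything, including itself, so an equality test produces an all-False mask. A minimal sketch on the same sample data (the names mask_eq and mask_null are only illustrative, not from the question):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'CloseDelta': [np.nan, -0.5, 0.5],
                   'B': [0, 1, 0]})

# NaN == NaN evaluates to False, so this mask selects nothing
mask_eq = df['CloseDelta'] == np.nan
# isnull() is the NaN-aware test and flags the missing row
mask_null = df['CloseDelta'].isnull()

print(mask_eq.tolist())    # [False, False, False]
print(mask_null.tolist())  # [True, False, False]
print(df[mask_null])       # only the row where CloseDelta is NaN
```

So if condition is built with == or != against np.nan, it should be replaced with isnull() (or notnull() for the opposite case).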
Timings:
#[300000 rows x 3 columns]
df = pd.concat([df]*100000).reset_index(drop=True)
In [28]: %timeit np.where(df.CloseDelta.isnull(), 0, np.where(df.CloseDelta<0, -1, 1))
100 loops, best of 3: 1.99 ms per loop
In [29]: %timeit df.CloseDelta.apply(f)
1 loop, best of 3: 245 ms per loop
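If the nested numpy.where gets hard to read with more branches, numpy.select may be an alternative worth trying. This is only a sketch: the column name new2 is made up for illustration, and it was not part of the timings above, so no speed claim is made here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'CloseDelta': [np.nan, -0.5, 0.5],
                   'B': [0, 1, 0]})

# conditions are checked in order; the first one that is True wins
conditions = [df['CloseDelta'].isnull(), df['CloseDelta'] < 0]
choices = [0, -1]

# rows matching no condition (non-negative, non-NaN values) get the default
df['new2'] = np.select(conditions, choices, default=1)
print(df['new2'].tolist())  # [0, -1, 1]
```

The NaN check comes first so that missing values are handled before any numeric comparison.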