lookback/shift in pandas dataframes calling functions

Question

Suppose I have the following dataframe:

  date       A     B    M     S
 20150101    8     7    7.5   0
 20150101    10    9    9.5   -1
 20150102    9     8    8.5   1
 20150103    11    11   11    0
 20150104    11    10   10.5  0
 20150105    12    10   11    -1
 ...

I want to create another column, 'cost', by the following rules:

If S < 0, cost = (M-B).shift(1)*S
If S > 0, cost = (M-A).shift(1)*S
If S == 0, cost = 0

Currently, I am using the following function:

def cost(df):
    if df[3]<0:
        return np.roll((df[2]-df[1]),1)*df[3]
    elif df[3]>0:
        return np.roll((df[2]-df[0]),1)*df[3]
    else:
        return 0
df['cost']=df.apply(cost,axis=0)

Is there another way to do this? Can I somehow use the pandas shift function inside a user-defined function? Thanks.

Answer 1

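Doing it this way is usually expensive, because when you apply a user-defined function you lose the speed advantage of vectorization. Instead, how about using the NumPy version of the ternary operator: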

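import numpy as np

# Nested np.where: first test S < 0, then S > 0, otherwise 0.
# Columns are selected by name (M, A, B, S as in the question's table).
np.where(df['S'] < 0,
    np.roll(df['M'] - df['B'], 1) * df['S'],
    np.where(df['S'] > 0,
        np.roll(df['M'] - df['A'], 1) * df['S'],
        0))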

(and assign the result to df['cost'], of course).
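One caveat may be worth checking before relying on np.roll here: it wraps the last element around to the first position, whereas Series.shift leaves a NaN there, so the two can disagree in the first row. A minimal sketch, using a made-up Series purely for illustration:

```python
import numpy as np
import pandas as pd

# Made-up values, only to compare the two operations.
s = pd.Series([0.5, 0.5, 0.5, 0.0, 0.5, 1.0])

rolled = np.roll(s.to_numpy(), 1)  # last value wraps around to position 0
shifted = s.shift(1)               # position 0 becomes NaN instead

print(rolled[0])         # 1.0
print(shifted.iloc[0])   # nan
```

With the rules in the question the difference only matters when the first row has a nonzero S, but that depends on the data.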

Answer 2

np.where(condition, A, B) is the NumPy elementwise equivalent of

A if condition else B

np.select(conditions, choices) is a generalization of np.where that is useful when there are more than two choices.
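As a small illustration of the difference (the array x and the labels are made up for this sketch):

```python
import numpy as np

x = np.array([-2, 0, 3])

# Two outcomes: np.where picks between them element by element.
sign_label = np.where(x < 0, "neg", "non-neg")

# Three outcomes: np.select uses the first condition that holds,
# and falls back to `default` where none does.
sign = np.select([x < 0, x > 0], [-1, 1], default=0)

print(sign_label.tolist())  # ['neg', 'non-neg', 'non-neg']
print(sign.tolist())        # [-1, 0, 1]
```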

So, much like Ami Tavory's answer, except using np.select, you could use

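import numpy as np
import pandas as pd

# 'data' is a whitespace-delimited file holding the table from the question
df = pd.read_table('data', sep=r'\s+')
M, A, B, S = [df[col] for col in 'MABS']
# Conditions are checked in order; rows matching neither get the default
conditions = [S < 0, S > 0]
choices = [(M-B).shift(1)*S, (M-A).shift(1)*S]
df['cost'] = np.select(conditions, choices, default=0)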

which yields

       date   A   B     M  S  cost
0  20150101   8   7   7.5  0   0.0
1  20150101  10   9   9.5 -1  -0.5
2  20150102   9   8   8.5  1  -0.5
3  20150103  11  11  11.0  0   0.0
4  20150104  11  10  10.5  0   0.0
5  20150105  12  10  11.0 -1  -0.5
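To try this without a separate 'data' file, one option is to read the six sample rows from the question out of an in-memory string. The sketch below assumes that approach (io.StringIO and the variable name raw are choices made for this example, not part of the original answer):

```python
import io

import numpy as np
import pandas as pd

# The six sample rows from the question, held in a string instead of a file.
raw = """date A B M S
20150101 8 7 7.5 0
20150101 10 9 9.5 -1
20150102 9 8 8.5 1
20150103 11 11 11 0
20150104 11 10 10.5 0
20150105 12 10 11 -1
"""
df = pd.read_csv(io.StringIO(raw), sep=r'\s+')

M, A, B, S = [df[col] for col in 'MABS']
conditions = [S < 0, S > 0]
choices = [(M - B).shift(1) * S, (M - A).shift(1) * S]
df['cost'] = np.select(conditions, choices, default=0)

print(df['cost'].tolist())  # [0.0, -0.5, -0.5, 0.0, 0.0, -0.5]
```

The NaN that shift(1) introduces in the first row does not reach the result here, because that row has S == 0 and so takes the default.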
