Pandas apply() custom function using more than one column as "input"
Perhaps this simple example will help you understand what I am trying to do:
import pandas as pd
df = pd.DataFrame({"A": [10,20,30,50,70,40], "B": [20,30,10,15,20,30]})
def _custom_function(X):
    # whatever... just for the purpose of the example
    # but I need X to be the actual df and not a series
    Y = sum((X['A'] / X['B']) + (0.2 * X['B']))
    return Y
df['C'] = df.rolling(2).apply(_custom_function, axis=0)
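When the custom function is called, X is a Series holding only the first column of df. Is it possible to pass the whole df through the apply function?
Edit: it is possible to use rolling().apply() on a single column: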
import pandas as pd
df = pd.DataFrame({"A": [10,20,30,50,70,40], "B": [20,30,10,15,20,30]})
def _custom_function(X):
    # whatever... just for the purpose of the example
    Y = sum(0.2 * X)
    return Y
df['C'] = df['A'].rolling(2).apply(_custom_function)
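Second edit: a list comprehension over rolling does not behave as expected
for x in df.rolling(3):
    print(x)
As you can see in the example below, the two methods do not give the same output: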
import pandas as pd
df = pd.DataFrame({"A": [10,20,30,50,70,40], "B": [20,30,10,15,20,30]})
df['C'] = 0.2
def _custom_function_df(X):
    # whatever... just for the purpose of the example
    # but I need X to be the actual df and not a series
    Y = sum(X['C'] * X['B'])
    return Y
def _custom_function_series(X):
    # whatever... just for the purpose of the example
    # here X is a single column (a Series), not the df
    Y = sum(0.2 * X)
    return Y
df['result'] = df['B'].rolling(3).apply(_custom_function_series)
df['result2'] = [x.pipe(_custom_function_df) for x in df.rolling(3, min_periods=3)]
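The list comprehension over rolling produces values for the first rows (instead of the expected NaN) and only starts to roll correctly once len(x) = 3, the size of the rolling window.
Thanks in advance!
Pass the DataFrame to the function: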
df['C'] = _custom_function(df)
Or use DataFrame.pipe:
df['C'] = df.pipe(_custom_function)
print(df)
    A   B         C
0  10  20  4.500000
1  20  30  6.666667
2  30  10  5.000000
3  50  15  6.333333
4  70  20  7.500000
5  40  30  7.333333
Edit: Rolling.apply processes each column separately, so it cannot be used here.
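A quick way to check this is to record what the function receives. The sketch below uses a hypothetical helper, `_inspect`, that is not part of the original question; it only notes the type and length of each window before returning a sum.

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30, 50, 70, 40], "B": [20, 30, 10, 15, 20, 30]})

seen = []  # (type name, length) of every window handed to the function

def _inspect(window):
    # hypothetical helper, for illustration only
    seen.append((type(window).__name__, len(window)))
    return window.sum()

# raw=False passes each window as a Series instead of a NumPy array
out = df.rolling(2).apply(_inspect, raw=False)

print(set(seen))
print(out)
```

Every recorded window is a Series of length 2, and the result has one rolling sum per column, so the function never sees columns A and B together.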
A possible solution:
df['C'] = [x.pipe(_custom_function) for x in df.rolling(2)]
print(df)
    A   B          C
0  10  20   4.500000
1  20  30  11.166667
2  30  10  11.666667
3  50  15  11.333333
4  70  20  13.833333
5  40  30  14.833333
Edit: this looks like a bug: when you iterate over rolling, it behaves as if min_periods=1, even if a larger min_periods is set.
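The window sizes can be inspected directly. This is a small sketch, assuming a pandas version in which Rolling objects are iterable (1.1 or later, as far as I know); the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30, 50, 70, 40], "B": [20, 30, 10, 15, 20, 30]})

# Iterating over a Rolling object yields one sub-DataFrame per row
window_lengths = [len(w) for w in df.rolling(3)]

# min_periods does not filter or shorten the yielded windows
window_lengths_mp = [len(w) for w in df.rolling(3, min_periods=3)]

print(window_lengths)     # [1, 2, 3, 3, 3, 3]
print(window_lengths_mp)  # [1, 2, 3, 3, 3, 3]
```

The first two windows are shorter than the window size, which is why the list comprehension returns numbers where Rolling.apply returns NaN.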
Here is a solution (hack):
import numpy as np
df['result'] = df['B'].rolling(3).apply(_custom_function_series)
df['result2'] = [x.pipe(_custom_function_df) if len(x) == 3 else np.nan for x in df.rolling(3)]
print(df)
    A   B    C  result  result2
0  10  20  0.2     NaN      NaN
1  20  30  0.2     NaN      NaN
2  30  10  0.2    12.0     12.0
3  50  15  0.2    11.0     11.0
4  70  20  0.2     9.0      9.0
5  40  30  0.2    13.0     13.0
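When the custom function is just a sum of per-row terms, as in this example, a different approach may avoid iterating over windows at all: compute the per-row term from both columns first, then take an ordinary rolling sum of that single Series. This is only a sketch for that special case, not a general replacement for the list comprehension; the names `row_term` and `result3` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30, 50, 70, 40], "B": [20, 30, 10, 15, 20, 30]})
df["C"] = 0.2

# Per-row term that combines both columns (same expression as _custom_function_df)
row_term = df["C"] * df["B"]

# A Series rolling sum honours min_periods, so the first two rows are NaN
df["result3"] = row_term.rolling(3).sum()

print(df)
```

Up to floating-point rounding, this should give the same values as result and result2 above. It does not help when the function cannot be decomposed into per-row terms, in which case the list comprehension with the length check remains the fallback.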