Python: using rolling + apply with a function that requires 2 columns as arguments in pandas

Question

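I have a dataframe (df) with 2 columns.

For every element of df[0], I want to apply a function that does a calculation with that element and the df[1] column: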

def custom_fct_2(x,y):
    res=stats.percentileofscore(y.values,x.iloc[-1])
    return res

I get the following error: TypeError:

("'numpy.float64' object is not callable", u'occurred at index 0')

The full code is:

from __future__ import division
import pandas as pd
import sys
from scipy import stats

def custom_fct_2(x,y):
    res=stats.percentileofscore(y.values,x.iloc[-1])
    return res

df= pd.DataFrame([[1,2],[4,5],[3,6],[10,12],[1,2],[4,5],[3,6],[10,12]])
df['perc']=df.rolling(3).apply(custom_fct_2(df[0],df[1]))

Can anyone help me? (I'm new to Python)

Out[2]: 

        0   1
...
    5   4   5
    6   3   6
    7  10  12

I want the percentile ranking of [10] in [12,6,5]
I want the percentile ranking of [3] in [6,5,2]
I want the percentile ranking of [4] in [5,2,12]
...

Answer 1

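The problem here is that rolling().apply() cannot give you a 3-row segment that spans all columns. Instead, it first passes your function the windows of column 0, then those of column 1.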
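To see this concretely, here is a minimal sketch (record_window and eager_result are hypothetical names I made up for illustration, and I am assuming a pandas version where apply() accepts the raw argument). It also shows where the TypeError most likely comes from: writing custom_fct_2(df[0],df[1]) inside apply() runs the function once on the full columns, so apply() receives a plain number instead of a function.

```python
import pandas as pd
from scipy import stats

df = pd.DataFrame([[1, 2], [4, 5], [3, 6], [10, 12], [1, 2], [4, 5], [3, 6], [10, 12]])

def custom_fct_2(x, y):
    # The function from the question, unchanged in behavior.
    return stats.percentileofscore(y.values, x.iloc[-1])

# This call happens immediately and yields a number, not a callable,
# so passing it to apply() cannot work.
eager_result = custom_fct_2(df[0], df[1])
print(type(eager_result), callable(eager_result))

seen_windows = []

def record_window(window):
    # Hypothetical helper: remember each window apply() hands over,
    # then return a dummy float because apply() expects a number back.
    seen_windows.append(tuple(window.tolist()))
    return 0.0

# raw=False asks pandas to pass each window as a Series.
df.rolling(3).apply(record_window, raw=False)

# Every recorded window holds values from one column only.
print(seen_windows)
```

None of the recorded windows mixes values from both columns, which is why the function in the question never sees x and y together.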

There may be a better solution, but I'll show mine, which at least works.

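import pandas as pd
from scipy import stats

df = pd.DataFrame([[1,2],[4,5],[3,6],[10,12],[1,2],[4,5],[3,6],[10,12]])

def custom_fct_2(s):
    # s is one 3-row window of column 1; its index labels locate the score in column 0
    score = df[0][s.index.values[1]]  # middle label; use .values[-1] if you want the last element
    a = s.values
    return stats.percentileofscore(a, score)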

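I'm using the same data you provided, but I modified your custom_fct_2() function. Here we get s, which is a Series of 3 rolling values from column 1. Fortunately, this Series carries its index, so we can use the "middle" index label of the Series to get the score from column 0. By the way, in Python [-1] means the last element of a collection, but based on your explanation I believe you actually want the middle one.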
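As a small illustration of that index lookup, the sketch below builds one window by hand instead of going through rolling(). The slice df[1].iloc[5:8] covers rows 5 to 7, the same rows the last rolling(3) window covers; the variable names are mine, chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5], [3, 6], [10, 12], [1, 2], [4, 5], [3, 6], [10, 12]])

# One 3-row window of column 1, built by hand (rows 5, 6 and 7).
window = df[1].iloc[5:8]

# The window keeps the original row labels, so they can index column 0.
middle_label = window.index.values[1]
last_label = window.index.values[-1]

middle_score = df[0][middle_label]  # score taken from the middle row of the window
last_score = df[0][last_label]      # score taken from the last row of the window

print(middle_label, middle_score)
print(last_label, last_score)
```

Which of the two labels to use depends on which row's column 0 value should be ranked against the window.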

Then, apply the function.

# raw=False makes pandas pass each window as a Series, so s.index is available
# remove the shift() call if you want the value aligned to the last row of the rolling window
df['prec'] = df[1].rolling(3).apply(custom_fct_2, raw=False).shift(periods=-1)

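The shift() call is optional. It depends on your requirements: whether prec should be aligned with column 0 (whose middle score is being used) or with the rolling window of column 1. I assumed you need it.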
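If the last element is what you need, a variant could look like the sketch below. The examples in the question (10 in [12,6,5], 3 in [6,5,2]) appear to pair each score with the window that ends on the same row, so this version takes the last label and drops the shift. The names percentile_of_last and perc_last are hypothetical, and raw=False assumes a pandas version that supports that argument.

```python
import pandas as pd
from scipy import stats

df = pd.DataFrame([[1, 2], [4, 5], [3, 6], [10, 12], [1, 2], [4, 5], [3, 6], [10, 12]])

def percentile_of_last(window):
    # window is a 3-row Series from column 1; its last label picks the score in column 0.
    score = df.loc[window.index[-1], 0]
    return stats.percentileofscore(window.values, score)

# No shift here: each result stays on the row where its window ends.
df['perc_last'] = df[1].rolling(3).apply(percentile_of_last, raw=False)

print(df)
```

The first two rows come out as NaN because their windows have fewer than 3 values.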
