Assigning (or tying in) function results back to original data in pandas

Question

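Once I have completed the function call np.polyfit (actual code below), I am having a hard time extracting the regression coefficients. I can display each coefficient, but I am not sure how to actually extract them for later use with the original data.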

df=pd.read_csv('2_skews.csv')

Here is the head() of the data:

        date     expiry symbol  strike   vol
0  6/10/2015  1/19/2016    IBM      50  42.0
1  6/10/2015  1/19/2016    IBM      55  41.5
2  6/10/2015  1/19/2016    IBM      60  40.0
3  6/10/2015  1/19/2016    IBM      65  38.0
4  6/10/2015  1/19/2016    IBM      70  36.0

There are many symbols, each with many strikes across many dates and many expiries.

I have grouped the data by date, symbol and expiry, and then call the regression function:

df_reg=df.groupby(['date','symbol','expiry']).apply(regress)

This function of mine seems to work well (it gives sensible coefficients), but I can't seem to access them and tie them back to the original data.

def regress(df):
    y=df['vol']
    x=df['strike']
    # P.polyfit returns coefficients from lowest to highest degree
    z=P.polyfit(x,y,4)
    return z

I am importing polyfit like this:

from numpy.polynomial import polynomial as P

The end result:

df_reg


date       symbol  expiry   
5/19/2015  GS      1/19/2016    [-112.064833151, 6.76871521993, -0.11147562136...
                   3/21/2016    [-131.2914493, 7.16441276062, -0.1145534833, 0...
           IBM     1/19/2016    [211.458028147, -5.01236287512, 0.044819313514...
                   3/21/2016    [-34.1027973807, 3.16990194634, -0.05676206572...
6/10/2015  GS      1/19/2016    [50.3916788503, 0.795484227762, -0.02701849495...
                   3/21/2016    [31.6090441114, 0.851878910113, -0.01972772270...
           IBM     1/19/2016    [-13.6159660078, 3.23002791603, -0.06015739505...
                   3/21/2016    [-51.6709051223, 4.80288173687, -0.08600312989...
dtype: object

The functional form of the top result is:

y = -0.000002x^4 + 0.000735x^3 - 0.111476x^2 + 6.768715x - 112.064833

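I have tried to take on board the constructive criticism from earlier posters and to state my question as clearly as possible. Please let me know if there is anything else I should address :-)

John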

Answer 1

Changing the output of regress to a Series instead of a numpy array will give you a DataFrame when you group. The index of the Series becomes the column names:

In [37]:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['6/10/2015', '1/19/2016', 'IBM', 50, 42.0],
     ['6/10/2015', '1/19/2016', 'IBM', 55, 41.5],
     ['6/10/2015', '1/19/2016', 'IBM', 60, 40.0],
     ['6/10/2015', '1/19/2016', 'IBM', 65, 38.0],
     ['6/10/2015', '1/19/2016', 'IBM', 70, 36.0]],
    columns=['date', 'expiry', 'symbol', 'strike', 'vol'])

def regress(df):
    y=df['vol']
    x=df['strike']
    z=np.polyfit(x,y,4)
    return pd.Series(z, name='order', index=range(5)[::-1])

group_cols = ['date', 'expiry', 'symbol']
coeffs = df.groupby(group_cols).apply(regress)
coeffs


Out[40]:
order                                  4         3     2         1    0
date      expiry    symbol
6/10/2015 1/19/2016 IBM    -5.388312e-18  0.000667 -0.13  8.033333 -118
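
If df_reg has already been computed as in the question (a Series whose values are coefficient arrays), it can also be expanded into columns without changing regress. The following is only a minimal sketch: it assumes every array has the same length, and it relies on numpy.polynomial.polynomial.polyfit returning coefficients from lowest to highest degree (the reverse of np.polyfit). The two-group sample data and the degree of 2 are made up for illustration.

```python
import pandas as pd
from numpy.polynomial import polynomial as P

# Illustrative data only: two groups with three strikes each.
df = pd.DataFrame({
    'date':   ['6/10/2015'] * 6,
    'expiry': ['1/19/2016'] * 6,
    'symbol': ['IBM'] * 3 + ['GS'] * 3,
    'strike': [50, 60, 70, 50, 60, 70],
    'vol':    [42.0, 40.0, 36.0, 30.0, 29.0, 27.0],
})

def regress(group, deg=2):
    # Returns a plain array, as in the question: c0, c1, ..., c_deg.
    return P.polyfit(group['strike'], group['vol'], deg)

group_cols = ['date', 'symbol', 'expiry']
# Same call as in the question; recent pandas versions may warn that
# apply operated on the grouping columns.
df_reg = df.groupby(group_cols).apply(regress)

# One row per group, one column per power of strike (0 is the intercept).
coeffs = pd.DataFrame(df_reg.tolist(), index=df_reg.index)
print(coeffs)
```

Here the column labels 0, 1, 2 are the powers of strike, so they can be merged back exactly like the coeffs table above.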

To get columns containing the coefficients for each date, expiry and symbol combination, you can merge df and coeffs on those columns:

In [25]: df.merge(coeffs.reset_index(), on=group_cols)
Out[25]:
        date     expiry symbol  strike   vol             4         3     2         1    0
0  6/10/2015  1/19/2016    IBM      50  42.0 -6.644454e-18  0.000667 -0.13  8.033333 -118
1  6/10/2015  1/19/2016    IBM      55  41.5 -6.644454e-18  0.000667 -0.13  8.033333 -118
2  6/10/2015  1/19/2016    IBM      60  40.0 -6.644454e-18  0.000667 -0.13  8.033333 -118
3  6/10/2015  1/19/2016    IBM      65  38.0 -6.644454e-18  0.000667 -0.13  8.033333 -118
4  6/10/2015  1/19/2016    IBM      70  36.0 -6.644454e-18  0.000667 -0.13  8.033333 -118
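
As an aside, because the coefficient table is indexed by the group keys, DataFrame.join with on= should give the same result as the merge without needing reset_index. A small sketch, where the coefficient values and the column names c1 and c0 are hypothetical and exist only to show the mechanics:

```python
import pandas as pd

group_cols = ['date', 'expiry', 'symbol']

df = pd.DataFrame({
    'date':   ['6/10/2015'] * 3,
    'expiry': ['1/19/2016'] * 3,
    'symbol': ['IBM', 'IBM', 'GS'],
    'strike': [50, 55, 50],
    'vol':    [42.0, 41.5, 30.0],
})

# Hypothetical coefficient table: one row per group, indexed by the group keys.
coeffs = pd.DataFrame(
    {'c1': [-0.1, -0.2], 'c0': [47.0, 40.0]},
    index=pd.MultiIndex.from_tuples(
        [('6/10/2015', '1/19/2016', 'IBM'),
         ('6/10/2015', '1/19/2016', 'GS')],
        names=group_cols),
)

# join matches the listed columns of df against the MultiIndex of coeffs.
# It is a left join by default, so every original row is kept.
joined = df.join(coeffs, on=group_cols)
print(joined)
```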

You can then do something like:

df = df.merge(coeffs.reset_index(), on=group_cols)
# one column per power of strike, labelled 0..4 to line up with the coefficient columns
strike_powers = pd.DataFrame({i: df.strike**i for i in range(5)})
df['modelled_vol'] = (strike_powers * df[list(range(5))]).sum(axis=1)
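
An equivalent way to evaluate the fitted polynomial on each row is Horner's scheme, which avoids building a separate table of strike powers. This is only a sketch under the same assumptions as above (degree 4, coefficient columns labelled 4 down to 0). The names merged and modelled are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# The five sample rows from the question's head().
df = pd.DataFrame({
    'date':   ['6/10/2015'] * 5,
    'expiry': ['1/19/2016'] * 5,
    'symbol': ['IBM'] * 5,
    'strike': [50, 55, 60, 65, 70],
    'vol':    [42.0, 41.5, 40.0, 38.0, 36.0],
})

def regress(group):
    # np.polyfit returns the highest-degree coefficient first,
    # hence the labels 4, 3, 2, 1, 0.
    z = np.polyfit(group['strike'], group['vol'], 4)
    return pd.Series(z, index=[4, 3, 2, 1, 0])

group_cols = ['date', 'expiry', 'symbol']
# Selecting the data columns keeps the grouping keys out of each group.
coeffs = df.groupby(group_cols)[['strike', 'vol']].apply(regress)
merged = df.merge(coeffs.reset_index(), on=group_cols)

# Horner's scheme, vectorised over rows:
# ((((c4*x + c3)*x + c2)*x + c1)*x + c0)
modelled = pd.Series(0.0, index=merged.index)
for power in [4, 3, 2, 1, 0]:
    modelled = modelled * merged['strike'] + merged[power]
merged['modelled_vol'] = modelled

print(merged[['strike', 'vol', 'modelled_vol']])
```

With five strikes and a degree-4 fit the polynomial passes through every point, so modelled_vol should match vol up to floating-point error. On real data with more strikes per group it would give the smoothed skew instead.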
