Dataframe 插值在 Panda + 多维插值中不起作用

Dateframe interpolate not working in Panda + multidimensional interpolation

数据帧的多维插值不起作用

import pandas as pd
import numpy as np
raw_data = {'CCY_CODE': ['SGD','USD','USD','USD','USD','USD','USD','EUR','EUR','EUR','EUR','EUR','EUR','USD'],
            'END_DATE': ['16/03/2018','17/03/2018','17/03/2018','17/03/2018','17/03/2018','17/03/2018','17/03/2018',
                        '17/03/2018','17/03/2018','17/03/2018','17/03/2018','17/03/2018','17/03/2018','17/03/2018'],
            'STRIKE':[0.005,0.01,0.015,0.02,0.025,0.03,0.035,0.04,0.045,0.05,0.55,0.06,0.065,0.07],
            'VOLATILITY':[np.nan,np.nan,0.3424,np.nan,0.2617,0.2414,np.nan,np.nan,0.215,0.212,0.2103,np.nan,0.2092,np.nan]
           }
df_volsurface = pd.DataFrame(raw_data,columns = ['CCY_CODE','END_DATE','STRIKE','VOLATILITY'])
df_volsurface['END_DATE'] = pd.to_datetime(df_volsurface['END_DATE'])
df_volsurface.interpolate(method='akima',limit_direction='both')

输出:

<table><tbody><tr><th> </th><th>CCY_CODE</th><th>END_DATE</th><th>STRIKE</th><th>VOLATILITY</th></tr><tr><td>0</td><td>SGD</td><td>3/16/2018</td><td>0.005</td><td>NaN</td></tr><tr><td>1</td><td>USD</td><td>3/17/2018</td><td>0.01</td><td>NaN</td></tr><tr><td>2</td><td>USD</td><td>3/17/2018</td><td>0.015</td><td>0.3424</td></tr><tr><td>3</td><td>USD</td><td>3/17/2018</td><td>0.02</td><td>0.296358</td></tr><tr><td>4</td><td>USD</td><td>3/17/2018</td><td>0.025</td><td>0.2617</td></tr><tr><td>5</td><td>USD</td><td>3/17/2018</td><td>0.03</td><td>0.2414</td></tr><tr><td>6</td><td>USD</td><td>3/17/2018</td><td>0.035</td><td>0.230295</td></tr><tr><td>7</td><td>EUR</td><td>3/17/2018</td><td>0.04</td><td>0.220911</td></tr><tr><td>8</td><td>EUR</td><td>3/17/2018</td><td>0.045</td><td>0.215</td></tr><tr><td>9</td><td>EUR</td><td>3/17/2018</td><td>0.05</td><td>0.212</td></tr><tr><td>10</td><td>EUR</td><td>3/17/2018</td><td>0.55</td><td>0.2103</td></tr><tr><td>11</td><td>EUR</td><td>3/17/2018</td><td>0.06</td><td>0.209471</td></tr><tr><td>12</td><td>EUR</td><td>3/17/2018</td><td>0.065</td><td>0.2092</td></tr><tr><td>13</td><td>USD</td><td>3/17/2018</td><td>0.07</td><td>NaN</td></tr></tbody></table>

预期结果:

<table><tbody><tr><th> </th><th>CCY_CODE</th><th>END_DATE</th><th>STRIKE</th><th>VOLATILITY</th></tr><tr><td>0</td><td>SGD</td><td>3/16/2018</td><td>0.005</td><td>NaN</td></tr><tr><td>1</td><td>USD</td><td>3/17/2018</td><td>0.01</td><td>Expected some logical value</td></tr><tr><td>2</td><td>USD</td><td>3/17/2018</td><td>0.015</td><td>0.3424</td></tr><tr><td>3</td><td>USD</td><td>3/17/2018</td><td>0.02</td><td>0.296358</td></tr><tr><td>4</td><td>USD</td><td>3/17/2018</td><td>0.025</td><td>0.2617</td></tr><tr><td>5</td><td>USD</td><td>3/17/2018</td><td>0.03</td><td>0.2414</td></tr><tr><td>6</td><td>USD</td><td>3/17/2018</td><td>0.035</td><td>0.230295</td></tr><tr><td>7</td><td>EUR</td><td>3/17/2018</td><td>0.04</td><td>0.220911</td></tr><tr><td>8</td><td>EUR</td><td>3/17/2018</td><td>0.045</td><td>0.215</td></tr><tr><td>9</td><td>EUR</td><td>3/17/2018</td><td>0.05</td><td>0.212</td></tr><tr><td>10</td><td>EUR</td><td>3/17/2018</td><td>0.55</td><td>0.2103</td></tr><tr><td>11</td><td>EUR</td><td>3/17/2018</td><td>0.06</td><td>0.209471</td></tr><tr><td>12</td><td>EUR</td><td>3/17/2018</td><td>0.065</td><td>0.2092</td></tr><tr><td>13</td><td>USD</td><td>3/17/2018</td><td>0.07</td><td>Expected some logical value</td></tr></tbody></table>

线性插值方法在不考虑ccy_code

的情况下将最后可用值复制到所有后向和前向缺失值
df_volsurface.interpolate(method='linear',limit_direction='both')

输出:

<table><tbody><tr><th>CCY_CODE</th><th>END_DATE</th><th>STRIKE</th><th>VOLATILITY</th><th> </th></tr><tr><td>0</td><td>SGD</td><td>3/16/2018</td><td>0.005</td><td>0.3424</td></tr><tr><td>1</td><td>USD</td><td>3/17/2018</td><td>0.01</td><td>0.3424</td></tr><tr><td>2</td><td>USD</td><td>3/17/2018</td><td>0.015</td><td>0.3424</td></tr><tr><td>3</td><td>USD</td><td>3/17/2018</td><td>0.02</td><td>0.30205</td></tr><tr><td>4</td><td>USD</td><td>3/17/2018</td><td>0.025</td><td>0.2617</td></tr><tr><td>5</td><td>USD</td><td>3/17/2018</td><td>0.03</td><td>0.2414</td></tr><tr><td>6</td><td>USD</td><td>3/17/2018</td><td>0.035</td><td>0.2326</td></tr><tr><td>7</td><td>EUR</td><td>3/17/2018</td><td>0.04</td><td>0.2238</td></tr><tr><td>8</td><td>EUR</td><td>3/17/2018</td><td>0.045</td><td>0.215</td></tr><tr><td>9</td><td>EUR</td><td>3/17/2018</td><td>0.05</td><td>0.212</td></tr><tr><td>10</td><td>EUR</td><td>3/17/2018</td><td>0.55</td><td>0.2103</td></tr><tr><td>11</td><td>EUR</td><td>3/17/2018</td><td>0.06</td><td>0.20975</td></tr><tr><td>12</td><td>EUR</td><td>3/17/2018</td><td>0.065</td><td>0.2092</td></tr><tr><td>13</td><td>USD</td><td>3/17/2018</td><td>0.07</td><td>0.2092</td></tr></tbody></table>

感谢任何帮助!谢谢!

我想指出,这仍然是一维插值。我们有一个自变量 ('STRIKE') 和一个因变量 ('VOLATILITY')。插值是针对不同的条件进行的,例如对于每一天,每种货币,每种情况等。以下是如何根据 'END_DATE' 和 'CCY_CODE' 进行插值的示例。

# set all the conditions as index
df_volsurface.set_index(['END_DATE', 'CCY_CODE', 'STRIKE'], inplace=True)
df_volsurface.sort_index(level=['END_DATE', 'CCY_CODE', 'STRIKE'], inplace=True)
# create separate columns for all criteria except the independent variable
df_volsurface = df_volsurface.unstack(level=['END_DATE', 'CCY_CODE'])
for ccy in df_volsurface:
    indices = df_volsurface[ccy].notna()
    if not any(indices):
        continue  # we are not interested in a column with only NaN
    x = df_volsurface.index.get_level_values(level='STRIKE')  # independent var
    y = df_volsurface[ccy]  # dependent var
    # create interpolation function
    f_interp = scipy.interpolate.interp1d(x[indices], y[indices], kind='linear', 
                            bounds_error=False, fill_value='extrapolate')
    df_volsurface['VOL_INTERP', ccy[1], ccy[2]] = f_interp(x)
print(df_volsurface)

其他条件的插值应该类似地工作。这是生成的 DataFrame:

         VOLATILITY                    VOL_INTERP         
END_DATE 2018-03-16 2018-03-17         2018-03-17         
CCY_CODE        SGD        EUR     USD        EUR      USD
STRIKE                                                    
0.005           NaN        NaN     NaN    0.23900  0.42310
0.010           NaN        NaN     NaN    0.23600  0.38275
0.015           NaN        NaN  0.3424    0.23300  0.34240
0.020           NaN        NaN     NaN    0.23000  0.30205
0.025           NaN        NaN  0.2617    0.22700  0.26170
0.030           NaN        NaN  0.2414    0.22400  0.24140
0.035           NaN        NaN     NaN    0.22100  0.22110
0.040           NaN        NaN     NaN    0.21800  0.20080
0.045           NaN     0.2150     NaN    0.21500  0.18050
0.050           NaN     0.2120     NaN    0.21200  0.16020
0.055           NaN     0.2103     NaN    0.21030  0.13990
0.060           NaN        NaN     NaN    0.20975  0.11960
0.065           NaN     0.2092     NaN    0.20920  0.09930
0.070           NaN        NaN     NaN    0.20865  0.07900

使用 df_volsurface.stack() 到 return 到您选择的多索引。还有几种 pandas 插值方法可供选择。但是,我还没有使用 method='akima' 为您的问题找到令人满意的解决方案,因为它只在给定的数据点之间进行插值,但似乎没有外推。