pandas.Series.interpolate() along "index" shows unexpected results

Question

A pandas.Series() called "bla" in my example contains pressure in Pa as the index and wind speed in m/s as the values:

bla
100200.0    2.0
97600.0     NaN
91100.0     NaN
85000.0     3.0
82600.0     NaN
...
6670.0      NaN
5000.0      2.0
4490.0      NaN
3880.0      NaN
3000.0      9.0
Length: 29498, dtype: float64

bla.index
Float64Index([100200.0,  97600.0,  91100.0,  85000.0,  82600.0,  81400.0,
               79200.0,  73200.0,  70000.0,  68600.0,
              ...
               11300.0,  10000.0,   9970.0,   9100.0,   7000.0,   6670.0,
                5000.0,   4490.0,   3880.0,   3000.0],
             dtype='float64', length=29498)

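Since the wind speed values are frequently NaN, I intended to interpolate across the different pressure levels so that I have more wind speed values to work with.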

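The docs of interpolate() state that there is a method called "index", which interpolates using the index values, but the results make no sense compared to the original values: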

bla.interpolate(method="index", axis=0, limit=1, limit_direction="both")
100200.0    **2.00**
97600.0     10.40
91100.0     8.00
85000.0     **3.00**
82600.0     9.75
...
6670.0      3.00
5000.0      **2.00**
4490.0      9.00
3880.0      5.00
3000.0      **9.00**
Length: 29498, dtype: float64

I marked the original values in bold. I would rather have expected something similar to what "linear" gives:

bla.interpolate(method="linear", axis=0, limit=1, limit_direction="both")
100200.0    **2.000000**
97600.0     2.333333
91100.0     2.666667
85000.0     **3.000000**
82600.0     4.600000
...
6670.0      4.500000
5000.0      **2.000000**
4490.0      4.333333
3880.0      6.666667
3000.0      **9.000000**

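Nevertheless, I would like to use "index" properly as the interpolation method, because it should be the most accurate one: it takes the pressure levels into account as the "distance" between the individual wind speed values.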
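To make that expectation concrete, here is a minimal sketch on a single short profile. It reuses the first four rows shown above and assumes they belong to one measurement; the names `profile`, `by_position` and `by_pressure` are only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical single profile: pressure in Pa as index, wind speed in m/s as values
profile = pd.Series([2.0, np.nan, np.nan, 3.0],
                    index=[100200.0, 97600.0, 91100.0, 85000.0])

# "linear" ignores the index and treats the rows as equally spaced
by_position = profile.interpolate(method="linear")

# "index" weights each gap by the actual pressure difference
by_pressure = profile.interpolate(method="index")

print(by_position)
print(by_pressure)
```

Within one profile both methods stay between the neighbouring original values of 2 and 3 m/s; they only differ in how the gap is divided.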

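Overall, I would like to understand how the interpolation results with "index" and its pressure levels became so counterintuitive, and how I can make them more reasonable.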

Answer 1

Thanks to @ALollz's first comment below my question, I found the problem:

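My dataframe simply has 2 index levels: the outer one holds the unique measurement timestamps, the inner one is a standard range index. I should have looked at each subset associated with a unique timestamp separately. Within those subsets, the interpolation makes sense and produces the expected results.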
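A small sketch of why mixing measurements produces odd values, assuming two made-up profiles (`profile_a`, `profile_b`) that are stacked into one Series the way the full data was. The numbers are invented for illustration only.

```python
import numpy as np
import pandas as pd

# Two hypothetical profiles from different timestamps (pressure in Pa -> wind in m/s)
profile_a = pd.Series([2.0, np.nan, 4.0], index=[100000.0, 90000.0, 80000.0])
profile_b = pd.Series([10.0, 12.0, 14.0], index=[95000.0, 85000.0, 70000.0])

# Stack them one after another, losing the information which row belongs where
stacked = pd.concat([profile_a, profile_b])

# Interpolating the stacked Series: the gap at 90000 Pa is filled from the
# nearest valid pressures overall (95000 and 85000 Pa), both from profile_b
mixed = stacked.interpolate(method="index")

# Interpolating profile_a alone: the gap is filled from its own neighbours
separate = profile_a.interpolate(method="index")

print(mixed.iloc[1])
print(separate.loc[90000.0])
```

In the stacked case the filled value comes out near 11 m/s, far outside the 2 to 4 m/s range of its own profile, while the per-profile result is about 3 m/s.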

Example:

# Loop over all unique timestamps in the outermost index level
for timestamp in df.index.get_level_values(level=0).unique():
    # Extract the column of interest for the current subset
    subset = df.loc[timestamp, "column of interest"]

    # Interpolate within this subset only
    interpolated = subset.interpolate(method="linear",
                                      axis=0,
                                      limit=1,
                                      limit_direction="both")

    # Write the values back into df; assigning to the extracted
    # subset alone would only modify a copy
    df.loc[timestamp, "column of interest"] = interpolated.to_numpy()
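The same per-timestamp idea can probably be written without an explicit loop. The sketch below is not the code from this answer: it assumes a hypothetical frame whose index levels are named "timestamp" and "pressure" and whose column is called "wind", and it uses method="index" because that was the goal of the question. The helper `interpolate_profile` is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: outer level = timestamp, inner level = pressure in Pa
index = pd.MultiIndex.from_tuples(
    [("t0", 100000.0), ("t0", 90000.0), ("t0", 80000.0),
     ("t1", 95000.0), ("t1", 85000.0), ("t1", 70000.0)],
    names=["timestamp", "pressure"])
df = pd.DataFrame({"wind": [2.0, np.nan, 4.0, 10.0, np.nan, 14.0]}, index=index)


def interpolate_profile(profile: pd.Series) -> pd.Series:
    # method="index" is not supported on a MultiIndex, so drop the
    # timestamp level and interpolate against pressure alone
    by_pressure = profile.droplevel("timestamp").interpolate(
        method="index", limit=1, limit_direction="both")
    # Restore the original two-level index so the result lines up with df
    return pd.Series(by_pressure.to_numpy(), index=profile.index)


# Apply the helper to each timestamp separately and keep df's shape
df["wind"] = df.groupby(level="timestamp")["wind"].transform(interpolate_profile)
print(df)
```

Using transform keeps the result aligned with the original rows, so it can be assigned straight back to the column.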

python

interpolation

series

pandas