Explain curious behavior of Pandas.Series.interpolate

Question

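import numpy as np
import pandas as pd

s = pd.Series([0, 2, np.nan, 8])
print(s)

# method='polynomial' is backed by SciPy, so SciPy must be installed
interp = s.interpolate(method='polynomial', order=2)
print(interp)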

This prints:

0    0.0
1    2.0
2    NaN
3    8.0
dtype: float64
0    0.000000
1    2.000000
2    4.666667
3    8.000000
dtype: float64

Now, if I add one more np.nan to the series,

s = pd.Series([0, 2, np.nan, np.nan, 8])
print(s)

interp = s.interpolate(method='polynomial', order=2)
print(interp)

I get much more accurate results:

0    0.0
1    2.0
2    NaN
3    NaN
4    8.0
dtype: float64
0    0.0
1    2.0
2    4.0
3    6.0
4    8.0
dtype: float64

Is Series.interpolate recursive, in the sense that it uses interpolated values for further interpolation, which then affects earlier interpolated values?

Answer 1

You are actually interpolating two different functions!

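In the first case, you are looking for a function that passes through these points:
(0,0), (1,2), (3,8)
In the second case, however, you are looking for a function that passes through these points:
(0,0), (1,2), (4,8)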
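
The two fits can be checked by hand with NumPy. This is only a sketch of the underlying arithmetic, not what pandas does internally (it delegates to SciPy); the helper name quadratic_through is made up for illustration. It fits the unique quadratic through each set of three points and evaluates it at the missing positions.

```python
import numpy as np

def quadratic_through(xs, ys):
    """Return the coefficients (highest power first) of the quadratic through three points."""
    # Hypothetical helper for illustration; three points determine a degree-2 fit exactly.
    return np.polyfit(xs, ys, deg=2)

# First series: the known values sit at index positions 0, 1 and 3.
coeffs_first = quadratic_through([0, 1, 3], [0, 2, 8])
value_at_2 = np.polyval(coeffs_first, 2)

# Second series: the known values sit at index positions 0, 1 and 4.
coeffs_second = quadratic_through([0, 1, 4], [0, 2, 8])
values_at_2_and_3 = np.polyval(coeffs_second, [2, 3])

print(value_at_2)         # approximately 4.666667 (exactly 14/3)
print(values_at_2_and_3)  # approximately [4. 6.]
```

In the second case the three points happen to lie on the straight line y = 2x, so the quadratic coefficient comes out as (numerically) zero and the result looks linear.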

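With method='polynomial', the index of the pd.Series supplies the points on the x-axis, and the data of the pd.Series supplies the points on the y-axis.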
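
Whether the index is used as the x-axis depends on the method. As a small sketch (the variable names by_position and by_index are illustrative, not from the question): method='linear' ignores the index and treats the values as equally spaced, while method='index' uses the numerical index values, just as 'polynomial' does.

```python
import numpy as np
import pandas as pd

# The gap sits at index label 2, between the known points at labels 1 and 4.
s = pd.Series([0, 2, np.nan, 8], index=[0, 1, 2, 4])

# 'linear' ignores the index: the NaN becomes the midpoint of its neighbours.
by_position = s.interpolate(method='linear')

# 'index' uses the index values as x-coordinates.
by_index = s.interpolate(method='index')

print(by_position.loc[2])  # 5.0, halfway between 2 and 8
print(by_index.loc[2])     # 4.0, one third of the way from x=1 to x=4
```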

So, in your first example, try replacing
s = pd.Series([0, 2, np.nan, 8])
with a series that has an explicit index:

s = pd.Series([0, 2, np.nan, 8], [0,1,2,4])
s.interpolate(method='polynomial', order=2)

You should get the output:

0    0.0
1    2.0
2    4.0
4    8.0
dtype: float64

Alternatively, you could also use: s = pd.Series([0, 2, np.nan, 8], [0,1,3,4])
which gives the output:

0    0.0
1    2.0
3    6.0
4    8.0
dtype: float64

Hope this helps.
