Pandas interpolate NaNs from zero to next valid value

Question

I'm looking for a way to linearly interpolate missing values (NaN) from zero up to the next valid value.

For example:

     A    B   C   D  E
0  NaN  2.0 NaN NaN  0
1  3.0  4.0 NaN NaN  1
2  NaN  NaN NaN NaN  5
3  NaN  3.0 NaN NaN  4

Given this table, I want the output to look like this:

     A    B   C   D  E
0  NaN  2.0   0   0  0
1  3.0  4.0   0 0.5  1
2  NaN  NaN NaN NaN  5
3  NaN  3.0   0   2  4

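I tried using fillna to set only the first NaN after a valid value to zero, and then linearly interpolating the whole dataframe. The problem is that passing a value and a limit to fillna does not act on each run of consecutive NaNs; it caps the total number of entries that get filled.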
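To illustrate the limitation described above, here is a minimal sketch on a single Series with two separate NaN streaks. The Series `s` and its values are made up for the illustration and are not part of the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical one-dimensional example with two separate NaN streaks.
s = pd.Series([1.0, np.nan, np.nan, 2.0, np.nan, 3.0])

# limit=1 caps the total number of fills, so only the first NaN overall is
# replaced; the NaN that opens the second streak is left untouched.
filled = s.fillna(0, limit=1)
print(filled.tolist())  # [1.0, 0.0, nan, 2.0, nan, 3.0]
```

So the limit cannot be used to say "fill the first NaN of every streak".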

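If possible, please only suggest solutions that don't manually iterate over each row, since I'm working with large dataframes.

Thanks in advance.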

Answer 1

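Here is a method that replaces the first NaN after a valid number with 0 and then interpolates row-wise. I added extra rows at the end to illustrate the behavior with multiple fills on the same row, a fill of only one value, and rows that end in a streak of NaN.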

Sample data

     A    B    C   D    E
0  NaN  2.0  NaN NaN  0.0
1  3.0  4.0  NaN NaN  1.0
2  NaN  NaN  NaN NaN  5.0
3  NaN  3.0  NaN NaN  4.0
4  3.0  NaN  7.0 NaN  5.0
5  NaN  4.0  7.0 NaN  6.0
6  NaN  4.0  7.0 NaN  NaN
7  5.0  NaN  5.0 NaN  NaN
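To make the sample reproducible, the frame above can be built roughly like this. The construction itself is an assumption (the answer only shows the printed table); the values are copied from it.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Values copied column by column from the printed sample table.
df = pd.DataFrame(
    {
        "A": [nan, 3, nan, nan, 3, nan, nan, 5],
        "B": [2, 4, nan, 3, nan, 4, 4, nan],
        "C": [nan, nan, nan, nan, 7, 7, 7, 5],
        "D": [nan] * 8,
        "E": [0, 1, 5, 4, 5, 6, nan, nan],
    },
    dtype=float,  # float throughout, so NaN and the filled 0.0 share one dtype
)
print(df)
```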

Code

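# +1 marks the first NaN after a valid value in a row, -1 the first valid value after a NaN streak
m = (df.notnull().cummax(axis=1) & df.isnull()).astype(int).diff(axis=1).fillna(0)
# Keep a +1 only if a -1 follows later in the row, then turn it into the 0 to fill in
update = m.where(m.eq(1) & m.loc[:, ::-1].cummin(axis=1).eq(-1)).replace(1, 0)

df.update(update)  # Add in 0s

# Interpolate along each row; limit_area='inside' leaves leading and trailing NaNs alone
df = df.interpolate(axis=1, limit_area='inside')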

     A    B    C    D    E
0  NaN  2.0  0.0  0.0  0.0
1  3.0  4.0  0.0  0.5  1.0
2  NaN  NaN  NaN  NaN  5.0
3  NaN  3.0  0.0  2.0  4.0
4  3.0  0.0  7.0  0.0  5.0
5  NaN  4.0  7.0  0.0  6.0
6  NaN  4.0  7.0  NaN  NaN
7  5.0  0.0  5.0  NaN  NaN

How it works:

(df.notnull().cummax(1) & df.isnull())  # True for streaks of null after non-null
#       A      B      C      D      E
#0  False  False   True   True  False
#1  False  False   True   True  False
#2  False  False  False  False  False
#3  False  False   True   True  False
#4  False   True  False   True  False
#5  False  False  False   True  False
#6  False  False  False   True   True
#7  False   True  False   True   True

# Taking the diff then allows you to find only the first NaN after any non-null.
# I.e. flagged by `1`
(df.notnull().cummax(1) & df.isnull()).astype(int).diff(axis=1).fillna(0)
#     A    B    C    D    E
#0  0.0  0.0  1.0  0.0 -1.0
#1  0.0  0.0  1.0  0.0 -1.0
#2  0.0  0.0  0.0  0.0  0.0
#3  0.0  0.0  1.0  0.0 -1.0
#4  0.0  1.0 -1.0  1.0 -1.0
#5  0.0  0.0  0.0  1.0 -1.0
#6  0.0  0.0  0.0  1.0  0.0
#7  0.0  1.0 -1.0  1.0  0.0

# The update DataFrame is a like-indexed DF with 0s where they get filled.
# The reversed cummin ensures fills only if there's a non-null value after the 0.
m.where(m.eq(1) & m.loc[:, ::-1].cummin(1).eq(-1)).replace(1, 0)
#    A    B    C    D   E
#0 NaN  NaN  0.0  NaN NaN
#1 NaN  NaN  0.0  NaN NaN
#2 NaN  NaN  NaN  NaN NaN
#3 NaN  NaN  0.0  NaN NaN
#4 NaN  0.0  NaN  0.0 NaN
#5 NaN  NaN  NaN  0.0 NaN
#6 NaN  NaN  NaN  NaN NaN
#7 NaN  0.0  NaN  NaN NaN
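The reversed cummin step is the least obvious part, so here is a small sketch of it on the diff flags of row 7 alone (values 5, NaN, 5, NaN, NaN). The Series `flags` and the names `tail_min` and `keep` are introduced only for this illustration.

```python
import pandas as pd

# Diff flags for row 7 of the sample data, copied from the table above.
flags = pd.Series([0, 1, -1, 1, 0], index=list("ABCDE"))

# Reverse, take the running minimum, then restore the original order:
# each position now holds the minimum of the flags from there to the row's end.
tail_min = flags.iloc[::-1].cummin().iloc[::-1]
print(tail_min.tolist())  # [-1, -1, -1, 0, 0]

# A fill is kept only where the flag is 1 and a -1 (a later valid value) follows.
keep = flags.eq(1) & tail_min.eq(-1)
print(keep.tolist())  # [False, True, False, False, False]
```

Column B gets its 0 because the valid value in C closes the streak, while the streak starting at D runs to the end of the row and is left as NaN.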
