How to freeze first numbers in sequences between NaNs in Python pandas dataframe

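Is there a Pythonic way, in a time-series DataFrame, to go down each column, take the first number of a sequence and push it forward until the next NaN, then take the next non-NaN number and push that down until the following NaN, and so on (preserving the index and the NaNs)?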

For example, I would like to convert this DataFrame:

import numpy as np
import pandas as pd
DF = pd.DataFrame(data={'A':[np.nan,1,3,5,7,np.nan,2,4,6,np.nan], 'B':[8,6,4,np.nan,np.nan,9,7,3,np.nan,3], 'C':[np.nan,np.nan,4,2,6,np.nan,1,5,2,8]})
     A    B    C
0  NaN  8.0  NaN
1  1.0  6.0  NaN
2  3.0  4.0  4.0
3  5.0  NaN  2.0
4  7.0  NaN  6.0
5  NaN  9.0  NaN
6  2.0  7.0  1.0
7  4.0  3.0  5.0
8  6.0  NaN  2.0
9  NaN  3.0  8.0

into this DataFrame:

Result = pd.DataFrame(data={'A':[np.nan,1,1,1,1,np.nan,2,2,2,np.nan], 'B':[8,8,8,np.nan,np.nan,9,9,9,np.nan,3], 'C':[np.nan,np.nan,4,4,4,np.nan,1,1,1,1]})
     A    B    C
0  NaN  8.0  NaN
1  1.0  8.0  NaN
2  1.0  8.0  4.0
3  1.0  NaN  4.0
4  1.0  NaN  4.0
5  NaN  9.0  NaN
6  2.0  9.0  1.0
7  2.0  9.0  1.0
8  2.0  NaN  1.0
9  NaN  3.0  1.0

I know I could do this by looping over the columns, but I would appreciate help finding a more efficient, Pythonic way to do it on a very large DataFrame. Thanks.
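For reference, the loop-based approach mentioned above might look like the following minimal sketch. The function name `freeze_first_loop` is hypothetical (it is not part of pandas), and the block rebuilds the example `DF` and the expected `Result` so that it runs on its own. It is mainly useful as a slow but easy-to-read baseline for checking a vectorized version.

```python
import numpy as np
import pandas as pd


def freeze_first_loop(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: plain-loop version of the transformation.

    Walks each column top to bottom; the first number after a NaN (or at
    the top of the column) is remembered and written into every following
    row until the next NaN. NaNs and the index are left untouched.
    """
    out = df.copy()
    for col in out.columns:
        values = out[col].to_numpy(dtype=float, copy=True)
        current = np.nan   # value being "frozen" for the current run
        in_run = False     # True while we are inside a run of numbers
        for i, v in enumerate(values):
            if np.isnan(v):
                in_run = False          # a NaN ends the current run
            else:
                if not in_run:
                    current = v         # first number of a new run
                    in_run = True
                values[i] = current     # overwrite with the run's first number
        out[col] = values
    return out


DF = pd.DataFrame(data={'A': [np.nan, 1, 3, 5, 7, np.nan, 2, 4, 6, np.nan],
                        'B': [8, 6, 4, np.nan, np.nan, 9, 7, 3, np.nan, 3],
                        'C': [np.nan, np.nan, 4, 2, 6, np.nan, 1, 5, 2, 8]})
Result = pd.DataFrame(data={'A': [np.nan, 1, 1, 1, 1, np.nan, 2, 2, 2, np.nan],
                            'B': [8, 8, 8, np.nan, np.nan, 9, 9, 9, np.nan, 3],
                            'C': [np.nan, np.nan, 4, 4, 4, np.nan, 1, 1, 1, 1]})

print(freeze_first_loop(DF))
```

The Python-level inner loop touches every cell, which is exactly what the question hopes to avoid on a very large DataFrame.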

IIUC:

# boolean mask: True where DF holds a number
mask = DF.notna()
Result = (DF.shift(-1)           # shift up: every row now holds the value from the row below it
            .mask(mask)          # keep only the rows that were NaN, i.e. the first value of the next run
            .ffill()             # forward-fill each run's first value down the column
            .fillna(DF.iloc[0])  # columns that start with a number: fill the first run from row 0
            .where(mask)         # put the original NaNs back
         )

Output:

     A    B    C
0  NaN  8.0  NaN
1  1.0  8.0  NaN
2  1.0  8.0  4.0
3  1.0  NaN  4.0
4  1.0  NaN  4.0
5  NaN  9.0  NaN
6  2.0  9.0  1.0
7  2.0  9.0  1.0
8  2.0  NaN  1.0
9  NaN  3.0  1.0