How to find the first non-NaN value before a NaN in a pandas column

Question

For example, I have some data like this:

import numpy as np
import pandas as pd
column = pd.Series([1, 2, 3, np.nan, 4, np.nan, 7])
print(column)

Running this gives the following result:

0    1.0
1    2.0
2    3.0
3    NaN
4    4.0
5    NaN
6    7.0
dtype: float64

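Now I want to know the value immediately before each NaN: for example, 3.0 is the value before the first NaN, and 4.0 is the value before the second NaN. Is there a built-in function in pandas that can do this, or should I write a for loop?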

Answer 1

A solution for non-consecutive NaNs.
You can use boolean indexing with a mask created by isnull, shift and fillna:

print (column[column.isnull().shift(-1).fillna(False)])
2    3.0
4    4.0
dtype: float64

print (column.isnull())
0    False
1    False
2    False
3     True
4    False
5     True
6    False
dtype: bool

print (column.isnull().shift(-1))
0    False
1    False
2     True
3    False
4     True
5    False
6      NaN
dtype: object

print (column.isnull().shift(-1).fillna(False))
0    False
1    False
2     True
3    False
4     True
5    False
6    False
dtype: bool
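
As the intermediate output shows, shift(-1) leaves a NaN in the last position and turns the mask into an object Series, which fillna(False) then repairs. A minimal sketch of a variant that skips that step, assuming a pandas version where Series.shift accepts fill_value (0.24 or later); the variable name next_is_nan is only illustrative:

```python
import numpy as np
import pandas as pd

column = pd.Series([1, 2, 3, np.nan, 4, np.nan, 7])

# True where the *next* element is NaN. Passing fill_value=False pads the
# last position directly, so the mask stays boolean and no fillna is needed.
next_is_nan = column.isnull().shift(-1, fill_value=False)

result = column[next_is_nan]
print(result)
```

This should select the same rows as the fillna version (index 2 and 4) for the example data.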

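For consecutive NaNs, the mask also needs to be multiplied by the inverted c using mul, so that positions that are themselves NaN are excluded: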

column = pd.Series([np.nan,2,3,np.nan,np.nan,np.nan,7,np.nan, np.nan, 5,np.nan])

c = column.isnull()
mask = c.shift(-1).fillna(False).mul(~c)
print (mask)
0     False
1     False
2      True
3     False
4     False
5     False
6      True
7     False
8     False
9      True
10    False
dtype: bool

print (column[mask])
2    3.0
6    7.0
9    5.0
dtype: float64
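
The same logic can be wrapped in a small helper. This is only a sketch: the function name values_before_nan_runs is hypothetical, it uses the logical operator & in place of mul (for boolean masks both express "next is NaN and current is not NaN"), and it assumes Series.shift supports fill_value:

```python
import numpy as np
import pandas as pd


def values_before_nan_runs(s):
    """Return the non-NaN values that sit immediately before a NaN."""
    is_nan = s.isnull()
    # Keep a position only if the next element is NaN AND the element
    # itself is not NaN (this drops positions inside a run of NaNs).
    mask = is_nan.shift(-1, fill_value=False) & ~is_nan
    return s[mask]


column = pd.Series([np.nan, 2, 3, np.nan, np.nan, np.nan, 7, np.nan, np.nan, 5, np.nan])
result = values_before_nan_runs(column)
print(result)
```

The original index labels are preserved, so the result also tells you where each value was found.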

Answer 2

Same idea as @jezrael, but done with numpy.

column[np.append(np.isnan(column.values)[1:], False)]

2    3.0
4    4.0
dtype: float64

With a full pd.Series reconstruction:

m = np.append(np.isnan(column.values)[1:], False)
pd.Series(column.values[m], column.index[m])

2    3.0
4    4.0
dtype: float64
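
Note that this numpy mask only checks whether the next element is NaN, so with runs of consecutive NaNs it would also pick up NaN entries. A possible extension, sketched under the assumption that the column holds floats (np.isnan does not accept object arrays), additionally requires the current element to be non-NaN:

```python
import numpy as np
import pandas as pd

column = pd.Series([np.nan, 2, 3, np.nan, np.nan, np.nan, 7, np.nan, np.nan, 5, np.nan])

is_nan = np.isnan(column.values)
# Shift the NaN flags left by one; the last element has no successor,
# so it is padded with False.
next_is_nan = np.append(is_nan[1:], False)
# Keep positions whose successor is NaN but which are not NaN themselves.
m = next_is_nan & ~is_nan

result = pd.Series(column.values[m], index=column.index[m])
print(result)
```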

Not as fast, but intuitive: group by the cumsum of isnull and take the last value of each group. For this result, drop the last row.

column.groupby(column.isnull().cumsum()).last().iloc[:-1]

0    3.0
1    4.0
dtype: float64
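
With consecutive NaNs, some groups contain nothing but a NaN, and last() returns NaN for them. One way to handle that, shown here as a sketch and not as part of the original answer, is to add dropna() at the end. Note that the resulting index holds the group ids (the running NaN count), not the original positions:

```python
import numpy as np
import pandas as pd

column = pd.Series([np.nan, 2, 3, np.nan, np.nan, np.nan, 7, np.nan, np.nan, 5, np.nan])

# Each NaN starts a new group, so a group is "a NaN plus the values after it".
groups = column.isnull().cumsum()

# last() gives the last non-NaN value per group (NaN for all-NaN groups).
# The final group is never followed by a NaN, so it is dropped with iloc[:-1];
# dropna() then removes the groups that consisted of a single NaN.
result = column.groupby(groups).last().iloc[:-1].dropna()
print(result)
```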

python-2.7

pandas