How to filter all the rows that contain "isolated" NaN values in a column in Python
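I have a column in a pandas DataFrame in which some rows have NaN values.
I would like to select the rows that satisfy these conditions:
- they are a NaN value;
- they are directly followed or preceded by non-null values
For example, I would like to select the row with this NaN value:
Input:
index | col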
...
1 | 1344
2 | NaN
3 | 532
...
Desired output:
2 | NaN
But I do not want to select these NaN values (because they are followed by a NaN value or come directly after another NaN value):
index | col
...
1 | 1344
2 | NaN
3 | NaN
4 | 532
...
Any help would be appreciated.
Thanks!
Here is how to do it, with an example. Series.notna followed by Series.cumsum builds a key in which each non-null value starts a new group and the NaN values that follow it share that group. Passing this key to groupby and using transform('size') gives each row the size of its group, and .le(2) turns that into a Boolean Series with False in the groups where a non-null value is followed by more than one NaN. The AND of this Boolean Series with df['col2'].isna() is the mask we need for Boolean indexing: it selects the rows that have a NaN that is not part of a consecutive run.
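To make the grouping step easier to follow, here is a small sketch that prints the intermediate values. The names s, key, size and steps are only illustrative; the values are the same as col2 in the example.

```python
import numpy as np
import pandas as pd

# Illustrative column; same values as col2 in the example.
s = pd.Series([np.nan, 2, 3, np.nan, np.nan, 6, np.nan, 8, 9, np.nan], name="col2")

# Each non-null value increments the key, so it starts a new group;
# the NaN values that follow it keep the same key.
key = s.notna().cumsum()

# Number of rows in the group each row belongs to (NaN rows are counted).
size = s.groupby(key).transform("size")

# Side-by-side view of the value, its group key and its group size.
steps = pd.DataFrame({"col2": s, "key": key, "size": size})
print(steps)
```

Rows 2, 3 and 4 share one key and get size 3, which is why the two consecutive NaN values in rows 3 and 4 fail a "size at most 2" test.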
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'col2': [np.nan, 2, 3, np.nan, np.nan, 6, np.nan, 8, 9, np.nan],
})
print(df)
col1 col2
0 1 NaN
1 2 2.0
2 3 3.0
3 4 NaN
4 5 NaN
5 6 6.0
6 7 NaN
7 8 8.0
8 9 9.0
9 10 NaN
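# Group each non-null value with the NaN values that follow it; keep groups of at most two rows
mask_repeat_NaN = df.groupby(df['col2'].notna().cumsum())['col2'].transform('size').le(2)
# Of those groups, keep only the NaN rows
mask = mask_repeat_NaN & df['col2'].isna()
df_filtered = df[mask]
print(df_filtered)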
col1 col2
0 1 NaN
6 7 NaN
9 10 NaN
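As an alternative sketch, not the method above, the same rows can be found by comparing each row's NaN flag with its neighbours via Series.shift. The helper name isolated_nan_mask is hypothetical. It assumes that a NaN in the first or last row counts as isolated when its only neighbour is non-null, which matches the output above. One difference: the group-size test would still accept a pair of NaN values at the very start of the column, because that leading group contains no non-null value and so has size 2, whereas the neighbour comparison rejects it.

```python
import numpy as np
import pandas as pd


def isolated_nan_mask(s: pd.Series) -> pd.Series:
    """Hypothetical helper: True where s is NaN and neither neighbour is NaN."""
    is_nan = s.isna()
    # fill_value=False treats the missing neighbour at either end as "not NaN".
    prev_is_nan = is_nan.shift(1, fill_value=False)
    next_is_nan = is_nan.shift(-1, fill_value=False)
    return is_nan & ~prev_is_nan & ~next_is_nan


df = pd.DataFrame({
    "col1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "col2": [np.nan, 2, 3, np.nan, np.nan, 6, np.nan, 8, 9, np.nan],
})

isolated = isolated_nan_mask(df["col2"])
print(df[isolated])

# Edge case: two NaN values at the start of the column are not isolated.
leading_pair = pd.Series([np.nan, np.nan, 1.0])
print(isolated_nan_mask(leading_pair))
```

On the example data this selects the same index labels as the groupby version.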