Python Drop all instances of Feature from DF if NaN thresh is met

Question

Using df.dropna(thresh = x, inplace=True), I can successfully drop rows that have fewer than x non-NaN values.

But because my df looks like this:

          2001  2002  2003  2004
bob   A    123    31     4    12
bob   B     41     1    56    13
bob   C    nan   nan     4   nan
bill  A    451     8   nan    24
bill  B     32     5    52     6
bill  C    623    12    41    14

#Repeating features (A,B,C) for each index/name

this drops only the one row/instance that meets the thresh= condition, but keeps the other instances of that feature.
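A minimal sketch reproducing the issue. It assumes the year columns are integer labels, the missing cells are real NaN values, and the two index levels are unnamed (as the printed frame suggests):

```python
import numpy as np
import pandas as pd

# Sample frame from the question: level 0 = name, level 1 = feature
index = pd.MultiIndex.from_product([["bob", "bill"], ["A", "B", "C"]])
df = pd.DataFrame(
    [
        [123, 31, 4, 12],
        [41, 1, 56, 13],
        [np.nan, np.nan, 4, np.nan],
        [451, 8, np.nan, 24],
        [32, 5, 52, 6],
        [623, 12, 41, 14],
    ],
    index=index,
    columns=[2001, 2002, 2003, 2004],
)

# Plain row-wise dropna: only rows with fewer than 2 non-NaN values go
plain = df.dropna(thresh=2)
print(plain)
```

Only ('bob', 'C') is removed; ('bill', 'C') survives because that row has four non-NaN values.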

What I want is something that drops the entire feature, if the thresh is met for any one row, such as:

df.dropna(thresh = 2, inplace=True):

          2001  2002  2003  2004
bob   A    123    31     4    12
bob   B     41     1    56    13
bill  A    451     8   nan    24
bill  B     32     5    52     6

#Drops C from the whole df

where C is dropped from the whole df, not just from the one instance under bob that meets the condition.

Answer 1

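Your sample looks like a MultiIndex dataframe where index level 1 is the feature (A, B, C) and index level 0 is the name. You can use notna and sum to build a mask that flags rows with fewer than 2 non-NaN values, and take their index level 1 values. Finally, use df.query to slice the rows: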

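# Feature labels (index level 1) of rows with fewer than 2 non-NaN values
a = df.notna().sum(1).lt(2).loc[lambda x: x].index.get_level_values(1)
# Keep rows whose feature is not in a (ilevel_1 = unnamed index level 1)
df_final = df.query('ilevel_1 not in @a')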

Out[275]:
         2001  2002  2003  2004
bob  A  123.0  31.0   4.0  12.0
     B   41.0   1.0  56.0  13.0
bill A  451.0   8.0   NaN  24.0
     B   32.0   5.0  52.0   6.0
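A self-contained version of the same idea, wrapped in a hypothetical helper `drop_features_below_thresh` (the name and its `thresh`/`level` parameters are illustrative, not a pandas API). It is a sketch that swaps `query` for `Index.isin` on the level values, because the `ilevel_1` name in `query` is only available when the index levels are unnamed:

```python
import numpy as np
import pandas as pd


def drop_features_below_thresh(df, thresh, level=1):
    """Hypothetical helper: drop every row of a feature (index `level`)
    if any row of that feature has fewer than `thresh` non-NaN values."""
    # Rows with fewer than `thresh` non-NaN values
    too_sparse = df.notna().sum(axis=1).lt(thresh)
    # Feature labels of those rows
    bad_features = too_sparse[too_sparse].index.get_level_values(level).unique()
    # Keep only rows whose feature label was not flagged
    keep = ~df.index.get_level_values(level).isin(bad_features)
    return df[keep]


# Sample frame from the question (assumed integer year labels, NaN cells)
index = pd.MultiIndex.from_product([["bob", "bill"], ["A", "B", "C"]])
df = pd.DataFrame(
    [
        [123, 31, 4, 12],
        [41, 1, 56, 13],
        [np.nan, np.nan, 4, np.nan],
        [451, 8, np.nan, 24],
        [32, 5, 52, 6],
        [623, 12, 41, 14],
    ],
    index=index,
    columns=[2001, 2002, 2003, 2004],
)

result = drop_features_below_thresh(df, thresh=2)
print(result)
```

Because the mask is built from level values rather than level names, the same helper keeps working if the index levels are later given names.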

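Method 2:
Use notna, sum, groupby and transform to build a mask that is True for every row of a feature group in which all rows have 2 or more non-NaN values. Finally, slice the rows with this mask: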

# True for each row whose feature group has >= 2 non-NaN values in every row
m = df.notna().sum(1).groupby(level=1).transform(lambda x: x.ge(2).all())
df_final = df[m]

Out[296]:
         2001  2002  2003  2004
bob  A  123.0  31.0   4.0  12.0
     B   41.0   1.0  56.0  13.0
bill A  451.0   8.0   NaN  24.0
     B   32.0   5.0  52.0   6.0
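A related option, swapped in here and not part of the original answer, is `DataFrameGroupBy.filter`, which keeps or drops whole groups based on one boolean per group. A sketch with a hypothetical helper name, assuming the same sample frame:

```python
import numpy as np
import pandas as pd


def drop_features_by_filter(df, thresh, level=1):
    """Hypothetical helper: keep a feature group only if every row in it
    has at least `thresh` non-NaN values."""
    return df.groupby(level=level).filter(
        lambda g: g.notna().sum(axis=1).ge(thresh).all()
    )


# Sample frame from the question (assumed integer year labels, NaN cells)
index = pd.MultiIndex.from_product([["bob", "bill"], ["A", "B", "C"]])
df = pd.DataFrame(
    [
        [123, 31, 4, 12],
        [41, 1, 56, 13],
        [np.nan, np.nan, 4, np.nan],
        [451, 8, np.nan, 24],
        [32, 5, 52, 6],
        [623, 12, 41, 14],
    ],
    index=index,
    columns=[2001, 2002, 2003, 2004],
)

filtered = drop_features_by_filter(df, thresh=2)
print(filtered)
```

`filter` returns the surviving rows in their original order, so the result matches the output shown above.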

Answer 2

Keep only the rows with at least 5 non-NA values:

df.dropna(thresh=5)

thresh keeps only the rows that have at least that many non-NaN values.
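To check what `thresh` counts, this sketch rebuilds the question's sample frame (same assumptions: integer year labels, real NaN cells) and compares `dropna(thresh=k)` with an explicit per-row non-NaN count:

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([["bob", "bill"], ["A", "B", "C"]])
df = pd.DataFrame(
    [
        [123, 31, 4, 12],
        [41, 1, 56, 13],
        [np.nan, np.nan, 4, np.nan],
        [451, 8, np.nan, 24],
        [32, 5, 52, 6],
        [623, 12, 41, 14],
    ],
    index=index,
    columns=[2001, 2002, 2003, 2004],
)

# Number of non-NaN values in each row: 4, 4, 1, 3, 4, 4
counts = df.notna().sum(axis=1)

for k in range(1, 6):
    # dropna(thresh=k) keeps exactly the rows with at least k non-NaN values
    kept_by_dropna = list(df.dropna(thresh=k).index)
    kept_by_mask = list(df[counts >= k].index)
    assert kept_by_dropna == kept_by_mask

print(df.dropna(thresh=5).empty)  # True: no row has 5 non-NaN values
```

The count is per row, so on this four-column sample any thresh above 4 removes every row.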
