Return Column(s) If They Have a Certain Percentage of NaN Values (Python)
I'm looking to return only the columns that have at least 25% NaN values, as a new df.
I'm thinking either a conditional statement using .loc, .isnull, or count, but I'm not certain what the most efficient method is. Appreciate any and all assistance.
DF:
df1:
(axis 1 = A,B,C = series)
     A    B  C
1    1    2  1
2  NaN  NaN  3
3    4  NaN  1
4    2  NaN  4
Thinking:
df.loc[df['series'] == nan >= 25% ]
Or something like:
if count(nan) for column(x) in 'series' is >= (.25 * (count(x)))
return loc[x]
Return new dataframe:
df2:
     A    B
1    1    2
2  NaN  NaN
3    4  NaN
4    2  NaN
Returns A and B because each of them has at least 25% of its column entries as NaN (missing).
Based on the answers at https://datascience.stackexchange.com/q/12645:
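row_count = len(df)  # number of rows, i.e. entries per column
na_count_mask = df.isna().sum(axis=0) >= 0.25 * row_count  # True for columns with at least 25% NaN
res_df = df.loc[:, na_count_mask]  # a column mask goes in the second slot of .loc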
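To check this against the sample data from the question, here is a minimal, self-contained sketch. It assumes pandas and NumPy are installed. The helper name `columns_with_nan_share` and its `min_share` parameter are made up for illustration and are not part of pandas. It takes the per-column mean of `isna()`, which gives the fraction of missing entries directly, instead of comparing a count against a row total.

```python
import numpy as np
import pandas as pd

# Sample frame from the question: A has 1 of 4 NaN, B has 3 of 4, C has none.
df1 = pd.DataFrame(
    {
        "A": [1, np.nan, 4, 2],
        "B": [2, np.nan, np.nan, np.nan],
        "C": [1, 3, 1, 4],
    },
    index=[1, 2, 3, 4],
)


def columns_with_nan_share(df, min_share=0.25):
    """Hypothetical helper: keep columns whose NaN fraction is >= min_share."""
    nan_share = df.isna().mean()  # per-column fraction of missing values
    return df.loc[:, nan_share >= min_share]  # boolean mask applied to columns


df2 = columns_with_nan_share(df1)
print(df2)
```

With the sample frame this should keep A (25% missing) and B (75% missing) and drop C. Raising `min_share` to 0.5 would leave only B.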