Query dataframe but apply filter only to rows where column value is not NaN
I have a dataframe df:
num1 | count | count_min | count_max
a | 10 | 5 | 10
b | 15 | 6 | 11
c | 3 | NaN | NaN
I want to filter out every row whose count is not between count_min and count_max.
However, if count_min or count_max is NaN, the row should be kept.
The final result should be:
num1 | count | count_min | count_max
a | 10 | 5 | 10
c | 3 | NaN | NaN
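To make the example reproducible, the sample frame can be built as below. This is a sketch that assumes the default integer index (0, 1, 2), which matches the output shown in the answers; pandas stores the bounds as floats because of the NaN values.

```python
import numpy as np
import pandas as pd

# Sample data from the question; NaN marks a missing bound
df = pd.DataFrame({
    "num1": ["a", "b", "c"],
    "count": [10, 15, 3],
    "count_min": [5, 6, np.nan],
    "count_max": [10, 11, np.nan],
})
print(df)
```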
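So I need something like an if/else in the query that checks whether count_min/count_max is NaN before applying the filter.
How can I achieve this using query syntax like the following?
df = df.query("count >= count_min and count <= count_max")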
Use Series.between and Series.isna:
In [4487]: df = df[df['count'].between(df.count_min, df.count_max) | (df.count_max.isna() | df.count_min.isna())]
In [4487]: df
Out[4487]:
num1 count count_min count_max
0 a 10 5.0 10.0
2 c 3 NaN NaN
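The isna() terms are needed because between() on its own drops rows with missing bounds: any comparison against NaN is False, so row "c" fails the range test. The sketch below (rebuilding the question's sample data) shows the two masks separately.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "num1": ["a", "b", "c"],
    "count": [10, 15, 3],
    "count_min": [5, 6, np.nan],
    "count_max": [10, 11, np.nan],
})

# Inclusive range test; a NaN bound makes the comparison False
in_range = df["count"].between(df["count_min"], df["count_max"])
print(in_range.tolist())  # [True, False, False]

# Rows with a missing bound are kept unconditionally
no_bounds = df["count_min"].isna() | df["count_max"].isna()
print((in_range | no_bounds).tolist())  # [True, False, True]
```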
For this case, you can use np.where() and use its result as the filter:
import numpy as np
df[np.where((df['count'].between(df['count_min'].values, df['count_max'].values)) | (df['count_min'].isna()) | (df['count_max'].isna()), True, False)]
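The condition passed to np.where is already a boolean mask, so np.where(mask, True, False) returns the same values as a plain array. The sketch below, using the question's sample data, checks that indexing with the np.where result and indexing with the mask directly select the same rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "num1": ["a", "b", "c"],
    "count": [10, 15, 3],
    "count_min": [5, 6, np.nan],
    "count_max": [10, 11, np.nan],
})

# Boolean mask: in range, or either bound missing
mask = (
    df["count"].between(df["count_min"], df["count_max"])
    | df["count_min"].isna()
    | df["count_max"].isna()
)

via_where = df[np.where(mask, True, False)]  # filter via np.where
direct = df[mask]                            # filter with the mask itself
print(via_where.equals(direct))
```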
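Something like this (each comparison needs its own parentheses, because & binds more tightly than >= and <=):
df = df[df['count_min'].isna() | df['count_max'].isna() | ((df['count'] >= df['count_min']) & (df['count'] <= df['count_max']))]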
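To stay with the query syntax from the question, one option is the NaN self-inequality trick: NaN is the only value not equal to itself, so `x != x` behaves like an isna() test inside the query string. This is a sketch on the question's sample data; `count` is backtick-quoted as a precaution so it is read as the column name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "num1": ["a", "b", "c"],
    "count": [10, 15, 3],
    "count_min": [5, 6, np.nan],
    "count_max": [10, 11, np.nan],
})

# `col != col` is True only where col is NaN, so rows with a missing
# bound pass; all other rows must satisfy the inclusive range test
result = df.query(
    "count_min != count_min or count_max != count_max "
    "or (`count` >= count_min and `count` <= count_max)"
)
print(result)
```

The trick keeps everything in one query string; the boolean-mask answers above are more explicit about intent if readability matters more than query syntax.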