Remove outliers from aggregated Dataframe (Python)

Question

My original dataframe looks like this (only the first rows are shown):

  categories  id products 
0          A   1       a       
1          B   1       a       
2          C   1       a       
3          A   1       b       
4          B   1       b       
5          A   2       c      
6          B   2       c

I aggregated it with the following code:

import pandas as pd

df2 = df.groupby('id').products.nunique().reset_index().merge(
    pd.crosstab(df.id, df.categories).reset_index())

The resulting dataframe looks as follows; I also added an outlier from my DF:

   id  products  A  B   C
0   1         2  2  2   1
1   2         1  1  1   0
2   3        50  1  1  30

Now I am trying to remove the outliers in the new DF:

# remove outliers
del df2['id']
df2 = df2.loc[df2['products']<=20,[str(i) for i in df2.columns]]

What I get then is:

   products    A    B    C
0         2  NaN  NaN  NaN
1         1  NaN  NaN  NaN

It removed the outlier, but why do I now get only NaN in the category columns?

Answer 1

df2 = df2.loc[df2['products'] <= 20]
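
A likely explanation, offered as an assumption because the real column labels are not shown in the question: the list built with str(i) did not match the actual column labels. Older pandas versions filled columns requested through .loc with NaN when their labels did not exist, while pandas 1.0 and later raise a KeyError instead. Passing only the boolean mask keeps every existing column untouched. Below is a minimal sketch that rebuilds the question's sample data; the outlier row with id 3 is appended by hand purely for illustration, since its source rows are not given.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "categories": ["A", "B", "C", "A", "B", "A", "B"],
    "id": [1, 1, 1, 1, 1, 2, 2],
    "products": ["a", "a", "a", "b", "b", "c", "c"],
})

# Aggregate: distinct products per id, merged with per-category row counts.
df2 = df.groupby("id").products.nunique().reset_index().merge(
    pd.crosstab(df.id, df.categories).reset_index()
)

# Hypothetical outlier row, added by hand to mirror the question's table.
outlier = pd.DataFrame({"id": [3], "products": [50], "A": [1], "B": [1], "C": [30]})
df2 = pd.concat([df2, outlier], ignore_index=True)

# Drop the id column, then select rows with a boolean mask only,
# so no column labels need to be spelled out.
df2 = df2.drop(columns="id")
filtered = df2.loc[df2["products"] <= 20]
print(filtered)
```

The threshold of 20 is taken from the question; if only some columns are needed afterwards, they can be selected by their exact labels taken from df2.columns rather than from converted strings.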

Tags: python, aggregate, outliers, dataframe, pandas