Create groupby based on label of a column in python

Question

I have a large data frame like this:

id        price             status
1           23               none
2           23               none
3           34               none
4           32               none
5           31               none
6           37               none
7           20               none
8           29               none
9           21               none
10          22               done

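I want to group based on status, so that each time the status is done, the rows up to and including that row form one group.

What I have done so far is group by the index: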

grouper = df.groupby(df.index // 10)

But then I realized that the status is written at random positions, not always every 10 rows.

How can I do this in python? Thanks

Answer 1

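Compare the status column with done, then create the group labels by taking the cumulative sum of the reversed Series (iloc[::-1]). Finally, add another iloc[::-1] to restore the original row order: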

g = df['status'].eq('done').iloc[::-1].cumsum().iloc[::-1]
grouper = df.groupby(g, sort=False)
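To see why the reversal matters, here is a minimal sketch on a small hypothetical Series (the values are made up for illustration). With a plain forward cumsum, each done row would start a new group; with the reversed cumsum, each done row closes its group.

```python
import pandas as pd

# Hypothetical status column, for illustration only
status = pd.Series(["none", "done", "none", "none", "done"])
is_done = status.eq("done")

# Forward cumulative sum: the label increases ON the "done" row,
# so "done" becomes the first row of the next group.
forward = is_done.cumsum()

# Reversed cumulative sum: the label changes AFTER the "done" row,
# so "done" is the last row of its group.
backward = is_done.iloc[::-1].cumsum().iloc[::-1]

print(forward.tolist())   # [0, 1, 1, 1, 2]
print(backward.tolist())  # [2, 2, 1, 1, 1]
```

The backward labels count down from the top of the frame, which is why the groupby call passes sort=False to keep the groups in row order.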

Sample:

# changed data for more groups
print (df)
   id  price status
0   1     23   none
1   2     23   done
2   3     34   none
3   4     32   none
4   5     31   done
5   6     37   none
6   7     20   none
7   8     29   none
8   9     21   none
9  10     22   done

g = df['status'].eq('done').iloc[::-1].cumsum().iloc[::-1]
print (g)
0    3
1    3
2    2
3    2
4    2
5    1
6    1
7    1
8    1
9    1
Name: status, dtype: int32

grouper = df.groupby(g, sort=False)

# use a separate loop variable so the original df is not overwritten
for name, group in grouper:
    print (group)

   id  price status
0   1     23   none
1   2     23   done
   id  price status
2   3     34   none
3   4     32   none
4   5     31   done
   id  price status
5   6     37   none
6   7     20   none
7   8     29   none
8   9     21   none
9  10     22   done
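
As a possible alternative (not part of the answer above), the same grouping can probably be obtained without reversing, by shifting the done flags down one row before a forward cumsum. The sketch below rebuilds the sample data so it runs on its own; the key and summary names and the count/sum aggregation are only illustrative assumptions.

```python
import pandas as pd

# Sample data from the answer above
df = pd.DataFrame({
    "id": range(1, 11),
    "price": [23, 23, 34, 32, 31, 37, 20, 29, 21, 22],
    "status": ["none", "done", "none", "none", "done",
               "none", "none", "none", "none", "done"],
})

# Shift the "done" flags down by one row so the label increases on the
# row AFTER each "done"; the "done" row then stays in the group it closes.
key = df["status"].eq("done").shift(fill_value=False).cumsum()

# Illustrative aggregation: number of rows and total price per group
summary = df.groupby(key)["price"].agg(["count", "sum"])
print(summary)
```

Here the labels increase from top to bottom (0, 1, 2), so the default sorting of groupby already matches the row order. Rows after the last done, if any, would end up in a trailing group of their own.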
