How to group by a categorical column, then by a numerical column, and bin the numerical values based on that group

Question

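I have a dataset in which the "Type" column holds shapes and the corresponding "Volume" column holds the volume of each shape.

Now I need to do the following:

Group by shape
For each shape, group by volume
For each shape and volume, define a range and form bins

Input: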

 Type             Volume

 Cylinder          100
 Square            300
 Cylinder          200
 Oval              100
 Square            320
 Cylinder          150
 Oval              600
 Round             1000
 Square            900
 Round             1500

Output:

 Type              Volume       Bin

 Cylinder          100            1
 Cylinder          150            1
 Cylinder          200            2
 Oval              100            1
 Oval              600            3
 Round             1000           1
 Round             1500           2
 Square            300            1
 Square            320            1
 Square            900            3

The bins are as follows:

1.Cylinder -> Bin1(100-200), Bin2(201-300) ....

2.Oval -> Bin1(100-200), ..... Bin3(500-600).... ....

Code:

  import numpy as np
  import pandas as pd

  grouped = df_dim.groupby('Type', as_index=False)

  def test(group):
      return group.reset_index()

  def group_vol(group):
      # Within one Type, group the rows into fixed 200-wide volume intervals
      groupedVol = group.groupby(
          pd.cut(group["Target_BrimVol"], np.arange(0, 5000, 200)),
          as_index=False)

      return groupedVol.apply(test)

  gr = grouped.apply(group_vol)
  print(gr)

Answer 1

I think you can try the code below.

testdf = df.groupby('Type', as_index=False).apply(
    lambda x: x.groupby(
        pd.cut(x["Volume"], np.arange(x["Volume"].min(), x["Volume"].max(), 200)),
        as_index=False,
    ).apply(test)
)

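What happens here is that the first groupby splits the DataFrame by the "Type" category, and then you want to group each piece again by range. To do that, the lambda groups each piece a second time using pd.cut, which cuts the values into small intervals according to your range. In this case I simply take the minimum and maximum and cut at intervals of 200. After that, if you want to combine the output back into a single DataFrame, use one more apply to merge the pieces back together. Like this: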

def test(group):
   #Write your function here. Whatever you want to perform.
   return group.merge(group)

I am using as_index=False here to reset the index, so that the DataFrame is rearranged according to the new index.
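To make the pd.cut step described above more concrete, here is a minimal sketch of what it does on a single group. The three volumes are the Cylinder values from the example table in this answer; the edges starting at 0 and the names `volumes`, `edges`, `intervals` and `codes` are assumptions chosen for illustration only.

```python
import numpy as np
import pandas as pd

# Volumes for one hypothetical group (one "Type")
volumes = pd.Series([100, 140, 250], name="Volume")

# Assumed bin edges: 0, 200, 400, 600, 800
edges = np.arange(0, 1000, 200)

# By default pd.cut returns the interval each value falls into,
# with intervals closed on the right, e.g. (0, 200]
intervals = pd.cut(volumes, bins=edges)

# With labels=False it returns the zero-based position of that interval
codes = pd.cut(volumes, bins=edges, labels=False)

print(pd.DataFrame({"Volume": volumes, "interval": intervals, "code": codes}))
```

Note that a value lying exactly on the lowest edge falls outside the first interval unless include_lowest=True is passed, and any value outside the edges becomes NaN.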

Hope this helps.

Edit: You don't need to worry about the bins, because each groupby creates a new index that you can use for your purpose. For example:

Index1  Index2  Type      Volume
0       0       Cylinder  100
0       0       Cylinder  140
0       1       Cylinder  250
1       0       Oval      154
1       4       Oval      999
2       1       Circle    328
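If an explicit "Bin" column is preferred over reading the bin off the index, one possible sketch is below. It is not the code from this answer: it assumes a fixed bin width of 200 (the hypothetical constant `BIN_WIDTH`) counted from each type's smallest volume, and the helper name `bin_within_group` is also made up for illustration. The question does not state the bin rule precisely, so the resulting numbers will not necessarily match the expected output shown there.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Type": ["Cylinder", "Square", "Cylinder", "Oval", "Square",
             "Cylinder", "Oval", "Round", "Square", "Round"],
    "Volume": [100, 300, 200, 100, 320, 150, 600, 1000, 900, 1500],
})

BIN_WIDTH = 200  # assumption: the question does not fix a width


def bin_within_group(volumes: pd.Series) -> pd.Series:
    """Return 1-based bin numbers for one Type's volumes."""
    low = volumes.min()
    # Extend the edges past the maximum so the largest value is covered
    # and there are always at least two edges, even for a one-row group.
    edges = np.arange(low, volumes.max() + BIN_WIDTH + 1, BIN_WIDTH)
    # include_lowest=True keeps the group's minimum inside the first bin
    codes = pd.cut(volumes, bins=edges, labels=False, include_lowest=True)
    return codes + 1


# transform returns one value per original row, aligned with df's index
df["Bin"] = df.groupby("Type")["Volume"].transform(bin_within_group)

result = df.sort_values(["Type", "Volume"]).reset_index(drop=True)
print(result)
```

To use the same edges for every type instead, the edges could be computed once outside the helper, for example with np.arange(0, 5000, 200) as in the question's code.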
