Grouping all rows of a pandas DataFrame (with many columns) that have the same value in a given column

Question

I have been searching for hours. I have a DataFrame like this:

     col1.  col2.   col3.   col4
row1.  a.    p       u       0
row2.  b.    q       v       1
row3.  a.    r       w       2
row4.  d.    s       x       3
row5.  b.    t       y       4

Now I want to group all of these rows by the value of 'col1', so that I get:

     col1.  col2.   col3.   col4
row1.  a.    p r     u w    0,2
row2.  b.    q t     v y    1,4
row3.  d.    s       x       3

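I found that df.groupby('col1')['col2'].apply(' '.join) joins all the 'col2' values that share the same 'col1' value. However, I can't extend that command so that every column is combined for each group, which is what the output above requires.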

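The DataFrame above is only for illustration. The actual DataFrame has about 100 rows and columns, and every cell stores feedback, except col1, which stores the name of the item the feedback is on. I want to group all columns by item (col1), and then I will run sentiment analysis on the DataFrame.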

Answer 1

You can use:

df1 = df.astype(str).groupby('col1').agg(','.join).reset_index()
print (df1)
  col1 col2 col3 col4
0   a.  p,r  u,w  0,2
1   b.  q,t  v,y  1,4
2   d.    s    x    3
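
The desired output in the question joins the text columns with spaces and col4 with commas. A minimal sketch of that variant, assuming a dict passed to agg is acceptable and that the column names match the example (the dict and its separators are illustrative, not part of the original answer):

```python
import pandas as pd

# Example frame rebuilt from the question (labels kept as shown there).
df = pd.DataFrame(
    {
        "col1": ["a.", "b.", "a.", "d.", "b."],
        "col2": ["p", "q", "r", "s", "t"],
        "col3": ["u", "v", "w", "x", "y"],
        "col4": [0, 1, 2, 3, 4],
    },
    index=["row1.", "row2.", "row3.", "row4.", "row5."],
)

# Assumed mapping: one join function per column, so each column can use its own separator.
separators = {"col2": " ".join, "col3": " ".join, "col4": ",".join}

# Cast to str first so the numeric col4 can be joined too.
df_mixed = df.astype(str).groupby("col1").agg(separators).reset_index()
print(df_mixed)
```

With roughly 100 columns, the dict could be built in a loop over df.columns instead of being written out by hand.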

If you also need the index (the label of the first row in each group):

df1 = df.astype(str).groupby('col1').agg(','.join).reset_index()
df1.index = df.drop_duplicates('col1').index
print (df1)
      col1 col2 col3 col4
row1.   a.  p,r  u,w  0,2
row2.   b.  q,t  v,y  1,4
row4.   d.    s    x    3
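
One caveat with assigning the index this way: groupby sorts the group keys by default, while drop_duplicates keeps rows in order of first appearance. The two orders agree in the example, but may not in general. A hedged sketch, using a small made-up frame where the keys are not already sorted, that passes sort=False so both orders line up:

```python
import pandas as pd

# Hypothetical data: 'b' appears before 'a', so sorted order differs from first-appearance order.
df = pd.DataFrame(
    {"col1": ["b", "a", "b"], "col2": ["q", "p", "t"]},
    index=["row1", "row2", "row3"],
)

# sort=False keeps groups in order of first appearance, matching drop_duplicates below.
df1 = df.astype(str).groupby("col1", sort=False).agg(",".join).reset_index()

# Reuse the label of the first row seen for each col1 value.
df1.index = df.drop_duplicates("col1").index
print(df1)
```

Without sort=False, the group for 'a' would come first and would be given the label row1, which belongs to 'b'.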

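Explanation:

First cast all values to string with astype
Then groupby and aggregate with join via agg
Finally, if you also need each group to be indexed by the label of its first col1 row, add drop_duplicates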
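
The steps above can be sketched end to end as follows. The DataFrame is rebuilt from the question's example; the intermediate name as_text is only for illustration.

```python
import pandas as pd

# Example frame rebuilt from the question (dots in the labels kept as shown there).
df = pd.DataFrame(
    {
        "col1": ["a.", "b.", "a.", "d.", "b."],
        "col2": ["p", "q", "r", "s", "t"],
        "col3": ["u", "v", "w", "x", "y"],
        "col4": [0, 1, 2, 3, 4],
    },
    index=["row1.", "row2.", "row3.", "row4.", "row5."],
)

# Step 1: cast every column to str, since str.join fails on the integers in col4.
as_text = df.astype(str)

# Step 2: group by col1 and join the values of every other column with commas.
df1 = as_text.groupby("col1").agg(",".join).reset_index()

# Step 3 (optional): label each group with the index of its first row.
df1.index = df.drop_duplicates("col1").index
print(df1)
```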


python

data-analysis

dataframe

pandas