Create Dummy Columns for Values in a Single Pandas Column and Group into a Single Row

Question

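I'm trying to take a pandas DataFrame and perform a pivot-like operation on a single column. I want to take multiple rows (grouped by some identifying columns) and convert that one column into dummy indicator variables. I know about pd.get_dummies(), but I also want to collapse the multiple rows into a single row.

Example below: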

import pandas as pd
import numpy as np

# starting data
d = {'ID': [1,1,1,2,2,3,3,3], 
     'name': ['bob','bob','bob','shelby','shelby','jordan','jordan','jordan'],
     'type': ['type1','type2','type4','type1','type6','type5','type8','type2']}
df: pd.DataFrame = pd.DataFrame(data=d)
print(df.head(9))

   ID    name   type
0   1     bob  type1
1   1     bob  type2
2   1     bob  type4
3   2  shelby  type1
4   2  shelby  type6
5   3  jordan  type5
6   3  jordan  type8
7   3  jordan  type2

I would like the end result to look like this:

   ID    name  type1  type2  type4  type5  type6  type8
0   1     bob      1      1      1      0      0      0
1   2  shelby      1      0      0      0      1      0
2   3  jordan      0      1      0      1      0      1

Answer 1

You can use the pandas.DataFrame.pivot_table method (see its documentation):

df.pivot_table(index=['ID'], columns=['type'], values='name', aggfunc='count', fill_value=0)

Output:

type type1 type2 type4 type5 type6 type8
ID                                      
1        1     1     1     0     0     0
2        1     0     0     0     1     0
3        0     1     0     1     0     1

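I couldn't get the desired output with a single method call. To bring back the name column, you need to merge this result with the original DataFrame and keep the columns you want.

Note that pivot_table returns a DataFrame in which the ID column is the index.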
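
The merge step described above could look roughly like this. It is only a sketch: the variable names counts, names, and result are made up for illustration, and it assumes each ID maps to exactly one name, as in the question's data.

```python
import pandas as pd

d = {'ID': [1, 1, 1, 2, 2, 3, 3, 3],
     'name': ['bob', 'bob', 'bob', 'shelby', 'shelby', 'jordan', 'jordan', 'jordan'],
     'type': ['type1', 'type2', 'type4', 'type1', 'type6', 'type5', 'type8', 'type2']}
df = pd.DataFrame(data=d)

# Count occurrences of each type per ID; ID ends up as the index.
counts = df.pivot_table(index='ID', columns='type', values='name',
                        aggfunc='count', fill_value=0)

# One row per (ID, name) pair from the original data (assumes one name per ID).
names = df[['ID', 'name']].drop_duplicates()

# Turn ID back into a regular column and join the names onto the counts.
result = names.merge(counts.reset_index(), on='ID')
print(result)
```

Because aggfunc='count' counts rows, an ID that repeats the same type would get a value greater than 1. If strict 0/1 indicators are needed in that case, clipping the counts with counts.clip(upper=1) is one option.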

Answer 2

Here you go!

pd.get_dummies(df, columns=['type'], prefix='', prefix_sep='').groupby(['ID','name']).max().reset_index()

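prefix='' and prefix_sep='' avoid adding an extra prefix to the column names, such as type_type1, type_type2.
groupby(this_columns) lets you group the rows by this_columns.
max() on the groupby() gives you the maximum value within each group. For bob, for example, the type1 dummy column holds the values 1, 0, 0 across his three rows, but you want 1 if any of them is 1, so max() works here.
reset_index() gives you back the ID and name columns, which groupby() moved into the index.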
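
The same chain, split into steps so each intermediate result can be inspected. This is a sketch rather than the answer's exact code: the names dummies and result are illustrative, and dtype=int is an addition, since pandas 2.0 and later return boolean dummy columns by default and the question asks for 1/0.

```python
import pandas as pd

d = {'ID': [1, 1, 1, 2, 2, 3, 3, 3],
     'name': ['bob', 'bob', 'bob', 'shelby', 'shelby', 'jordan', 'jordan', 'jordan'],
     'type': ['type1', 'type2', 'type4', 'type1', 'type6', 'type5', 'type8', 'type2']}
df = pd.DataFrame(data=d)

# Step 1: one indicator column per distinct value of 'type', without a prefix.
# dtype=int (not in the original answer) forces 1/0 instead of True/False.
dummies = pd.get_dummies(df, columns=['type'], prefix='', prefix_sep='', dtype=int)

# Step 2: collapse the rows of each (ID, name) group; max() yields 1 if any row has 1.
grouped = dummies.groupby(['ID', 'name']).max()

# Step 3: move ID and name from the index back into ordinary columns.
result = grouped.reset_index()
print(result)
```

With the sample data this should reproduce the table requested in the question, one row per ID/name pair.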
