How to apply one-hot encoding or get dummies on 2 columns together in pandas?

Question

I have the following dataframe with sample values:

import pandas as pd

df = pd.DataFrame([["London", "Cambridge", 20], ["Cambridge", "London", 10], ["Liverpool", "London", 30]], columns=["city_1", "city_2", "id"])

city_1     city_2        id
London     Cambridge     20
Cambridge  London        10
Liverpool  London        30

I need an output dataframe like the one below, which is built by combining the 2 city columns and then applying one-hot encoding:

id  London  Cambridge  Liverpool
20  1       1          0
10  1       1          0
30  1       0          1

Currently I am using the code below, which encodes one column at a time. Is there a pythonic way to get the output above?

output_df = pd.get_dummies(df, columns=['city_1', 'city_2'])

This results in separate columns per source column:

id, city_1_Cambridge, city_1_London, and so on.

Answer 1

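You can pass the prefix_sep and prefix parameters to get_dummies as empty strings, so that each dummy column is named after the city alone. The result then has duplicate column names, one per source column. Take the max over columns that share a name if you only need 1 or 0 values (dummy or indicator columns), or the sum if you need to count the occurrences: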

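# dtype=int keeps 0/1 output (pandas >= 2.0 returns booleans by default).
# max(axis=1, level=0) was removed in pandas 2.0, so group the transposed frame by column name instead.
output_df = (pd.get_dummies(df, columns=['city_1', 'city_2'], prefix_sep='', prefix='', dtype=int)
               .T.groupby(level=0, sort=False).max().T)
print(output_df)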
   id  Cambridge  Liverpool  London
0  20          1          0       1
1  10          1          0       1
2  30          0          1       1
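
The difference between max and sum only shows up when the same city appears in both columns of one row. The sketch below uses hypothetical data (a row with London in both city_1 and city_2, which is not in the question) and groups the transposed frame by column name, so it should also run on pandas versions where the level argument of max is no longer available:

```python
import pandas as pd

# Hypothetical data: the second row lists London in both city columns.
df = pd.DataFrame(
    [["London", "Cambridge", 20], ["London", "London", 10]],
    columns=["city_1", "city_2", "id"],
)

# Empty prefix and separator give duplicate column names, one per source column.
dummies = pd.get_dummies(
    df, columns=["city_1", "city_2"], prefix="", prefix_sep="", dtype=int
)

# Group the duplicate column names by transposing, then transpose back.
grouped = dummies.T.groupby(level=0, sort=False)
indicators = grouped.max().T  # 1 if the city appears in either column
counts = grouped.sum().T      # number of columns in which the city appears

print(indicators)
print(counts)
```

For id 10, indicators holds 1 for London while counts holds 2. The id column is unaffected because its label is unique.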

Or, if you want to process every column except id, first move the columns that should not be processed into the index with DataFrame.set_index, then use get_dummies with max, and finally add DataFrame.reset_index:

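# set_index keeps id out of the encoding; reset_index restores it as a column.
output_df = (pd.get_dummies(df.set_index('id'), prefix_sep='', prefix='', dtype=int)
               .T.groupby(level=0, sort=False).max().T
               .reset_index())
print(output_df)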
   id  Cambridge  Liverpool  London
0  20          1          0       1
1  10          1          0       1
2  30          0          1       1
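
As a cross-check, the same indicator table can be built by a different route that is not part of the answer above: reshape the city columns to long form with DataFrame.melt, count the occurrences with pd.crosstab, and cap the counts at 1. This is only a sketch; the names long_df and city are chosen here for illustration, and the rows come out sorted by id rather than in the original order.

```python
import pandas as pd

df = pd.DataFrame(
    [["London", "Cambridge", 20], ["Cambridge", "London", 10], ["Liverpool", "London", 30]],
    columns=["city_1", "city_2", "id"],
)

# Long form: one row per (id, city) pair, whichever column the city came from.
long_df = df.melt(id_vars="id", value_name="city")

# Count occurrences per id and city, then cap at 1 to get indicators.
output_df = (
    pd.crosstab(long_df["id"], long_df["city"])
    .clip(upper=1)
    .rename_axis(None, axis=1)  # drop the "city" label on the column axis
    .reset_index()
)
print(output_df)
```

Dropping the clip call keeps the raw counts, which corresponds to the sum variant.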

pandas

one-hot-encoding