Transform column with categorical data into one column for each category

Question

I have a DataFrame that looks like this:

df  index    id           timestamp   cat  value
0   8066     101  2012-03-01 09:00:29  A      1   
1   8067     101  2012-03-01 09:01:15  B      0   
2   8068     101  2012-03-01 09:40:18  C      1
3   8069     102  2012-03-01 09:40:18  C      0

What I want is this:

df           timestamp           A     B     C     id      value
0        2012-03-01 09:00:29     1     0     0    101        1
1        2012-03-01 09:01:15     0     1     0    101        0
2        2012-03-01 09:40:18     0     0     1    101        1
3        2012-03-01 09:40:18     0     0     1    102        0

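As you can see in rows 2 and 3, timestamps can repeat. At first I tried a pivot table (with timestamp as the index), but that did not work because of these duplicates. I don't want to drop them, since the other data differs and should not be lost.

Since index contains no duplicates, I thought I might pivot on it and then merge the result back into the original DataFrame, but I wonder whether there is a simpler, more intuitive solution.

Thanks!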

Answer 1

Use get_dummies.

See here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.reshape.get_dummies.html

There is also a Stack Overflow example here: Create dummies from column with multiple values in pandas
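To make the suggestion concrete, here is a minimal sketch of what get_dummies does on a single column. The Series below is just the cat column from the question typed in by hand, and dtype=int is my own choice so the result shows 0/1 (recent pandas versions return booleans by default).

```python
import pandas as pd

# The 'cat' column from the question, rebuilt by hand for illustration.
cat = pd.Series(["A", "B", "C", "C"], name="cat")

# One indicator column per distinct category; the row index is preserved,
# so repeated categories simply produce repeated rows.
dummies = pd.get_dummies(cat, dtype=int)
print(dummies)
```

Because the result keeps the original row index, it can be attached back to the source DataFrame row by row.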

Answer 2

Here is a one-liner that does what you want, assuming your DataFrame is named df:

df_new = df.join(pd.get_dummies(df['cat'])).drop(['index', 'cat'], axis=1)
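For anyone who wants to try this end to end, below is a sketch that rebuilds the sample data from the question and applies the join-then-drop idea. The final column reordering and dtype=int are my additions, only there to match the layout asked for in the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "index": [8066, 8067, 8068, 8069],
    "id": [101, 101, 101, 102],
    "timestamp": pd.to_datetime([
        "2012-03-01 09:00:29",
        "2012-03-01 09:01:15",
        "2012-03-01 09:40:18",
        "2012-03-01 09:40:18",
    ]),
    "cat": ["A", "B", "C", "C"],
    "value": [1, 0, 1, 0],
})

# Build the indicator columns, join them on the row index,
# then drop the columns that are no longer wanted.
dummies = pd.get_dummies(df["cat"], dtype=int)
df_new = df.join(dummies).drop(["index", "cat"], axis=1)

# Optional: reorder to match the layout requested in the question.
df_new = df_new[["timestamp", "A", "B", "C", "id", "value"]]
print(df_new)
```

The join works on the row index and not on timestamp, so the two rows sharing 09:40:18 are both kept.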

Answer 3

Since get_dummies returns a DataFrame whose index aligns with that of your existing df, just concat column-wise:

In [66]:

pd.concat([df,pd.get_dummies(df['cat'])], axis=1)

Out[66]:
   index   id            timestamp cat  value  A  B  C
0   8066  101  2012-03-01 09:00:29   A      1  1  0  0
1   8067  101  2012-03-01 09:01:15   B      0  0  1  0
2   8068  101  2012-03-01 09:40:18   C      1  0  0  1
3   8069  102  2012-03-01 09:40:18   C      0  0  0  1

You can then remove the 'cat' column with df.drop('cat', axis=1)
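As a variation that none of the answers here use, get_dummies can also take the whole DataFrame together with a columns argument, in which case it replaces the encoded column itself and no separate drop is needed. This is only a sketch: the empty prefix and prefix_sep are my choice to keep the plain A/B/C column names, and dtype=int is there to get 0/1 values.

```python
import pandas as pd

# Sample data copied from the question (timestamps left as plain strings).
df = pd.DataFrame({
    "index": [8066, 8067, 8068, 8069],
    "id": [101, 101, 101, 102],
    "timestamp": [
        "2012-03-01 09:00:29",
        "2012-03-01 09:01:15",
        "2012-03-01 09:40:18",
        "2012-03-01 09:40:18",
    ],
    "cat": ["A", "B", "C", "C"],
    "value": [1, 0, 1, 0],
})

# Encode only 'cat'; the original column is replaced by its indicator columns.
# prefix="" and prefix_sep="" avoid names like 'cat_A'.
encoded = pd.get_dummies(df, columns=["cat"], prefix="", prefix_sep="", dtype=int)
print(encoded)
```

Be careful with the empty prefix if a category name could collide with an existing column name, since you would end up with duplicate column labels.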


python

pandas