Insert several columns in place of one existing column in a pandas DataFrame

Question

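I have a question: how can I insert several (for example, 3) columns into a DataFrame at the position of an existing column? In other words, I have a column with categorical values that I one-hot encoded, which gave me 3 new columns. Now I want to drop the original column and insert the resulting columns in its place (not at the end of the DataFrame). Any ideas on how to do this efficiently? I would appreciate any help.

**df1 - Original DataFrame**: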

   col1 col2  col3
0   4    A    0.5
1   5    B    0.78
2   6    C    0.55
3   7    A    0.78

**df2 - one-hot encoding of the categorical col2**:

   col2_A col2_B  col2_C
0   1       0       0
1   0       1       0
2   0       0       1
3   1       0       0

How can I insert the columns of df2 into df1 in place of col2, to get:

**Updated df1**

   col1 col2_A col2_B col2_C  col3
0   4    1      0      0      0.5
1   5    0      1      0      0.78
2   6    0      0      1      0.55
3   7    1      0      0      0.78

Answer 1

Use

df_concat = pd.concat([df1, df2], axis=1)

and then drop col2 with

df_concat = df_concat.drop(['col2'], axis=1)
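
pd.concat places df2's columns after col3, so a reordering step is needed to get them into col2's position. A minimal sketch of one way to do that, assuming the sample data from the question; the `new_order` list is an illustrative name, not part of the original answer:

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [4, 5, 6, 7],
                    "col2": ["A", "B", "C", "A"],
                    "col3": [0.5, 0.78, 0.55, 0.78]})
# One-hot encode col2; passing a DataFrame gives the col2_ prefix by default.
df2 = pd.get_dummies(df1[["col2"]], columns=["col2"], dtype=int)

# Append the dummy columns and drop the original column.
df_concat = pd.concat([df1, df2], axis=1).drop(columns=["col2"])

# Build the desired column order: everything before col2, the new
# columns, then everything after col2.
cols = list(df1.columns)
p = cols.index("col2")
new_order = cols[:p] + list(df2.columns) + cols[p + 1:]
df_concat = df_concat[new_order]

print(df_concat.columns.tolist())
```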

Answer 2

A solution that works for any new column names in df2 (they do not need to start with col2).

Use Index.get_loc to get the position of the column, select the columns before and after it by position with DataFrame.iloc, join everything together with concat, and finally drop the original column if necessary:

# df is the original DataFrame (df1 in the question)
val = 'col2'
p = df.columns.get_loc(val)

# one possible way to build the dummies; feel free to use your own
#df2 = pd.get_dummies(df[val])
df = pd.concat([df.iloc[:, :p], df2, df.iloc[:, p:]], axis=1).drop(val, axis=1)
print (df)

   col1  A  B  C  col3
0     4  1  0  0  0.50
1     5  0  1  0  0.78
2     6  0  0  1  0.55
3     7  1  0  0  0.78

If a prefix is needed:

val = 'col2'
p = df.columns.get_loc(val)
# one possible way to build the dummies (a DataFrame input adds the col2_ prefix); feel free to use your own
#df2 = pd.get_dummies(df[[val]])
df = pd.concat([df.iloc[:, :p], df2, df.iloc[:, p:]], axis=1).drop(val, axis=1)
print (df)

   col1  col2_A  col2_B  col2_C  col3
0     4       1       0       0  0.50
1     5       0       1       0  0.78
2     6       0       0       1  0.55
3     7       1       0       0  0.78

Or use DataFrame.pop inside get_dummies (or in your own solution):

val = 'col2'
p = df.columns.get_loc(val)
# one possible way to build the dummies; pop also removes col2 from df, so no drop is needed afterwards
#df2 = pd.get_dummies(df.pop(val))
df = pd.concat([df.iloc[:, :p], df2, df.iloc[:, p:]], axis=1)
print (df)

   col1  A  B  C  col3
0     4  1  0  0  0.50
1     5  0  1  0  0.78
2     6  0  0  1  0.55
3     7  1  0  0  0.78
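
The get_loc / iloc / concat pattern above can be wrapped in a small function. This is only a sketch: `replace_column` is a hypothetical helper name, and it assumes the column label is unique (so get_loc returns an integer) and that the replacement frame shares the index of the original. It slices from `p + 1`, which skips the original column without a separate drop:

```python
import pandas as pd

def replace_column(df, col, replacement):
    """Return a new frame with `col` swapped for the columns of `replacement`.

    Hypothetical helper: assumes `col` is a unique column label and that
    `replacement` has the same index as `df`.
    """
    p = df.columns.get_loc(col)  # integer position of the column
    # Columns before col, the replacement columns, then columns after col.
    return pd.concat([df.iloc[:, :p], replacement, df.iloc[:, p + 1:]], axis=1)

df = pd.DataFrame({"col1": [4, 5, 6, 7],
                   "col2": ["A", "B", "C", "A"],
                   "col3": [0.5, 0.78, 0.55, 0.78]})
dummies = pd.get_dummies(df["col2"], prefix="col2", dtype=int)

result = replace_column(df, "col2", dummies)
print(result)
```

The original `df` is left unchanged, which can make the step easier to re-run in a notebook.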

Answer 3

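If the indexes are aligned correctly, use DataFrame.join, then DataFrame.drop to remove col2 and DataFrame.sort_index to sort the columns (this gives the desired order here because the column names happen to sort alphabetically into it):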

df1.join(df2).drop(columns = 'col2').sort_index(axis = 1)

   col1  col2_A  col2_B  col2_C  col3
0     4       1       0       0  0.50
1     5       0       1       0  0.78
2     6       0       0       1  0.55
3     7       1       0       0  0.78
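
Sorting the columns by name only reproduces the original position when the names sort that way. A small sketch of the difference, using hypothetical column names `id` and `amount` (not from the question) and comparing against a positional concat:

```python
import pandas as pd

# Hypothetical frame whose column names do not sort into their original order.
df1 = pd.DataFrame({"id": [4, 5, 6, 7],
                    "col2": ["A", "B", "C", "A"],
                    "amount": [0.5, 0.78, 0.55, 0.78]})
df2 = pd.get_dummies(df1["col2"], prefix="col2", dtype=int)

# join + drop + sort_index: columns end up in alphabetical order.
by_name = df1.join(df2).drop(columns="col2").sort_index(axis=1)
print(by_name.columns.tolist())
# ['amount', 'col2_A', 'col2_B', 'col2_C', 'id']

# Positional alternative: keeps the new columns where col2 used to be.
p = df1.columns.get_loc("col2")
by_position = pd.concat([df1.iloc[:, :p], df2, df1.iloc[:, p + 1:]], axis=1)
print(by_position.columns.tolist())
# ['id', 'col2_A', 'col2_B', 'col2_C', 'amount']
```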

We can also use DataFrame.pivot_table instead of pd.get_dummies:

new_df = (df1.join(df1.pivot_table(columns = 'col2',
                                   index = df1.index,
                                   aggfunc = 'size',
                                   fill_value = 0)
                      .add_prefix('col2_'))
              .drop(columns = 'col2')
              .sort_index(axis = 1))
print(new_df)
   col1  col2_A  col2_B  col2_C  col3
0     4       1       0       0  0.50
1     5       0       1       0  0.78
2     6       0       0       1  0.55
3     7       1       0       0  0.78

python

machine-learning

dataframe

pandas

one-hot-encoding