Pandas concat increases number of rows

Question

I am concatenating two DataFrames column-wise, so that one sits next to the other. But first I applied a transformation to the initial DataFrame:

scaler = MinMaxScaler() 
real_data = pd.DataFrame(scaler.fit_transform(df[real_columns]), columns = real_columns)

Then I concatenate:

categorial_data  = pd.get_dummies(df[categor_columns], prefix_sep= '__')
train = pd.concat([real_data, categorial_data], axis=1, ignore_index=True)

I don't know why, but the number of rows increased:

print(df.shape, real_data.shape, categorial_data.shape, train.shape)
(1700645, 23) (1700645, 16) (1700645, 130) (1703915, 146)

What is happening, and how can I fix it?

As you can see, the number of columns in train equals the sum of the columns of real_data and categorial_data.

Answer 1

I solved this problem using hstack:

train = pd.DataFrame(np.hstack([real_data,categorial_data]))
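
One caveat with this approach is that np.hstack works on plain arrays, so the column names are lost and the result gets integer column labels. A minimal sketch of keeping the names, assuming two small made-up frames with the same number of rows (the values and the index labels are illustrative, not from the question):

```python
import numpy as np
import pandas as pd

# Illustrative stand-ins for the two frames; note that their indexes differ.
real_data = pd.DataFrame({"x": [0.0, 0.5, 1.0]})
categorial_data = pd.DataFrame(
    {"c__a": [1, 0, 1], "c__b": [0, 1, 0]}, index=[0, 2, 5]
)

# Stack the underlying arrays side by side; index labels play no role here.
stacked = np.hstack([real_data.to_numpy(), categorial_data.to_numpy()])

# Reattach the column names that the arrays do not carry.
columns = list(real_data.columns) + list(categorial_data.columns)
train = pd.DataFrame(stacked, columns=columns)
```

Because the rows are paired purely by position, this only makes sense when both frames are in the same row order.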

Answer 2

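The issue is that when you perform several operations on a single DataFrame, it can keep its old index. Calling df.reset_index() before concatenating will resolve this (pass drop=True so the old index is not added as a column).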

Answer 3

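This happens when the indexes of the DataFrames being concatenated differ. After preprocessing, the resulting DataFrame loses its original index. Setting the index of each DataFrame back to the original one works, i.e. df_concatenated.index = df_original.index.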
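
A minimal sketch of this effect, assuming a small made-up frame whose index has gaps (as it might after rows were filtered out). The data and index labels are illustrative, and to_numpy() stands in for the array that scaler.fit_transform returns:

```python
import pandas as pd

# Hypothetical original frame with a non-default index.
df = pd.DataFrame(
    {"x": [1.0, 2.0, 3.0], "c": ["a", "b", "a"]}, index=[0, 2, 5]
)

# Rebuilding a frame from a plain array gives it a fresh index 0, 1, 2.
real_data = pd.DataFrame(df[["x"]].to_numpy(), columns=["x"])

# get_dummies keeps the original index 0, 2, 5.
categorial_data = pd.get_dummies(df[["c"]], prefix_sep="__")

# concat with axis=1 aligns on index labels, so the result has one row
# per label in the union {0, 1, 2, 5}, with NaN where a frame has no such label.
misaligned = pd.concat([real_data, categorial_data], axis=1)

# Restoring the original index makes the labels match again.
real_data.index = df.index
aligned = pd.concat([real_data, categorial_data], axis=1)

print(misaligned.shape, aligned.shape)
```

Note that ignore_index=True combined with axis=1, as used in the question, renumbers the column labels of the result; it does not change how the rows are aligned.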

Answer 4

Some operations on a DataFrame change its dimensions but not its index, so we need to call reset_index on the DataFrame afterwards.

For concatenation, you can do it like this:

result_df = pd.concat([first_df.reset_index(drop=True), second_df.reset_index(drop=True)], axis=1)
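
The same idea can be wrapped in a small helper. This is only a sketch: concat_by_position is a hypothetical name, not a pandas function, and the length check is an added assumption that pairing rows by position only makes sense when both frames have the same number of rows.

```python
import pandas as pd

def concat_by_position(first_df, second_df):
    """Place two frames side by side, pairing rows by position (hypothetical helper)."""
    if len(first_df) != len(second_df):
        raise ValueError("both frames must have the same number of rows")
    # drop=True discards the old index instead of adding it as a column.
    return pd.concat(
        [first_df.reset_index(drop=True), second_df.reset_index(drop=True)],
        axis=1,
    )

# Illustrative frames whose index labels do not overlap at all.
first_df = pd.DataFrame({"a": [1, 2, 3]}, index=[10, 20, 30])
second_df = pd.DataFrame({"b": [4, 5, 6]}, index=[0, 1, 2])

result_df = concat_by_position(first_df, second_df)
print(result_df.shape)
```

The result carries a fresh 0-based index, so the original index labels are lost; if they matter, assign them back afterwards.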
