Why does pandas.concat() add (), to column names?

I'm trying to work out why pandas.concat() shows some column names wrapped in parentheses with a trailing comma, e.g. (dec,).

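There is a similar question, but I can't see how it applies in my context. It looks as though there is a double bracket somewhere in the assignment, but the dataframes being concatenated look fine on their own, so I don't understand what is causing it.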

The output is below the code.

import warnings
import random
import pandas as pd # dataframe manipulation
import numpy as np # linear algebra
from sklearn.preprocessing import OneHotEncoder
import ssl

ssl._create_default_https_context = ssl._create_unverified_context

url = 'https://raw.githubusercontent.com/bryonbaker/datasets/main/SIT720/Ass4/forestfires.csv'
full_df = pd.read_csv(url)
print(f"{full_df.head()}\n")

ohe = OneHotEncoder(handle_unknown='ignore', drop=None, dtype='int')

transformed = ohe.fit_transform(full_df[['month']])
month_df = pd.DataFrame(transformed.toarray())
month_df.columns = ohe.categories_

print(month_df.head())

full_df = full_df.drop(['month'], axis=1)

result = pd.concat([full_df, month_df], axis=1)
result.head()

The full output is:

   X  Y month  day  FFMC   DMC     DC  ISI  temp  RH  wind  rain  area
0  7  5   mar  fri  86.2  26.2   94.3  5.1   8.2  51   6.7   0.0   0.0
1  7  4   oct  tue  90.6  35.4  669.1  6.7  18.0  33   0.9   0.0   0.0
2  7  4   oct  sat  90.6  43.7  686.9  6.7  14.6  33   1.3   0.0   0.0
3  8  6   mar  fri  91.7  33.3   77.5  9.0   8.3  97   4.0   0.2   0.0
4  8  6   mar  sun  89.3  51.3  102.2  9.6  11.4  99   1.8   0.0   0.0

  apr aug dec feb jan jul jun mar may nov oct sep
0   0   0   0   0   0   0   0   1   0   0   0   0
1   0   0   0   0   0   0   0   0   0   0   1   0
2   0   0   0   0   0   0   0   0   0   0   1   0
3   0   0   0   0   0   0   0   1   0   0   0   0
4   0   0   0   0   0   0   0   1   0   0   0   0
   X  Y  day  FFMC   DMC     DC  ISI  temp  RH  wind  ...  (dec,)  (feb,)  (jan,)  (jul,)  (jun,)  (mar,)  (may,)  (nov,)  (oct,)  (sep,)
0  7  5  fri  86.2  26.2   94.3  5.1   8.2  51   6.7  ...       0       0       0       0       0       1       0       0       0       0
1  7  4  tue  90.6  35.4  669.1  6.7  18.0  33   0.9  ...       0       0       0       0       0       0       0       0       1       0
2  7  4  sat  90.6  43.7  686.9  6.7  14.6  33   1.3  ...       0       0       0       0       0       0       0       0       1       0
3  8  6  fri  91.7  33.3   77.5  9.0   8.3  97   4.0  ...       0       0       0       0       0       1       0       0       0       0
4  8  6  sun  89.3  51.3  102.2  9.6  11.4  99   1.8  ...       0       0       0       0       0       1       0       0       0       0

5 rows × 24 columns

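ohe.categories_ is a list of arrays, one array per encoded input column (here only month). When you assign that list to month_df.columns, pandas treats it as a list of arrays and builds a one-level MultiIndex, so each column label is really a one-element tuple such as ('dec',). month_df still prints normally on its own, but pd.concat() combines these labels with the ordinary column index of full_df, and the tuples then show up as (dec,) and so on. Change this line: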

month_df.columns = ohe.categories_

to:

month_df.columns = ohe.categories_[0]
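A minimal, self-contained sketch of both the cause and the fix, assuming a recent pandas and scikit-learn. The three-row df is a hypothetical stand-in for the forest-fires data, so nothing has to be downloaded. It also shows that the double brackets in df[['month']] are not the culprit: they just pass a 2-D, one-column input, which is why categories_ has exactly one entry.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the forest-fires data.
df = pd.DataFrame({"X": [7, 7, 8], "month": ["mar", "oct", "mar"]})

# df[["month"]] is a one-column DataFrame (2-D), which the encoder expects.
ohe = OneHotEncoder(handle_unknown="ignore", dtype=int)
encoded = ohe.fit_transform(df[["month"]]).toarray()

# categories_ is a list holding one array of categories per input column.
print(type(ohe.categories_), len(ohe.categories_))

# Cause: assigning the whole list creates a one-level MultiIndex.
bad = pd.DataFrame(encoded)
bad.columns = ohe.categories_
print(type(bad.columns).__name__)  # MultiIndex
bad_result = pd.concat([df.drop(columns=["month"]), bad], axis=1)
print(list(bad_result.columns))  # ['X', ('mar',), ('oct',)]

# Fix: use the single array of categories for the first (only) column.
good = pd.DataFrame(encoded, columns=ohe.categories_[0])
good_result = pd.concat([df.drop(columns=["month"]), good], axis=1)
print(list(good_result.columns))  # ['X', 'mar', 'oct']
```

If you encode several columns at once, categories_ holds one array per column, so indexing with [0] only suits the single-column case shown here.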