How to rename the columns of a list of DataFrames in pandas?

Question

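I have a list of DataFrames, each with different column names. I want to give all of them one common set of column names and then combine them, but it doesn't work. Is there a quick way to do this in pandas?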

My attempt

!pip install wget

import wget
import pandas as pd

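url = 'https://github.com/adamFlyn/test_rl/blob/main/test_data.xlsx'
# A /blob/ URL serves GitHub's HTML page; '?raw=true' fetches the file itself
data = wget.download(url + '?raw=true')

# wget.download returns the local filename it saved to
xls = pd.ExcelFile(data)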
names = xls.sheet_names[1:]
# iterate to find sheet name that matches 
data_dict = pd.read_excel(xls, sheet_name = [name for name in xls.sheet_names if name in names])

dfs=[]
for key, val in data_dict.items():
    val['state_abbr'] = key
    dfs.append(val)

for df in dfs:
    st=df.columns[0]
    df['state']=st
    df.reset_index()  # no effect: the returned copy is discarded


for df in dfs:
    lst=df.columns.tolist()
    lst=['county','orientation','state_abbr','state']
    df.columns=lst

# concat has no inplace argument; it returns a new DataFrame (axis=0 stacks rows)
final_df = pd.concat(dfs)

But renaming the columns of each DataFrame like this fails with the following error:

for df in dfs:
    lst=df.columns.tolist()
    lst=['county','orientation','state_abbr','state']
    df.columns=lst

ValueError: Length mismatch: Expected axis has 5 elements, new values have 4 elements

How should I do this in pandas? Any quick ideas or tricks? Thanks.
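To find which frame triggers the ValueError, one option is to compare each frame's column count with the list of new names before assigning. A minimal sketch on two hypothetical frames (the values, the header "side", and the extra "Unnamed: 2" column are made up for illustration, shaped like what the loops above produce):

```python
import pandas as pd

new_names = ["county", "orientation", "state_abbr", "state"]

# Hypothetical frames shaped like the question's after the two loops;
# the second carries an extra, mostly empty "Unnamed: 2" column.
dfs = [
    pd.DataFrame({"Alaska": ["Anchorage"], "side": ["Neutral"],
                  "state_abbr": ["AK"], "state": ["Alaska"]}),
    pd.DataFrame({"North Carolina": ["Wake"], "side": ["Defense"],
                  "Unnamed: 2": [None], "state_abbr": ["NC"],
                  "state": ["North Carolina"]}),
]

# Collect (position, width) for every frame whose width differs
# from the number of new names.
mismatched = [(i, df.shape[1]) for i, df in enumerate(dfs)
              if df.shape[1] != len(new_names)]
print(mismatched)
```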

Answer 1

You should move df.columns=lst out of the loop:

    for df in dfs:
        lst=df.columns.tolist()
        lst=['county','orientation','state_abbr','state']
    df.columns=lst
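With this layout the assignment runs only once, after the loop has finished, when df is still bound to the last element of dfs. A minimal sketch with two hypothetical one-column frames shows the effect:

```python
import pandas as pd

# Two hypothetical one-column frames
dfs = [pd.DataFrame({"a": [1]}), pd.DataFrame({"b": [2]})]

for df in dfs:
    lst = ["x"]
# After the loop, df still refers to the last element of dfs,
# so this assignment renames only that frame.
df.columns = lst

print([list(d.columns) for d in dfs])
```

Only the last frame gets the new name; the first keeps its original column.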

Answer 2

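The error comes from the data. Almost every DataFrame (sheet) has 3 columns, but the "NC" sheet has an extra column whose name starts with "Unnamed". Apart from one row whose value is "`", that column is almost entirely NaN. If we drop it from the sheet, the rest of the code works as expected.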
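Dropping the stray column can be sketched as follows, on a hypothetical NC-like frame (the county values and the header "side" are made up; pandas labels a column whose header cell is blank as "Unnamed: <position>"):

```python
import pandas as pd

# Hypothetical "NC"-style sheet: the third column is unnamed and
# almost entirely empty, except for one stray "`" value.
nc = pd.DataFrame({
    "North Carolina": ["Alamance", "Alexander", "Alleghany"],
    "side": ["Defense", "Neutral", "Defense"],
    "Unnamed: 2": [None, "`", None],
})

# Keep only the columns whose label does not start with "Unnamed"
nc = nc.loc[:, ~nc.columns.str.startswith("Unnamed")]

# Now the frame has 2 columns, so 2 new names fit
nc.columns = ["county", "orientation"]
print(nc.columns.tolist())
```

Because the column holds a single "`" value, dropna(axis=1, how='all') would not remove it; filtering on the label does.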

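You can use assign inside a list comprehension to add the new columns, and set_axis to change the column names. Also, you can pass names directly instead of using a list comprehension to pick the sheet names. Finally, combine everything with concat: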

# xls and names are defined as in the question
out = pd.concat([df.loc[:, ~df.columns.str.startswith('Unnamed')]
                 .set_axis(['county','orientation'], axis=1)
                 .assign(state=df.columns[0], state_abbr=k)
                 for k, df in pd.read_excel(xls, sheet_name=names).items()])

Output:

            county orientation     state state_abbr
0   Aleutians East  Plaintiff     Alaska         AK
1   Aleutians West  Plaintiff     Alaska         AK
2        Anchorage     Neutral    Alaska         AK
3           Bethel  Plaintiff     Alaska         AK
4      Bristol Bay  Plaintiff     Alaska         AK
..             ...         ...       ...        ...
18      Sweetwater     Neutral  Wyoming          WY
19           Teton     Neutral  Wyoming          WY
20           Uinta    Defense   Wyoming          WY
21        Washakie    Defense   Wyoming          WY
22          Weston     Defense  Wyoming          WY

[3117 rows x 4 columns]
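The same pipeline can be tried without the Excel file by substituting a hypothetical dict of frames for the result of pd.read_excel(xls, sheet_name=names), which returns a dict keyed by sheet name when sheet_name is a list. The sheet contents below are made up, and ignore_index=True is an addition not in the answer; without it each sheet keeps its own 0-based index, as in the output above.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel(xls, sheet_name=names):
# a dict mapping sheet name -> DataFrame.
sheets = {
    "AK": pd.DataFrame({"Alaska": ["Aleutians East", "Anchorage"],
                        "side": ["Plaintiff", "Neutral"]}),
    "NC": pd.DataFrame({"North Carolina": ["Alamance", "Wake"],
                        "side": ["Defense", "Neutral"],
                        "Unnamed: 2": [None, "`"]}),
}

out = pd.concat(
    [df.loc[:, ~df.columns.str.startswith("Unnamed")]  # drop stray columns
       .set_axis(["county", "orientation"], axis=1)    # common names
       .assign(state=df.columns[0], state_abbr=k)      # header -> state, key -> abbr
     for k, df in sheets.items()],
    ignore_index=True,  # one continuous 0..n-1 index
)
print(out)
```

Note that df.columns[0] inside assign still reads the original header (the state name), because df is the comprehension variable, not the renamed frame.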


Tags: python, excel, dataframe, python-3.x, pandas