Dropping the index column of multiple Excel files in Python

Question

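I have multiple Excel sheets with identical column names. When I saved the files from earlier calculations, I forgot to set "Date" as the index, so all 40 of them now contain an index column with numbers ranging from 1 to 200. If I load them into Python, they get an additional index column again, which results in 2 unnamed columns. I know I can use the glob function to access all my files. But is there a way to access all the files, drop/delete the unnamed index columns, and set the Date column as the new index?

Here is an example of 1 Excel sheet for now: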

import pandas as pd

df = pd.DataFrame({
    '': [0, 1, 2, 3, 4],  # stale index column with an empty header
    'Date': [1930, 1931, 1932, 1933, 1934],
    'value': [11558522, 12323552, 13770958, 18412280, 13770958],
})

Answer 1

dfs = [pd.read_csv(file).set_index('Date')[['value']] for file in glob.glob("/your/path/to/folder/*.csv")]
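The one-liner above assumes CSV files and a real folder path. As a minimal, self-contained sketch of the same idea, the snippet below first writes two small sample files into a temporary folder (the file names `a.csv` and `b.csv` and the sample values are made up for illustration), then loads them all with glob, keeping only `value` indexed by `Date`. For `.xlsx` files, `pd.read_excel` would presumably take the place of `pd.read_csv`.

```python
import glob
import os
import tempfile

import pandas as pd

# Build two small sample files in a temporary folder (stand-ins for the real files).
folder = tempfile.mkdtemp()
for name in ("a.csv", "b.csv"):
    sample = pd.DataFrame({"Date": [1930, 1931], "value": [11558522, 12323552]})
    # The default index=True writes the unwanted counter column into the file.
    sample.to_csv(os.path.join(folder, name))

# Load every file, keep only 'value', and index each frame by 'Date'.
dfs = {
    os.path.basename(path): pd.read_csv(path).set_index("Date")[["value"]]
    for path in sorted(glob.glob(os.path.join(folder, "*.csv")))
}

for name, frame in dfs.items():
    print(name)
    print(frame)
```

Using a dict keyed by file name instead of a list is only a convenience here, so each frame can be traced back to its source file.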

Answer 2

A quick way with pandas is:

>>> df = df.drop('', axis=1)
>>> df = df.set_index('Date')
>>> df
         value
Date          
1930  11558522
1931  12323552
1932  13770958
1933  18412280
1934  13770958

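(I did the above from memory, but a general tip for this kind of thing is to look up the appropriate function in the documentation: https://pandas.pydata.org/pandas-docs/stable/reference/frame.html)

You can also specify the index column (index_col) when loading the file: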
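One caveat when applying this to a frame loaded from a file: pandas labels a blank header cell as "Unnamed: 0" (then "Unnamed: 1", and so on) instead of an empty string, so `drop('', axis=1)` may not find the column. A minimal sketch, using an in-memory CSV string as a stand-in for a real file, that drops every column whose name starts with "Unnamed":

```python
import io

import pandas as pd

# Simulated file content: the first header cell is empty, as in the question.
csv_text = ",Date,value\n0,1930,11558522\n1,1931,12323552\n2,1932,13770958\n"

df = pd.read_csv(io.StringIO(csv_text))

# Collect every column that pandas auto-named because its header was blank.
unnamed = [col for col in df.columns if str(col).startswith("Unnamed")]

# Drop those columns and use Date as the index.
df = df.drop(columns=unnamed).set_index("Date")
print(df)
```

Filtering by prefix also covers the case of two stale index columns, since both would carry an "Unnamed" label.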

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html

index_col : int, list of int, default None

Column (0-indexed) to use as the row labels of the DataFrame. Pass None if there is no such column. If a list is passed, those columns will be combined into a MultiIndex. If a subset of data is selected with usecols, index_col is based on the subset.
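To illustrate the parameter, here is a small sketch. It uses `pd.read_csv` on an in-memory string so it runs without any files; `read_csv` accepts `index_col` in the same positional form, and the assumption is that `read_excel` behaves the same way for a sheet with this layout.

```python
import io

import pandas as pd

csv_text = ",Date,value\n0,1930,11558522\n1,1931,12323552\n"

# index_col=0: the stale counter becomes the row labels instead of a data column.
by_position = pd.read_csv(io.StringIO(csv_text), index_col=0)

# index_col=1: Date (the second column, 0-indexed) becomes the index directly;
# the stale counter is still loaded as a regular column, so drop it afterwards.
by_date = pd.read_csv(io.StringIO(csv_text), index_col=1)
by_date = by_date.drop(columns="Unnamed: 0")

print(by_position)
print(by_date)
```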

Answer 3

I think the simplest approach is to read the wrong first column in as the index, then use DataFrame.set_index to replace it with the Date column:

import glob, os
import pandas as pd

for file in glob.glob('subset/*.xlsx'):
    # read the stale first column as the index, then replace it with Date
    df = pd.read_excel(file, index_col=[0]).set_index('Date')
    print(df)

    # write new Excel files with a 'new_' prefix next to the originals
    h, t = os.path.split(file)
    df.to_excel(os.path.join(h, 'new_' + t))

    # or overwrite the original files (back up the data first to avoid losing it if something fails)
    #df.to_excel(file)
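If the files are overwritten in place, as in the commented-out last line above, a backup copy can be made first. The sketch below is one possible way to do that; the helper name `overwrite_with_backup` and the `.bak` suffix are hypothetical, and a throwaway CSV file is used so the example runs without an Excel engine (for `.xlsx` files, `df.to_excel(path)` would take the place of `df.to_csv(path)`).

```python
import os
import shutil
import tempfile

import pandas as pd


def overwrite_with_backup(path, df):
    """Copy `path` to `path + '.bak'`, then overwrite `path` with `df`."""
    backup = path + ".bak"
    shutil.copy2(path, backup)  # preserve the original file before touching it
    df.to_csv(path)  # for .xlsx files this would be df.to_excel(path)
    return backup


# Demo on a throwaway file that contains a stale index column.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "data.csv")
pd.DataFrame({"Date": [1930, 1931], "value": [1, 2]}).to_csv(path)

# Same fix as in the loop above: stale column in as index, then replaced by Date.
fixed = pd.read_csv(path, index_col=[0]).set_index("Date")
backup = overwrite_with_backup(path, fixed)

reloaded = pd.read_csv(path, index_col=0)
print(reloaded)
```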
