Merge Excel files into one based on specific columns

Question

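I need to merge multiple Excel files based on a specific column. Each file has two columns, id and value, and I need to combine the values from all the files into a single file. I tried this code, but it merges all the columns: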

import os
import pandas as pd

cwd = os.path.abspath('/path/')
files = os.listdir(cwd)
df = pd.DataFrame()
for file in files:
    if file.endswith('.xlsx'):
        # DataFrame.append stacks rows; it was removed in pandas 2.0
        df = df.append(pd.read_excel('/path/' + file), ignore_index=True)
df.head()
df.to_excel('/path/merged.xlsx')

But it puts all the values into a single column, like this:

 1   504.0303111
 2  1587.678968
 3   1437.759643
 4   1588.387983 
 5   1059.194416 
 1   642.4925851
 2   459.3774304   
 3  1184.210851 
 4   1660.24336
 5   1321.414708

I need the values stored like this:

  1  504.0303111  1 670.9609316     
  2  1587.678968  2 459.3774304     
  3  1437.759643  3 1184.210851     
  4  1588.387983  4 1660.24336      
  5  1059.194416  5 1321.414708

Answer 1

One way is to append the DataFrames to a list inside the loop and concatenate them along the columns after the loop:

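import os
import pandas as pd

cwd = os.path.abspath('/path/')
files = os.listdir(cwd)

tmp = []
for file in files:  # iterate over every file; files[1:] would skip one
    if file.endswith('.xlsx'):
        tmp.append(pd.read_excel(os.path.join(cwd, file)))
# axis=1 places the DataFrames side by side instead of stacking rows
df = pd.concat(tmp, axis=1)
df.to_excel('/path/merged.xlsx')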
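
To see what concatenating along the columns does without touching any files, here is a minimal sketch with two in-memory DataFrames standing in for the results of pd.read_excel. The names first, second, stacked and side_by_side and the sample numbers are made up for illustration.

```python
import pandas as pd

# Two small frames standing in for two Excel files (hypothetical sample data).
first = pd.DataFrame({"id": [1, 2, 3], "value": [504.03, 1587.68, 1437.76]})
second = pd.DataFrame({"id": [1, 2, 3], "value": [642.49, 459.38, 1184.21]})

# axis=0 (the default) stacks rows, which is what the code in the question does.
stacked = pd.concat([first, second], ignore_index=True)

# axis=1 places the frames side by side, aligned on the row index.
side_by_side = pd.concat([first, second], axis=1)

print(stacked.shape)               # (6, 2)
print(side_by_side.shape)          # (3, 4)
print(list(side_by_side.columns))  # ['id', 'value', 'id', 'value']
```

Note that the id column is repeated once per file in the side-by-side result.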

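But I think the code below suits you better, because it doesn't duplicate the id column and only adds each file's value column as a new column to the DataFrame df in the loop: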

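import os
import pandas as pd

cwd = os.path.abspath('/path/')
# sorted() keeps the column order reproducible; os.listdir order is arbitrary
files = sorted(file for file in os.listdir(cwd) if file.endswith('.xlsx'))
# the first file supplies the id column and the first value column
df = pd.read_excel(os.path.join(cwd, files[0]))

for i, file in enumerate(files[1:], 1):
    # take only the second column (value); rows are matched by row number
    df[f'value{i}'] = pd.read_excel(os.path.join(cwd, file)).iloc[:, 1]

df.to_excel('/path/merged.xlsx')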
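
The loop above matches rows by row number, so it relies on every file listing the same ids in the same order. If that isn't guaranteed, one alternative is to align on the id column instead. This is only a sketch: it assumes the columns are literally named id and value, as described in the question, and that ids are unique within each file. The function name combine_values and the sample frames are hypothetical.

```python
import pandas as pd

def combine_values(frames):
    """Join the 'value' column of each frame on 'id', one column per frame."""
    columns = [
        frame.set_index("id")["value"].rename(f"value{i}")
        for i, frame in enumerate(frames, 1)
    ]
    # concat along axis=1 aligns on the index (id); missing ids become NaN.
    combined = pd.concat(columns, axis=1)
    return combined.rename_axis("id").reset_index()

# In-memory stand-ins for two files whose rows are in different orders.
a = pd.DataFrame({"id": [1, 2, 3], "value": [10.0, 20.0, 30.0]})
b = pd.DataFrame({"id": [3, 1, 2], "value": [300.0, 100.0, 200.0]})

merged = combine_values([a, b])
print(merged)
```

With real files, the list passed to combine_values would hold the DataFrames returned by pd.read_excel.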

python

excel

dataframe

pandas