将 pandas.DataFrame 添加到现有 Excel 文件

Adding a pandas.DataFrame to Existing Excel File

我有一个网络抓取工具,可以为本月的抓取创建一个 excel 文件。我想将今天的刮擦和那个月的每个刮擦作为一个新的 sheet 添加到那个文件中,每次它是 运行。然而,我的问题是它只用新的 sheet 覆盖现有的 sheet,而不是将其作为单独的新 sheet 添加。我已经尝试使用 xlrd、xlwt、pandas 和 openpyxl 来做到这一点。

对于 Python 来说仍然是全新的,因此非常感谢简单!

下面只是处理写入 excel 文件的代码。

# My relevant time variables
ts = time.time()
date_time = datetime.datetime.fromtimestamp(ts).strftime('%y-%m-%d %H_%M_%S')
HourMinuteSecond = datetime.datetime.fromtimestamp(ts).strftime('%H_%M_%S')
month = datetime.datetime.now().strftime('%m-%y')

# Creates a writer for this month and year
writer = pd.ExcelWriter(
    'C:\Users\G\Desktop\KickstarterLinks(%s).xlsx' % (month), 
    engine='xlsxwriter')

# Creates dataframe from my data, d
df = pd.DataFrame(d)

# Writes to the excel file
df.to_excel(writer, sheet_name='%s' % (HourMinuteSecond))
writer.save()

更新:

此功能已添加到 pandas 0.24.0

ExcelWriter now accepts mode as a keyword argument, enabling append to existing workbooks when using the openpyxl engine (GH3441)

以前的版本:

Pandas 对此有一个 open feature request

与此同时,这里有一个向现有工作簿添加 pandas.DataFrame 的函数:

代码:

def add_frame_to_workbook(filename, tabname, dataframe, timestamp):
    """
    Save a dataframe to a workbook tab with the filename and tabname
    coded to timestamp

    :param filename: filename to create, can use strptime formatting
    :param tabname: tabname to create, can use strptime formatting
    :param dataframe: dataframe to save to workbook
    :param timestamp: timestamp associated with dataframe
    :return: None
    """
    filename = timestamp.strftime(filename)
    sheet_name = timestamp.strftime(tabname)

    # create a writer for this month and year
    writer = pd.ExcelWriter(filename, engine='openpyxl')
    
    try:
        # try to open an existing workbook
        writer.book = load_workbook(filename)
        
        # copy existing sheets
        writer.sheets = dict(
            (ws.title, ws) for ws in writer.book.worksheets)
    except IOError:
        # file does not exist yet, we will create it
        pass

    # write out the new sheet
    dataframe.to_excel(writer, sheet_name=sheet_name)
    
    # save the workbook
    writer.save()

测试代码:

import datetime as dt
import pandas as pd
from openpyxl import load_workbook

data = [x.strip().split() for x in """
                   Date  Close
    2016-10-18T13:44:59  2128.00
    2016-10-18T13:59:59  2128.75
""".split('\n')[1:-1]]
df = pd.DataFrame(data=data[1:], columns=data[0])

name_template = './sample-%m-%y.xlsx'
tab_template = '%d_%H_%M'
now = dt.datetime.now()
in_an_hour = now + dt.timedelta(hours=1)
add_frame_to_workbook(name_template, tab_template, df, now)
add_frame_to_workbook(name_template, tab_template, df, in_an_hour)

(Source)