Openpyxl 在保存时损坏 xlsx。即使没有做出任何改变
Openpyxl corrupts xlsx on save. Even when no changes were made
TL;DR;
- 使用 Openpyxl 将更改保存到大型 excel 文件会导致损坏的 xlsx 文件
- Excel 文件由几个带有图形、公式、图像和表格的选项卡组成。
- Powershell 脚本可以毫无问题地将编辑保存到 xlsx 文件。
- 我可以使用 Openpyxl 从 excel 文件中读取单元格值,我还可以手动编辑和保存 xlsx 文件。
- Excel 文件未受保护。
- 下面提供了所有错误和代码片段。
我无法将数据添加到我们的一个团队正在使用的 excel 文件中。 excel 文件相当大(+3MB),有几个 sheet,包含公式和图表,还有图像。
谢天谢地,sheet 我需要输入数据以包含 none,但是,我发现当我尝试保存工作簿时,我最终遇到了这些错误:
Traceback (most recent call last):
File "test.py", line 5, in <module>
wb.save("new.xlsx")
File "C:\Python3\lib\site-packages\openpyxl\workbook\workbook.py", line 392, in save
save_workbook(self, filename)
File "C:\Python3\lib\site-packages\openpyxl\writer\excel.py", line 293, in save_workbook
writer.save()
File "C:\Python3\lib\site-packages\openpyxl\writer\excel.py", line 275, in save
self.write_data()
File "C:\Python3\lib\site-packages\openpyxl\writer\excel.py", line 78, in write_data
self._write_charts()
File "C:\Python3\lib\site-packages\openpyxl\writer\excel.py", line 124, in _write_charts
self._archive.writestr(chart.path[1:], tostring(chart._write()))
File "C:\Python3\lib\site-packages\openpyxl\chart\_chart.py", line 134, in _write
return cs.to_tree()
File "C:\Python3\lib\site-packages\openpyxl\chart\chartspace.py", line 193, in to_tree
tree = super(ChartSpace, self).to_tree()
File "C:\Python3\lib\site-packages\openpyxl\descriptors\serialisable.py", line 154, in to_tree
node = obj.to_tree(child_tag)
File "C:\Python3\lib\site-packages\openpyxl\descriptors\serialisable.py", line 154, in to_tree
node = obj.to_tree(child_tag)
File "C:\Python3\lib\site-packages\openpyxl\chart\plotarea.py", line 135, in to_tree
return super(PlotArea, self).to_tree(tagname)
File "C:\Python3\lib\site-packages\openpyxl\descriptors\serialisable.py", line 146, in to_tree
for node in nodes:
File "C:\Python3\lib\site-packages\openpyxl\descriptors\sequence.py", line 105, in to_tree
el = v.to_tree(namespace=namespace)
File "C:\Python3\lib\site-packages\openpyxl\chart\_chart.py", line 107, in to_tree
return super(ChartBase, self).to_tree(tagname, idx)
File "C:\Python3\lib\site-packages\openpyxl\descriptors\serialisable.py", line 146, in to_tree
for node in nodes:
File "C:\Python3\lib\site-packages\openpyxl\descriptors\sequence.py", line 39, in to_tree
el = v.to_tree(tagname, idx)
File "C:\Python3\lib\site-packages\openpyxl\chart\series.py", line 170, in to_tree
return super(Series, self).to_tree(tagname)
File "C:\Python3\lib\site-packages\openpyxl\descriptors\serialisable.py", line 154, in to_tree
node = obj.to_tree(child_tag)
File "C:\Python3\lib\site-packages\openpyxl\descriptors\serialisable.py", line 154, in to_tree
node = obj.to_tree(child_tag)
AttributeError: 'str' object has no attribute 'to_tree'
这是我用来执行“另存为”过程的代码,所以不要介意添加数据等等,保存操作会损坏文件:
from openpyxl import load_workbook
wb=load_workbook("Production Monitoring Script.xlsx")
ws=wb['Prod Perf Script Data']
wb.save("new.xlsx")
我尝试了一个使用 Powershell 的替代解决方案,它奏效了。
$xl=New-Object -ComObject Excel.Application
$wb=$xl.WorkBooks.Open('<path here>\Production Monitoring Script.xlsx')
$ws=$wb.WorkSheets.item(1)
$xl.Visible=$true
$ws.Cells.Item(7, 618)=50
$wb.SaveAs('<path here>\New.xlsx')
$xl.Quit()
能够在该单元格中保存值“50”。
正如在对原始 post 的评论中所讨论的那样:openpyxl 不支持某些图形和其他项目,即使它们位于未被您的代码修改的工作表中。这不是一个完整的解决方法,但当不受支持的objects仅在其他工作表中时有效。
我制作了一个示例 .xlsx 工作簿,其中包含两个工作表,'TWC' 和 'UV240 Results'。此代码假定任何标题以 'Results' 结尾的工作表包含不受支持的图像,并创建两个临时文件 - imageoutput 包含不受支持的图像,outputtemp 包含可以修改而不会被 openpyxl 损坏的工作表。然后在最后将它们缝合在一起。
可能部分效率低下;请编辑或评论改进!
import os
import shutil
import win32com.client
from openpyxl import load_workbook
name = 'spreadsheet.xlsx'
outputfile = 'output.xlsx'
outputtemp = 'outputtemp.xlsx'
shutil.copyfile(name, 'output.xlsx')
wb = load_workbook('output.xlsx')
ws = wb['TWC']
# TWC doesn't have images. Anything ending with 'Results' has unsupported images etc
# Create new file with only openpyxl-unsupported worksheets
imageworksheets = [ws if ws.title.endswith('Results') else '' for ws in wb.worksheets]
if [ws for ws in wb if ws.title != 'TWC']:
imageoutput = 'output2.xlsx'
imagefilewritten = False
while not imagefilewritten:
try:
shutil.copy(name, imageoutput)
except PermissionError as error:
# Catch an exception here - I usually have a GUI function
pass
else:
imagefilewritten = True
excel = win32com.client.Dispatch('Excel.Application')
excel.Visible = False
imagewb = excel.Workbooks.Open(os.path.join(os.getcwd(), imageoutput))
excel.DisplayAlerts = False
for i, ws in enumerate(imageworksheets[::-1]): # Go backwards to avoid reindexing
if not ws:
wsindex = len(imageworksheets) - i
imagewb.Worksheets(wsindex).Delete()
imagefileupdated = False
while not imagefileupdated:
try:
imagewb.Save()
imagewb.Close(SaveChanges = True)
print('Temp image workbook saved.')
except PermissionError as error:
# Catch exception
pass
else:
imagefileupdated = True
# Remove the unsupported worksheets in openpyxl
for ws in wb.worksheets:
if ws in imageworksheets:
wb.remove(ws)
wb.save(outputtemp)
print('Temp output workbook saved.')
''' Do your desired openpyxl manipulations on the remaining supported worksheet '''
# Merge the outputtemp and imageoutput into outputfile
wb1 = excel.Workbooks.Open(os.path.join(os.getcwd(), outputtemp))
wb2 = excel.Workbooks.Open(os.path.join(os.getcwd(), imageoutput))
for ws in wb1.Sheets:
ws.Copy(wb2.Sheets(1))
wb2.SaveAs(os.path.join(os.getcwd(), outputfile))
wb1.Close(SaveChanges = True)
wb2.Close(SaveChanges = True)
print(f'Output workbook saved as {outputfile}.')
excel.Visible = True
excel.DisplayAlerts = True
TL;DR;
- 使用 Openpyxl 将更改保存到大型 excel 文件会导致损坏的 xlsx 文件
- Excel 文件由几个带有图形、公式、图像和表格的选项卡组成。
- Powershell 脚本可以毫无问题地将编辑保存到 xlsx 文件。
- 我可以使用 Openpyxl 从 excel 文件中读取单元格值,我还可以手动编辑和保存 xlsx 文件。
- Excel 文件未受保护。
- 下面提供了所有错误和代码片段。
我无法将数据添加到我们的一个团队正在使用的 excel 文件中。 excel 文件相当大(+3MB),有几个 sheet,包含公式和图表,还有图像。
谢天谢地,sheet 我需要输入数据以包含 none,但是,我发现当我尝试保存工作簿时,我最终遇到了这些错误:
Traceback (most recent call last):
File "test.py", line 5, in <module>
wb.save("new.xlsx")
File "C:\Python3\lib\site-packages\openpyxl\workbook\workbook.py", line 392, in save
save_workbook(self, filename)
File "C:\Python3\lib\site-packages\openpyxl\writer\excel.py", line 293, in save_workbook
writer.save()
File "C:\Python3\lib\site-packages\openpyxl\writer\excel.py", line 275, in save
self.write_data()
File "C:\Python3\lib\site-packages\openpyxl\writer\excel.py", line 78, in write_data
self._write_charts()
File "C:\Python3\lib\site-packages\openpyxl\writer\excel.py", line 124, in _write_charts
self._archive.writestr(chart.path[1:], tostring(chart._write()))
File "C:\Python3\lib\site-packages\openpyxl\chart\_chart.py", line 134, in _write
return cs.to_tree()
File "C:\Python3\lib\site-packages\openpyxl\chart\chartspace.py", line 193, in to_tree
tree = super(ChartSpace, self).to_tree()
File "C:\Python3\lib\site-packages\openpyxl\descriptors\serialisable.py", line 154, in to_tree
node = obj.to_tree(child_tag)
File "C:\Python3\lib\site-packages\openpyxl\descriptors\serialisable.py", line 154, in to_tree
node = obj.to_tree(child_tag)
File "C:\Python3\lib\site-packages\openpyxl\chart\plotarea.py", line 135, in to_tree
return super(PlotArea, self).to_tree(tagname)
File "C:\Python3\lib\site-packages\openpyxl\descriptors\serialisable.py", line 146, in to_tree
for node in nodes:
File "C:\Python3\lib\site-packages\openpyxl\descriptors\sequence.py", line 105, in to_tree
el = v.to_tree(namespace=namespace)
File "C:\Python3\lib\site-packages\openpyxl\chart\_chart.py", line 107, in to_tree
return super(ChartBase, self).to_tree(tagname, idx)
File "C:\Python3\lib\site-packages\openpyxl\descriptors\serialisable.py", line 146, in to_tree
for node in nodes:
File "C:\Python3\lib\site-packages\openpyxl\descriptors\sequence.py", line 39, in to_tree
el = v.to_tree(tagname, idx)
File "C:\Python3\lib\site-packages\openpyxl\chart\series.py", line 170, in to_tree
return super(Series, self).to_tree(tagname)
File "C:\Python3\lib\site-packages\openpyxl\descriptors\serialisable.py", line 154, in to_tree
node = obj.to_tree(child_tag)
File "C:\Python3\lib\site-packages\openpyxl\descriptors\serialisable.py", line 154, in to_tree
node = obj.to_tree(child_tag)
AttributeError: 'str' object has no attribute 'to_tree'
这是我用来执行“另存为”过程的代码,所以不要介意添加数据等等,保存操作会损坏文件:
from openpyxl import load_workbook
wb=load_workbook("Production Monitoring Script.xlsx")
ws=wb['Prod Perf Script Data']
wb.save("new.xlsx")
我尝试了一个使用 Powershell 的替代解决方案,它奏效了。
$xl=New-Object -ComObject Excel.Application
$wb=$xl.WorkBooks.Open('<path here>\Production Monitoring Script.xlsx')
$ws=$wb.WorkSheets.item(1)
$xl.Visible=$true
$ws.Cells.Item(7, 618)=50
$wb.SaveAs('<path here>\New.xlsx')
$xl.Quit()
能够在该单元格中保存值“50”。
正如在对原始 post 的评论中所讨论的那样:openpyxl 不支持某些图形和其他项目,即使它们位于未被您的代码修改的工作表中。这不是一个完整的解决方法,但当不受支持的objects仅在其他工作表中时有效。
我制作了一个示例 .xlsx 工作簿,其中包含两个工作表,'TWC' 和 'UV240 Results'。此代码假定任何标题以 'Results' 结尾的工作表包含不受支持的图像,并创建两个临时文件 - imageoutput 包含不受支持的图像,outputtemp 包含可以修改而不会被 openpyxl 损坏的工作表。然后在最后将它们缝合在一起。
可能部分效率低下;请编辑或评论改进!
import os
import shutil
import win32com.client
from openpyxl import load_workbook
name = 'spreadsheet.xlsx'
outputfile = 'output.xlsx'
outputtemp = 'outputtemp.xlsx'
shutil.copyfile(name, 'output.xlsx')
wb = load_workbook('output.xlsx')
ws = wb['TWC']
# TWC doesn't have images. Anything ending with 'Results' has unsupported images etc
# Create new file with only openpyxl-unsupported worksheets
imageworksheets = [ws if ws.title.endswith('Results') else '' for ws in wb.worksheets]
if [ws for ws in wb if ws.title != 'TWC']:
imageoutput = 'output2.xlsx'
imagefilewritten = False
while not imagefilewritten:
try:
shutil.copy(name, imageoutput)
except PermissionError as error:
# Catch an exception here - I usually have a GUI function
pass
else:
imagefilewritten = True
excel = win32com.client.Dispatch('Excel.Application')
excel.Visible = False
imagewb = excel.Workbooks.Open(os.path.join(os.getcwd(), imageoutput))
excel.DisplayAlerts = False
for i, ws in enumerate(imageworksheets[::-1]): # Go backwards to avoid reindexing
if not ws:
wsindex = len(imageworksheets) - i
imagewb.Worksheets(wsindex).Delete()
imagefileupdated = False
while not imagefileupdated:
try:
imagewb.Save()
imagewb.Close(SaveChanges = True)
print('Temp image workbook saved.')
except PermissionError as error:
# Catch exception
pass
else:
imagefileupdated = True
# Remove the unsupported worksheets in openpyxl
for ws in wb.worksheets:
if ws in imageworksheets:
wb.remove(ws)
wb.save(outputtemp)
print('Temp output workbook saved.')
''' Do your desired openpyxl manipulations on the remaining supported worksheet '''
# Merge the outputtemp and imageoutput into outputfile
wb1 = excel.Workbooks.Open(os.path.join(os.getcwd(), outputtemp))
wb2 = excel.Workbooks.Open(os.path.join(os.getcwd(), imageoutput))
for ws in wb1.Sheets:
ws.Copy(wb2.Sheets(1))
wb2.SaveAs(os.path.join(os.getcwd(), outputfile))
wb1.Close(SaveChanges = True)
wb2.Close(SaveChanges = True)
print(f'Output workbook saved as {outputfile}.')
excel.Visible = True
excel.DisplayAlerts = True