How to extract text from tables in a docx document
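I want to extract text from docx documents. I wrote a script that does this, but I noticed that some documents contain tables, and the script does not pick up the text inside them. How can I improve the script below?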
import glob
import os
import docx

with open('your_file.txt', 'w') as f:
    for directory in glob.glob('fi*'):
        for filename in glob.glob(os.path.join(directory, "*")):
            if filename.endswith((".docx", ".doc")):
                document = docx.Document(filename)
                for paragraph in document.paragraphs:
                    if paragraph.text:
                        #docText.append(paragraph.text)
                        f.write("%s\n" % paragraph.text)
docx and tables
Try the python-docx module. Install it with pip, then walk doc.tables and read the text of each cell in each row:
pip install python-docx
import docx

doc = docx.Document("document.docx")
for table in doc.tables:
    for row in table.rows:
        for cell in row.cells:
            print(cell.text)
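To fold this into the script from the question, one option is to write the paragraphs first and then the table rows into the same output file. The following is only a sketch under some assumptions: the helper names iter_document_text and export_text are made up for illustration, cells in a row are joined with tabs (an arbitrary choice), and only .docx files are matched, because python-docx does not read the legacy .doc format.

```python
import glob
import os


def iter_document_text(document):
    """Yield non-empty body paragraphs, then one tab-joined line per table row."""
    for paragraph in document.paragraphs:
        if paragraph.text:
            yield paragraph.text
    for table in document.tables:
        for row in table.rows:
            # cell.text is the text of all paragraphs inside the cell
            cells = [cell.text.strip() for cell in row.cells]
            if any(cells):  # skip rows in which every cell is empty
                yield "\t".join(cells)


def export_text(pattern="fi*", out_path="your_file.txt"):
    """Write the text of every .docx file in the matching directories to out_path."""
    # python-docx (pip install python-docx), imported only when files are read
    import docx

    with open(out_path, "w", encoding="utf-8") as f:
        for directory in glob.glob(pattern):
            # python-docx opens .docx only, so legacy .doc files are not matched
            for filename in glob.glob(os.path.join(directory, "*.docx")):
                document = docx.Document(filename)
                for line in iter_document_text(document):
                    f.write(line + "\n")
```

Calling export_text() with its defaults scans the same fi* directories and writes to your_file.txt, as in the original script. Note that the table text ends up after all paragraph text for each file rather than in its original position in the document, and that merged cells may show up more than once in row.cells, so their text can repeat.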