Bold, underlining, and iterations with python-docx
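I am writing a program that takes data from an ASCII file and places it at the appropriate locations in a Word document, with only specific words made bold and underlined. I am new to Python, but I have extensive experience programming in Matlab. My code is: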
# IMPORT ASCII DATA AND MAKE IT USABLE
# Alternatively Pandas - gives better table display results
import pandas as pd
data = pd.read_csv('203792_M-51_Niles_control_SD_ACSF.txt', sep=",",
                   header=None)
# print(data)
# data[1][3] gives the value at a particular data point within the matrix
i = len(data[1])
print('Number of Points imported =', i)
# IMPORT WORD DOCUMENT
import docx  # Opens Python Word document tool
from docx import Document  # Invokes Document command from docx
document = Document('test_iteration.docx')  # Imports Word document to modify
t = len(document.paragraphs)  # Gives the number of lines in the document
print('Total Number of lines =', t)
# for paragraph in document.paragraphs:
#     print(paragraph.text)  # Prints the text in the entire document
font = document.styles['Normal'].font
font.name = 'Arial'
from docx.shared import Pt
font.size = Pt(8)
# font.bold = True
# font.underline = True
for paragraph in document.paragraphs:
    if 'NORTHING:' in paragraph.text:
        # print(paragraph.text)
        paragraph.text = 'NORTHING: \t' + str(data[1][0])
        print(paragraph.text)
    elif 'EASTING:' in paragraph.text:
        # print(paragraph.text)
        paragraph.text = 'EASTING: \t' + str(data[2][0])
        print(paragraph.text)
    elif 'ELEV:' in paragraph.text:
        # print(paragraph.text)
        paragraph.text = 'ELEV: \t' + str(data[3][0])
        print(paragraph.text)
    elif 'CSF:' in paragraph.text:
        # print(paragraph.text)
        paragraph.text = 'CSF: \t' + str(data[8][0])
        print(paragraph.text)
    elif 'STD. DEV.:' in paragraph.text:
        # print(paragraph.text)
        paragraph.text = ('STD. DEV.: ' + 'N: ' + str(data[5][0]) + '\t E: ' +
                          str(data[6][0]) + '\t EL: ' + str(data[7][0]))
        print(paragraph.text)
# for paragraph in document.paragraphs:
#     print(paragraph.text)  # Prints the text in the entire document
# document.save('test1_save.docx')  # Saves as Word document after modification
My question is how to make only "NORTHING:" bold and underlined in:
paragraph.text = 'NORTHING: \t' + str(data[1][0])
print(paragraph.text)
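So I wrote a pseudo "find and replace" routine, which works well if all the replaced values are exactly the same. However, I need to replace the value in the second paragraph with the value from the second array of the ASCII file, the value in the third paragraph with the value from the third array, and so on. (I have to use find and replace because the document's formatting is too advanced for me to replicate in a program, unless there is a program that can read a Word file and write the code back out as a Python script, i.e. reverse engineer it.)
I am still learning, so the code may look crude to you. I am just trying to automate this tedious copy-and-paste process.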
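Untested, but this assumes python-docx is similar to python-pptx (it should be: it is maintained by the same developer, and a cursory look at the documentation suggests that it interfaces with the PPT/DOC files in the same way, uses the same methods, etc.).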
To manipulate substrings of a paragraph or word, you need to use the run object:
https://python-docx.readthedocs.io/en/latest/api/text.html#run-objects
In practice, this looks like:
for paragraph in document.paragraphs:
    if 'NORTHING:' in paragraph.text:
        paragraph.clear()
        run = paragraph.add_run()
        run.text = 'NORTHING: \t'
        run.font.bold = True
        run.font.underline = True
        run = paragraph.add_run()
        run.text = str(data[1][0])
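Conceptually, you create a run instance for each part of the paragraph/text that you need to manipulate. So, first we create a run with the bold font, then we add another run (which I don't think will be bold/underlined, but if it is, just set those properties to False).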
Note: it's preferable to put all of your import statements at the top of the module.
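This can be optimized a bit by using a mapping object such as a dictionary, which associates the match values ("NORTHING") as keys with the remainder of the paragraph text as values. Also untested: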
import pandas as pd
from docx import Document
from docx.shared import Pt

data = pd.read_csv('203792_M-51_Niles_control_SD_ACSF.txt', sep=",",
                   header=None)
i = len(data[1])
print('Number of Points imported =', i)
document = Document('test_iteration.docx')  # Imports Word document to modify
t = len(document.paragraphs)  # Gives the number of lines in the document
print('Total Number of lines =', t)
font = document.styles['Normal'].font
font.name = 'Arial'
font.size = Pt(8)

# This maps the matching strings to the data array values
data_dict = {
    'NORTHING:': data[1][0],
    'EASTING:': data[2][0],
    'ELEV:': data[3][0],
    'CSF:': data[8][0],
    'STD. DEV.:': 'N: {0}\t E: {1}\t EL: {2}'.format(data[5][0], data[6][0], data[7][0])
}

for paragraph in document.paragraphs:
    for k, v in data_dict.items():
        if k in paragraph.text:
            paragraph.clear()
            # First run: the label, bold and underlined
            run = paragraph.add_run()
            run.text = k + '\t'
            run.font.bold = True
            run.font.underline = True
            # Second run: the value, with default formatting
            run = paragraph.add_run()
            run.text = '{0}'.format(v)
            break  # paragraph handled; move on to the next one