Python 3 - How to remove line/paragraph breaks
```python
from docx import Document

# Letters and punctuation to strip from every paragraph.
alphaDic = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','!','?','.','~',',','(',')','$','-',':',';',"'",'/']

doc = Document("example.docx")  # hypothetical file name
docIndex = 0
while docIndex < len(doc.paragraphs):
    firstSen = doc.paragraphs[docIndex].text
    rep_dic = {ord(k):None for k in alphaDic + [x.upper() for x in alphaDic]}
    translation = (firstSen.translate(rep_dic))
    removeSpaces = " ".join(translation.split())
    removeLineBreaks = removeSpaces.replace('\n','')
    doc.paragraphs[docIndex].text = removeLineBreaks
    docIndex +=1
```
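Two details of the cleanup code above can be checked in isolation. The deletion table can be built with `str.maketrans`, whose third argument lists characters to delete, and `" ".join(s.split())` already removes every newline, so the following `.replace('\n','')` never finds anything to replace. A minimal sketch; the names `PUNCT`, `DELETE_TABLE` and `clean` are hypothetical.

```python
import string

# Same characters as alphaDic plus its upper-case copies:
# ASCII letters in both cases and the listed punctuation.
PUNCT = "!?.~,()$-:;'/"
DELETE_TABLE = str.maketrans("", "", string.ascii_lowercase + string.ascii_uppercase + PUNCT)

def clean(text):
    """Hypothetical helper mirroring the per-paragraph cleanup above."""
    translated = text.translate(DELETE_TABLE)
    # split() with no argument splits on any whitespace, including "\n",
    # so the joined result cannot contain a newline any more.
    return " ".join(translated.split())

print(repr(clean("abc 123\n456 def")))  # '123 456'
```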
I'm trying to remove the line breaks from the document, but it doesn't work. I'm still getting

```
Hello

There
```

instead of

```
Hello
There
```
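Since `readlines` can read any plain-text file, you can read the file, then rewrite only the lines you want and skip the ones you don't. (This applies to plain-text files; a `.docx` file is a ZIP archive, so it has to be read through a library such as python-docx.)

```python
# Example: drop blank lines from a plain-text file.
with open("file name") as f:
    lines = f.readlines()

with open("file name", "w") as f:
    for line in lines:
        # readlines() keeps the trailing "\n", so strip before testing.
        if line.strip() != '':
            f.write(line)
```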
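Note that `readlines()` keeps the trailing `"\n"` on each line, so a blank line reads as `"\n"` rather than `""`, and a plain `line != ''` test lets it through. A small sketch, using an in-memory `io.StringIO` in place of a real file:

```python
import io

# io.StringIO stands in for an opened text file.
source = io.StringIO("Hello\n\nThere\n")
lines = source.readlines()
print(lines)  # ['Hello\n', '\n', 'There\n']

# Comparing against '' keeps the blank line, because it is '\n', not ''.
naive = [line for line in lines if line != '']
# Stripping whitespace first drops it.
filtered = [line for line in lines if line.strip() != '']
print("".join(filtered), end="")
```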
The package ships with an example program that extracts text.
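That said, I think your problem comes from operating on paragraphs. The line breaks occur at the boundaries between paragraphs, so even if you replace a paragraph's text with an empty string (`''`), a line break is still added at its end.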
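To see why, model the document as a list of paragraph texts, where whatever renders or extracts the text ends every paragraph with a line break. Emptying a paragraph's text leaves the paragraph, and therefore its break, in place. This is a pure-Python model, not python-docx itself; `render` is a hypothetical helper.

```python
def render(paragraphs):
    """Hypothetical model: every paragraph ends with a line break."""
    return "".join(text + "\n" for text in paragraphs)

paragraphs = ["Hello", "   ", "There"]

# Per-paragraph cleanup empties the middle paragraph's text...
cleaned = [" ".join(text.split()) for text in paragraphs]
print(cleaned)  # ['Hello', '', 'There']

# ...but the empty paragraph still contributes its own line break.
print(render(cleaned))  # Hello, a blank line, then There
```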
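You should either follow the example program's approach and do your own formatting, or make sure you delete any spurious "empty" paragraphs that sit between the "full" ones: ("Hello", "", "There") -> ("Hello", "There").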
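Dropping the spurious empty paragraphs, ("Hello", "", "There") -> ("Hello", "There"), is a simple filter once the texts are plain strings (in python-docx they could be collected with `[p.text for p in doc.paragraphs]`). A sketch; `drop_empty` is a hypothetical name.

```python
def drop_empty(paragraphs):
    """Keep only paragraphs that contain something other than whitespace."""
    return [text for text in paragraphs if text.strip()]

texts = ["Hello", "", "There"]
kept = drop_empty(texts)
print(kept)             # ['Hello', 'There']
print("\n".join(kept))  # prints Hello and There on consecutive lines
```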
I think what you want is to get rid of the empty paragraphs. The following function may help; it deletes a paragraph you no longer want:
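```python
def delete_paragraph(paragraph):
    # Remove the underlying <w:p> element from its parent in the XML tree.
    p = paragraph._element
    p.getparent().remove(p)
    # Drop the Paragraph object's references to the removed element.
    paragraph._p = paragraph._element = None
```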
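Under the hood, a python-docx `Paragraph` wraps a `<w:p>` element in the document body, and `delete_paragraph` removes that element from its parent. The same idea can be illustrated with the standard library's `xml.etree.ElementTree` on a tiny hand-written body. This is an illustration, not python-docx code: ElementTree has no `getparent()`, so the parent is used directly, and `paragraph_text` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Tiny hand-written stand-in for the <w:body> of word/document.xml.
body = ET.fromstring(
    f'<w:body xmlns:w="{W}">'
    "<w:p><w:r><w:t>Hello</w:t></w:r></w:p>"
    "<w:p/>"
    "<w:p><w:r><w:t>There</w:t></w:r></w:p>"
    "</w:body>"
)

def paragraph_text(p):
    # Concatenate all <w:t> texts, roughly what Paragraph.text returns.
    return "".join(t.text or "" for t in p.iterfind(".//w:t", NS))

# Iterate over a snapshot so removing elements does not disturb the loop.
for p in list(body.findall("w:p", NS)):
    if paragraph_text(p) == "":
        body.remove(p)

print([paragraph_text(p) for p in body.findall("w:p", NS)])  # ['Hello', 'There']
```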
Code by: scanny
In your code, you can check whether `translation` equals `''` and, if it does, call `delete_paragraph`, so your code becomes:
```python
# Assumes doc, alphaDic and delete_paragraph are defined as above.
docIndex = 0
while docIndex < len(doc.paragraphs):
    firstSen = doc.paragraphs[docIndex].text
    rep_dic = {ord(k):None for k in alphaDic + [x.upper() for x in alphaDic]}
    translation = (firstSen.translate(rep_dic))
    if translation != '':
        doc.paragraphs[docIndex].text = translation
    else:
        delete_paragraph(doc.paragraphs[docIndex])
        docIndex -=1 # go one step back in the loop because of the deleted index
    docIndex +=1
```
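The `docIndex -=1` step matters because deleting the current item shifts every later item down by one; without it, the loop would skip the paragraph that moves into the deleted slot. The same pattern on a plain list, with the hypothetical `cleanup` standing in for the translate step and `remove_empty_in_place` as an illustrative name:

```python
def cleanup(text):
    # Hypothetical stand-in for the translate step.
    return text.strip()

def remove_empty_in_place(items):
    index = 0
    while index < len(items):
        cleaned = cleanup(items[index])
        if cleaned != "":
            items[index] = cleaned
        else:
            del items[index]
            index -= 1  # the next item now sits at this index
        index += 1
    return items

print(remove_empty_in_place(["Hello", " ", "", "There"]))  # ['Hello', 'There']
```

The two consecutive empty entries show why the step back is needed: after the first deletion, the second empty entry moves into the same index and must be examined again.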