Need to reformat a text file, moving each speaker's text up a line to the speaker label
I have some .txt files whose text I need to reformat. Specifically, I have Speaker A and Speaker B, and each speaker's text is on the line below the label.
A:
I can not believe the weather today .
B:
It is beautiful outside .
A:
Really nice .
B:
Okay , how are you doing ?
A:
I am good .
B:
Good to hear .
A:
Thank you .
There can be more speakers, but every label will be followed by a colon (:).
I would like the file output to be:
A: I can not believe the weather today .
B: It is beautiful outside .
A: Really nice .
B: Okay , how are you doing ?
A: I am good .
B: Good to hear .
A: Thank you .
Thanks.
Edit:
Also, is there a solution if there are multiple lines of text between speaker labels? For example:
A:
Well hello .
Long time no see .
How are you doing ?
B:
Good .
How are you ?
A:
Really great .
B:
Good .
With the expected result:
A: Well hello . Long time no see . How are you doing ?
B: Good . How are you ?
A: Really great .
B: Good .
A regex substitution can solve this:
import re
text = """A:
I can not believe the weather today .
B:
It is beautiful outside ."""
# Keep the captured label (\1) and replace the whitespace/newline after it with one space.
text = re.sub(r"^(\w+:)\s*", r"\1 ", text, flags=re.MULTILINE)
print(text)
# A: I can not believe the weather today .
# B: It is beautiful outside .
Edit:
Based on the updated question, for multi-line dialogue:
import re
text = """A:
Well hello .
Long time no see .
How are you doing ?
B:
Good .
How are you ?"""
# Join a line with the next one unless the next line is a speaker label (\w+:).
text = re.sub(r"(.*?)\s*\n(?!\w+:)", r"\1 ", text, flags=re.MULTILINE)
print(text)
# A: Well hello . Long time no see . How are you doing ?
# B: Good . How are you ?
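To apply this substitution to a file on disk rather than a string literal, something like the following might work. It is only a sketch: the names `merge_speaker_text` and `reformat_file` are hypothetical, and it assumes the files are UTF-8 and small enough to read into memory. It reuses the pattern from the multi-line answer with a `\1 ` replacement.

```python
import re
from pathlib import Path


def merge_speaker_text(text: str) -> str:
    """Join every non-label line onto the line before it."""
    # Strip trailing newlines first so the last one is not turned into a trailing space.
    return re.sub(r"(.*?)\s*\n(?!\w+:)", r"\1 ", text.rstrip("\n"))


def reformat_file(src: str, dst: str) -> None:
    """Read src, merge speaker text onto the label lines, and write the result to dst."""
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(merge_speaker_text(text) + "\n", encoding="utf-8")
```

Note that any text line that itself starts with a word followed by a colon would be treated as a new speaker label.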
If each phrase is on a single line, this should work:
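# "input.txt" is a placeholder file name.
with open("input.txt") as file:
    lines = file.readlines()
# Lines alternate label/text, so print each label with the text line after it.
for ii in range(1, len(lines), 2):
    print(lines[ii - 1].rstrip("\n"), lines[ii].rstrip("\n"))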
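The pairing loop above assumes exactly one text line per label. A line-by-line variant that also copes with several text lines per speaker could look like the sketch below. The function name `merge_lines` and the rule used to recognise a label (a single word followed by a colon, alone on its line) are assumptions for illustration, not part of the original answers.

```python
import re
from typing import Iterable, List

# Assumption: a label is one word followed by a colon, alone on its line (e.g. "A:").
LABEL = re.compile(r"^\w+:$")


def merge_lines(lines: Iterable[str]) -> List[str]:
    """Collect the text lines that follow each speaker label onto the label's line."""
    merged: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if LABEL.match(line) or not merged:
            merged.append(line)  # start a new output line
        else:
            merged[-1] += " " + line  # append text to the current speaker
    return merged


sample = ["A:\n", "Well hello .\n", "Long time no see .\n", "B:\n", "Good .\n"]
print("\n".join(merge_lines(sample)))
# A: Well hello . Long time no see .
# B: Good .
```

Because iterating over an open file yields its lines, an open file object can be passed to `merge_lines` directly.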