Separating all passages from a text file and saving each passage to its own text file using Python
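Problem summary: I have a text file containing 100 passages. I need to separate all 100 passages and save each one in its own text file, 100 files in total.
Pattern of the passages in the input text file: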
25763772|t|DCTN4 as a modifier of chronic Pseudomonas aeruginosa infection in cystic fibrosis
25763772|a|Pseudomonas aeruginosa (Pa) infection in cystic fibrosis (CF) patients is present
25763772 0 5 DCTN4 T116,T123 C4308010
25763772 23 63 chronic Pseudomonas aeruginosa infection T047 C0854135
25763772 67 82 cystic fibrosis T047 C0010674
25847295|t|Nonylphenol diethoxylate inhibits apoptosis induced in PC12 cells
25847295|a|Nonylphenol and short-chain nonylphenol ethoxylates such as NP2 EO are digested
25847295 0 24 Nonylphenol diethoxylate T131 C1254354
25847295 25 33 inhibits T052 C3463820
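In the same way, the single text file contains 100 passages of variable length.
I am trying the code below. It raises no errors, but it does not extract and save the individual passages separately. Any help or solution would be appreciated. Thanks in advance.
Code: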
with open('corpus_pubtator1.txt', 'r') as contents, open('tested23.txt', 'w') as file:
    contents = contents.read()
    lines = contents.split('\n')
    for index, line in enumerate(lines):
        if index != len(lines) - 1:
            file.write(line + '.\n')
        else:
            pass
Try this:
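# Read the whole corpus; passages are assumed to be separated by blank lines.
with open("corpus_pubtator1.txt", "r") as rf:
    lines = rf.readlines()

passages = []       # completed passages, each a list of lines
passage_cache = []  # lines of the passage currently being collected
for line in lines:
    if line.strip():
        passage_cache.append(line)
    elif passage_cache:
        # A blank line closes the current passage.
        passages.append(passage_cache)
        passage_cache = []
# Keep the last passage when the file does not end with a blank line.
if passage_cache:
    passages.append(passage_cache)

# Write each passage to its own file: tested0.txt, tested1.txt, ...
for i, passage in enumerate(passages):
    with open(f"tested{i}.txt", "w") as wf:
        wf.writelines(passage)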
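It opens the input file, reads all of its lines, and treats the blank lines between passages as separators. It then creates a separate text file for each passage and writes that passage's lines into it.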
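If the file turns out to have no blank lines between passages (the excerpt in the question shows none), one alternative is to start a new passage whenever the leading ID changes. The following is only a minimal sketch. It assumes every line of a passage begins with the same numeric ID, followed by either `|` or whitespace, as in the excerpt. The names `leading_id`, `split_by_id`, and `write_passages` are hypothetical helpers, not part of any library.

```python
import re
from pathlib import Path

# Assumption: each non-blank line starts with a numeric document ID followed
# by "|" (title/abstract lines) or whitespace (annotation lines).
ID_PATTERN = re.compile(r"^(\d+)(?:\||\s)")


def leading_id(line):
    """Return the leading document ID of a line, or None if it has none."""
    match = ID_PATTERN.match(line)
    return match.group(1) if match else None


def split_by_id(lines):
    """Group lines into (doc_id, lines) pairs, one pair per run of equal IDs."""
    passages = []
    current_id = None
    for line in lines:
        if not line.strip():
            continue  # blank lines carry no content
        doc_id = leading_id(line)
        if doc_id is not None and doc_id != current_id:
            # The ID changed, so a new passage starts here.
            passages.append((doc_id, []))
            current_id = doc_id
        if passages:
            # Lines without an ID stay with the current passage;
            # lines before the first ID are dropped.
            passages[-1][1].append(line)
    return passages


def write_passages(passages, out_dir):
    """Write each passage to <out_dir>/<doc_id>.txt."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for doc_id, passage in passages:
        (out / f"{doc_id}.txt").write_text("".join(passage), encoding="utf-8")
```

To use it, read the lines of corpus_pubtator1.txt, pass them to `split_by_id`, and hand the result to `write_passages` together with an output directory of your choice. Naming each file after its ID keeps the output traceable to the source passage, but if the same ID appears in two separate places in the file, the later passage overwrites the earlier one.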