Extracting hyphen-separated key-value pairs whose values span multiple lines
Input text file:
LID - E164 [pii]
LID - 10.3390/antiox9020164 [doi]
AB - Although prickly pear fruits have become an important part of the Canary diet,
     their native varieties are yet to be characterized in terms of betalains and
     phenolic compounds.
FAU - Gomez-Maqueo, Andrea
AU - Gomez-Maqueo A
AUID- ORCID: 0000-0002-0579-1855
PG - 1-13
LID - 10.1007/s00442-020-04624-w [doi]
AB - Recent observational evidence suggests that nighttime temperatures are increasing
     faster than daytime temperatures, while in some regions precipitation events are
     becoming less frequent and more intense.
CI - (c) 2020 Production and hosting by Elsevier B.V. on behalf of Cairo University.
FAU - Farag, Mohamed A
AU - Farag MA
PG - 3044
LID - 10.3389/fmicb.2019.03044 [doi]
AB - Microbial symbionts account for survival, development, fitness and evolution of
     eukaryotic hosts. These microorganisms together with their host form a biological
     unit known as holobiont.
AU - Flores-Nunez VM
AD - Departamento de Ingenieria Genetica, Centro de Investigacion y de Estudios
     Avanzados del Instituto Politecnico Nacional, Irapuato, Mexico.
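I am trying to extract the abstracts, which are marked by the key AB in the text. I iterate over each line and check whether its key is the abstract key. If it is, I set a flag and append the continuation lines that follow, separated by a space. Is there a better way?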
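f = "sample.txt"
abstracts = []
req_line = ""
flag = False
with open(f) as myfile:
    for line in myfile:
        # append subsequent lines if flag is set
        if flag:
            if line.startswith(" "):
                req_line = req_line + " " + line.strip()
            else:
                # a non-indented line ends the current abstract
                abstracts.append(req_line)
                req_line = ""
                flag = False
        # find beginning of abstract
        if line.startswith("AB - "):
            # strip the trailing newline so the joined text stays on one line
            req_line = line.replace("AB - ", "", 1).strip()
            flag = True
# keep an abstract that runs to the end of the file
if flag:
    abstracts.append(req_line)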
Output:
[
"Although prickly pear fruits have become an important part of the Canary diet, their native varieties are yet to be characterized in terms of betalains and phenolic compounds.",
"Recent observational evidence suggests that nighttime temperatures are increasing faster than daytime temperatures, while in some regions precipitation events are becoming less frequent and more intense.",
"Microbial symbionts account for survival, development, fitness and evolution of eukaryotic hosts. These microorganisms together with their host form a biological unit known as holobiont."
]
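The flag-based loop above can also be sketched as a small generator that groups every key with its continuation lines, so that the abstracts become a simple filter on the key. This is only an illustration: the name parse_fields is hypothetical, and it assumes that continuation lines start with whitespace and that the first hyphen on a field line separates key from value.

```python
def parse_fields(lines):
    """Yield (key, value) pairs, joining indented continuation lines with spaces."""
    key, parts = None, []
    for raw in lines:
        line = raw.rstrip("\n")
        # Assumption: a line starting with whitespace continues the previous value.
        if key is not None and line[:1].isspace():
            if line.strip():
                parts.append(line.strip())
            continue
        # Any other line closes the field collected so far.
        if key is not None:
            yield key, " ".join(parts)
            key, parts = None, []
        # Assumption: the first hyphen separates key and value.
        head, sep, tail = line.partition("-")
        if sep:
            key, parts = head.strip(), [tail.strip()]
    # Emit the last field if the input ends without another key line.
    if key is not None:
        yield key, " ".join(parts)


# Shortened, hypothetical sample in the same layout as the input file.
sample_text = (
    "LID - E164 [pii]\n"
    "AB - Although prickly pear fruits have become an important part of the Canary diet,\n"
    "     their native varieties are yet to be characterized.\n"
    "FAU - Gomez-Maqueo, Andrea\n"
    "AUID- ORCID: 0000-0002-0579-1855\n"
)

fields = list(parse_fields(sample_text.splitlines()))
abstracts = [value for key, value in fields if key == "AB"]
```

With a real file, the same generator can be fed the open file object directly, for example parse_fields(open("sample.txt")), since it only needs an iterable of lines.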
Use a regular expression (assuming your input string s was read with open("file.txt").read()):
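import re
# raw string so that \W reaches the regex engine unchanged;
# capture everything after "AB -" up to the last newline before the next hyphen
matches = re.findall(r"AB\W*-\W*([^-]*(?=\n))", s)
# join the lines of each match into a single space-separated string
output = [" ".join(map(str.strip, i.split("\n"))) for i in matches]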
This gives:
['Although prickly pear fruits have become an important part of the Canary diet, their native varieties are yet to be characterized in terms of betalains and phenolic compounds.',
'Recent observational evidence suggests that nighttime temperatures are increasing faster than daytime temperatures, while in some regions precipitation events are becoming less frequent and more intense.',
'Microbial symbionts account for survival, development, fitness and evolution of eukaryotic hosts. These microorganisms together with their host form a biological unit known as holobiont.']
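One caveat: because the capture group is [^-]*, the pattern above depends on the abstract itself containing no hyphen. An abstract with a hyphenated word is cut short, or skipped entirely if the hyphen is on its first line. A possible variant, sketched here under the assumption that continuation lines are indented and that the AB key starts at the beginning of a line, anchors on line starts instead. The names ABSTRACT_RE and extract_abstracts, and the sample text, are illustrative only.

```python
import re

# "AB" at the start of a line, a hyphen, then the rest of that line plus
# any following lines that begin with a space or tab (assumed continuations).
ABSTRACT_RE = re.compile(r"^AB[ \t]*-[ \t]*(.*(?:\n[ \t]+.*)*)", re.MULTILINE)


def extract_abstracts(text):
    """Return each abstract as one string with its lines joined by spaces."""
    return [
        " ".join(part.strip() for part in match.splitlines())
        for match in ABSTRACT_RE.findall(text)
    ]


# Hypothetical sample containing a hyphenated word inside an abstract.
sample = (
    "LID - E164 [pii]\n"
    "AB - Although prickly pear fruits are well-known,\n"
    "     their native varieties are yet to be characterized.\n"
    "FAU - Gomez-Maqueo, Andrea\n"
    "AB - Microbial symbionts account for survival.\n"
    "AU - Flores-Nunez VM\n"
)

result = extract_abstracts(sample)
```

The key difference is that the end of an abstract is decided by indentation rather than by the next hyphen, so hyphens inside the text no longer matter.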