How to extract a limited number of lines after a specific keyword using Python
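I have a text file, and I need to extract the first five lines of the paragraph in which a specified keyword appears.
I am able to find the keyword, but I can't write out the five lines that follow it.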
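mylines = []
# Raw string, so the backslashes in the Windows path are not read as escapes (e.g. \t)
with open(r'D:\Tasks\Task_20\txt\CV (4).txt', 'rt') as myfile:
    for line in myfile:
        mylines.append(line)
for element in mylines:
    print(element, end='')
# Position of the first "P" in the first line (-1 if absent)
print(mylines[0].find("P"))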
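If anyone has any ideas on this, please help.
Sample input text file:
Philippine Partner Agency: ALL POWER STAFFING SOLUTIONS, INC.
Training Objectives: : To have international cultural exposure and hands-on experience in the field
of hospitality management as a gateway to a meaningful hospitality career. To develop my hospitality
management skills and become globally competitive.
Education
Institution Name: SOUTHVILLE FOREIGN UNIVERSITY - PHILIPPINES
Location Hom as Pinas City, Philippine Institution start date: (June 2007
Required output:
Training Objectives: : To have international cultural exposure and hands-on experience in the field
of hospitality management as a gateway to a meaningful hospitality career. To develop my hospitality
management skills and become globally competitive.
I have to search the text file for the Training Objective keyword, and once it is found, only those 5 lines should be written out.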
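Try this:
with open('test.txt') as f:
    content = f.readlines()
# Indices of every line that contains the keyword (case-insensitive)
index = [i for i, line in enumerate(content) if 'training objectives' in line.lower()]
for num in index:
    # The keyword line plus the next four lines
    for line in content[num:num+5]:
        print(line, end='')  # each line already ends with a newline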
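Since the question asks for the first matching paragraph only, the same slicing idea can be wrapped in a small function. This is a minimal sketch rather than a definitive solution: the name `lines_from_keyword` and its parameters are made up for illustration, and the sample text is taken from the question.

```python
def lines_from_keyword(lines, keyword, count=5):
    """Return up to `count` lines, starting at the first line containing `keyword`.

    The match is case-insensitive. An empty list is returned if nothing matches.
    """
    keyword = keyword.lower()
    for i, line in enumerate(lines):
        if keyword in line.lower():
            return lines[i:i + count]
    return []


# Sample text from the question, split into lines that keep their newlines
sample = (
    "Philippine Partner Agency: ALL POWER STAFFING SOLUTIONS, INC.\n"
    "\n"
    "Training Objectives::\n"
    "To have international cultural exposure and hands-on experience\n"
    "in the field of hospitality management as a gateway to a meaningful hospitality career.\n"
    "To develop my hospitality management skills and become globally competitive.\n"
    "\n"
    "Education Institution Name: SOUTHVILLE FOREIGN UNIVERSITY - PHILIPPINES\n"
)
picked = lines_from_keyword(sample.splitlines(keepends=True), "training objectives")
print("".join(picked), end="")
```

To write the result to a file instead of printing it, `outfile.writelines(picked)` inside a `with open(..., 'w') as outfile:` block should be enough, because the lines still carry their newlines.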
If you only have a few words (just to get the indices):
index = []
for i, line in enumerate(content):
    if 'hello' in line or 'there' in line:  # add more words with "or"
        index.append(i)
print(index)
If you have many (just to get the indices):
words = ["hello", "there", "blink"]  # insert your words here
index = []
for i, line in enumerate(content):
    # any() records each line once, even if several words match it
    if any(word in line for word in words):
        index.append(i)
print(index)
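If it also matters which word matched on which line, one possible variation keeps a dict from each word to its line indices. This is only a sketch: `find_keyword_lines` is a hypothetical helper, and unlike the snippets above it compares case-insensitively, which is an assumption about what is wanted.

```python
def find_keyword_lines(lines, words):
    """Map each word to the indices of the lines that contain it (case-insensitive)."""
    hits = {word: [] for word in words}
    for i, line in enumerate(lines):
        lowered = line.lower()
        for word in words:
            if word.lower() in lowered:
                hits[word].append(i)
    return hits


# Made-up lines, just to exercise the helper
content = ["Hello world\n", "nothing here\n", "hello there\n", "blink\n"]
hits = find_keyword_lines(content, ["hello", "there", "blink"])
print(hits)
```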
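It depends on where your \n characters are, but I put together a regular expression that may help. Here is an example of how my text looks in the variable st: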
In [254]: st
Out[254]: 'Philippine Partner Agency: ALL POWER STAFFING SOLUTIONS, INC.\n\nTraining Objectives::\nTo have international cultural exposure and hands-on experience \nin the field of hospitality management as a gateway to a meaningful hospitality career. \nTo develop my hospitality management skills and become globally competitive.\n\n\nEducation Institution Name: SOUTHVILLE FOREIGN UNIVERSITY - PHILIPPINES Location Hom as Pinas City, Philippine Institution start date: (June 2007\n'
import re
re.findall(r'Training Objectives:.*\n((?:.*\n){1,5})', st)
Out[255]: ['To have international cultural exposure and hands-on experience \nin the field of hospitality management as a gateway to a meaningful hospitality career. \nTo develop my hospitality management skills and become globally competitive.\n\n\n']
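The captured group above still contains the trailing blank lines. A sketch of how the same pattern might be wrapped so that it returns a clean list of lines follows; `block_after_heading` and its parameters are hypothetical names, and only the `re` calls themselves come from the standard library.

```python
import re


def block_after_heading(text, heading, max_lines=5):
    """Return up to `max_lines` lines after the heading line, with blank lines dropped."""
    # Same shape as the pattern above: skip the rest of the heading line,
    # then capture between 1 and max_lines newline-terminated lines.
    pattern = re.escape(heading) + r".*\n((?:.*\n){1," + str(max_lines) + r"})"
    match = re.search(pattern, text)
    if match is None:
        return []
    return [line.strip() for line in match.group(1).splitlines() if line.strip()]


st = (
    "Philippine Partner Agency: ALL POWER STAFFING SOLUTIONS, INC.\n\n"
    "Training Objectives::\n"
    "To have international cultural exposure and hands-on experience \n"
    "in the field of hospitality management as a gateway to a meaningful hospitality career. \n"
    "To develop my hospitality management skills and become globally competitive.\n\n\n"
    "Education Institution Name: SOUTHVILLE FOREIGN UNIVERSITY - PHILIPPINES\n"
)
result = block_after_heading(st, "Training Objectives:")
for line in result:
    print(line)
```

Note that the pattern only captures lines ending in `\n`, so a final line without a trailing newline would be missed.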
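If you just want to extract the whole "Training Objectives" block, look for the keyword and keep appending lines until you hit a blank line (or some other suitable marker, such as the next header).
(Edited to handle multiple files and keywords)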
def extract_block(filename, keywords):
    mylines = []
    with open(filename) as myfile:
        save_flag = False
        for line in myfile:
            if any(line.startswith(kw) for kw in keywords):
                # A keyword line starts the block
                save_flag = True
            elif line.strip() == '':
                # A blank line ends the block
                save_flag = False
            if save_flag:
                mylines.append(line)
    return mylines

filenames = ['file1.txt', 'file2.txt', 'file3.txt']
keywords = ['keyword1', 'keyword2', 'keyword3']
for filename in filenames:
    block = extract_block(filename, keywords)
    print(''.join(block))
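This assumes there is only one block per file. If you are extracting multiple blocks from each file, it gets more complicated.
If you really want exactly 5 lines, always and every time, you can do something similar but add a counter to count off your 5 lines.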
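The counter idea might look something like the sketch below. It is one possible reading of the suggestion, not the answerer's own code: `extract_n_lines` and its parameters are made-up names, and it takes any iterable of lines (an open file object should work too) so that it can be tried without a file on disk.

```python
def extract_n_lines(lines, keywords, n=5):
    """Collect exactly n lines (fewer only at end of input) from each keyword line on."""
    collected = []
    remaining = 0  # how many more lines of the current block to keep
    for line in lines:
        # Start counting only when no block is currently being collected
        if remaining == 0 and any(line.startswith(kw) for kw in keywords):
            remaining = n
        if remaining > 0:
            collected.append(line)
            remaining -= 1
    return collected


# Made-up lines; blank lines are counted like any other line
lines = ["intro\n", "Training Objectives::\n", "a\n", "\n", "b\n", "c\n", "d\n"]
block = extract_n_lines(lines, ["Training Objectives"], n=5)
print("".join(block), end="")
```

Unlike the blank-line version, this does not stop at the end of the paragraph, so it can spill into the next section when the block is shorter than 5 lines.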