How to find a substring in a line and append from that line up to the next substring?
test.txt would be
1
2
3
start
4
5
6
end
7
8
9
I want the result to be
start
4
5
6
end
This is my code
file = open('test.txt','r')
line = file.readline()
start_keyword = 'start'
end_keyword = 'end'
lines = []
while line:
    line = file.readlines()
    for words_in_line in line:
        if start_keyword in words_in_line:
            lines.append(words_in_line)
file.close()
print entities
It returns
['start\n']
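I don't know what to add to the code above to get the result I want. I've been searching and changing the code, but I can't figure out how to make it work the way I want.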
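You can use some kind of flag that is set to True when you encounter start_keyword. While the flag is set, you add the lines to the lines list, and the flag is unset when you encounter end_keyword (but only after the end_keyword line has been written to the lines list).
Also use .strip() on words_in_line to remove the \n (and any other leading and trailing whitespace) if you do not want it to appear in the lines list. If you do want it, don't strip.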
Example:
flag = False
for words_in_line in line:
    if start_keyword in words_in_line:
        flag = True
    if flag:
        lines.append(words_in_line.strip())
    if end_keyword in words_in_line:
        flag = False
Note that this will add multiple start-to-end blocks to the lines list, which I am guessing is what you want.
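If the blocks should stay separate instead of being merged into one flat list, the same flag idea can be wrapped in a small helper. This is only a sketch: the function name extract_blocks and the use of io.StringIO in place of a real file are assumptions made for illustration, not part of the answer above.

```python
import io

def extract_blocks(lines, start_keyword='start', end_keyword='end'):
    """Group lines into blocks, each running from a line containing
    start_keyword through the next line containing end_keyword."""
    blocks = []
    current = None  # None means we are currently outside a block
    for line in lines:
        if current is None and start_keyword in line:
            current = []  # open a new block
        if current is not None:
            current.append(line.strip())
            if end_keyword in line:
                blocks.append(current)  # close the block after storing the end line
                current = None
    return blocks

# Hypothetical sample data with two blocks, standing in for test.txt
sample = io.StringIO("1\nstart\n4\n5\nend\n7\nstart\n8\nend\n9\n")
blocks = extract_blocks(sample)
print(blocks)
```

With this shape, a block that is opened but never closed by an end line is silently dropped; whether that is the right behavior depends on the data.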
Use a flag. Try this:
file = open('test.txt','r')
start_keyword = 'start'
end_keyword = 'end'
in_range = False
entities = []
lines = file.readlines()
for line in lines:
    line = line.strip()
    if line == start_keyword:
        in_range = True
    elif line == end_keyword:
        in_range = False
    elif in_range:
        entities.append(line)
file.close()
# If you want to include the start/end tags
#entities = [start_keyword] + entities + [end_keyword]
print(entities)
Regarding your code, note that readlines already reads all lines in a file, so calling readline doesn't seem to make much sense, unless you are ignoring the first line. Also use strip to remove EOL characters from the strings. Note how your code does not do what you expect:
# Reads ALL lines in the file as a list
line = file.readlines()
# You are not iterating words in a line, but rather all lines one by one
for words_in_line in line:
    # If a given line contains 'start', append it. This is why you only get ['start\n'], it's the only line you are adding as no other line contains that string
    if start_keyword in words_in_line:
        lines.append(words_in_line)
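To see how readline and readlines interact, here is a minimal sketch that uses io.StringIO as a stand-in for the real file (the sample contents are made up for illustration). The first call consumes only the first line, the second call consumes everything that is left, and a further call returns an empty list.

```python
import io

# In-memory stand-in for a text file; the contents are illustrative only
f = io.StringIO("1\n2\n3\nstart\n4\n")

first = f.readline()    # consumes only the first line
rest = f.readlines()    # consumes every remaining line in one call
again = f.readlines()   # the file is exhausted, so nothing is left

print(repr(first))
print(rest)
print(again)
```

This is why the first line never reaches the for loop in the question's code, and why a single readlines call already covers the rest of the file.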
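You need a state variable to decide whether you are storing the lines. Here is a simple example that always stores the line, then changes its mind and discards it if it turns out you don't want it: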
start_keyword = 'start'
end_keyword = 'end'
lines = []
reading = False
with open('test.txt', 'r') as f:
    for line in f:
        lines.append(line)
        if start_keyword in line:
            reading = True
        elif end_keyword in line:
            reading = False
        elif not reading:
            lines.pop()
print(''.join(lines))
If the file is not too large (relative to how much RAM your computer has):
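start = 'start'
end = 'end'
with open('test.txt','r') as f:
    content = f.read()
# Look for end only after start, and extend the slice so it includes the end keyword
begin = content.index(start)
result = content[begin:content.index(end, begin) + len(end)]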
You can then print it with print(result), create a list with result.split(), and so on.
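One thing to keep in mind with the whole-file approach is that str.index raises ValueError when a keyword is missing. A possible variant, sketched below, uses str.find (which returns -1 instead) and returns an empty string in that case. The helper name slice_between and the inline sample text are assumptions for illustration.

```python
def slice_between(content, start='start', end='end'):
    """Return the text from the first start keyword through the next
    end keyword (inclusive), or '' if either keyword is missing."""
    begin = content.find(start)
    if begin == -1:
        return ''
    # Only look for the end keyword after the start keyword
    stop = content.find(end, begin + len(start))
    if stop == -1:
        return ''
    return content[begin:stop + len(end)]

# Hypothetical file contents, mirroring test.txt from the question
content = "1\n2\n3\nstart\n4\n5\n6\nend\n7\n8\n9\n"
result = slice_between(content)
print(result.split())
```

Note that this matches substrings rather than whole lines, so a line such as "restart" would also count as a start marker.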
If there are multiple start/stop points, and/or the file is very large:
start = 'start'
end = 'end'
running = False
result = []
with open('test.txt','r') as f:
    for line in f:
        if start in line:
            running = True
            result.append(line)
        elif end in line:
            running = False
            result.append(line)
        elif running:
            result.append(line)
This leaves you with a list, which you can join(), print(), write to a file, and so on.
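A file object is its own iterator, so you don't need a while loop to read the file line by line; you can iterate over the file object itself. To capture the sections, just start an inner loop when you encounter a line containing start, and break out of the inner loop when you hit end: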
start = 'start'
end = 'end'
with open("in.txt") as f:
    out = []
    for line in f:
        if start in line:
            out.append(line)
            for _line in f:
                out.append(_line)
                if end in _line:
                    break
Output:
['start\n', '4\n', '5\n', '6\n', 'end\n']
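Because the inner and outer loops share the same iterator, the outer loop resumes right after the end line, so later blocks are picked up as well. The sketch below illustrates this with io.StringIO standing in for the file; the two-block sample text is an assumption for illustration.

```python
import io

start, end = 'start', 'end'

# In-memory stand-in for a file that contains two start/end blocks
f = io.StringIO("1\nstart\n4\nend\n7\nstart\n8\nend\n9\n")

out = []
for line in f:
    if start in line:
        out.append(line)
        # The inner loop pulls from the same iterator as the outer loop
        for _line in f:
            out.append(_line)
            if end in _line:
                break  # hand control back to the outer loop

print(out)
```

Lines outside the blocks (1, 7 and 9 here) are consumed by the outer loop and never appended.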