How to find a substring in a line and append from that line up to the next substring?

test.txt looks like this:

1
2
3
start
4
5
6
end
7
8
9

I want the result to be:

start
4
5
6
end

This is my code:

file = open('test.txt','r')

line = file.readline()

start_keyword = 'start'
end_keyword = 'end'

lines = []

while line: 
    line = file.readlines() 
    for words_in_line in line: 
        if start_keyword in words_in_line:
            lines.append(words_in_line)

file.close()

print(lines)

It returns:

['start\n']

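I don't know what to add to the code above to get the result I want. I have been searching and changing the code, but I can't work out how to make it behave the way I want.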

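You can use some kind of flag that is set to true when you encounter start_keyword. While the flag is set, you add the lines to the lines list, and the flag is unset when you encounter end_keyword (but only after the end_keyword line has been written to the lines list).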

Also use .strip() on words_in_line to remove the \n (and any other leading and trailing whitespace) if you don't want it to appear in the lines list. If you do want it, don't strip.

Example:

flag = False
for words_in_line in line: 
    if start_keyword in words_in_line:
        flag = True
    if flag:
        lines.append(words_in_line.strip())
    if end_keyword in words_in_line:
        flag = False

Note that this adds every start...end block in the file to the lines list, which I'm guessing is what you want.
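To make the multiple-block behaviour concrete, here is a minimal self-contained sketch of the same flag idea. The function name extract_blocks and the sample data (an in-memory stand-in for test.txt with a second block added) are assumptions for illustration, not part of the question.

```python
import io

def extract_blocks(stream, start_keyword='start', end_keyword='end'):
    """Collect every line from a start_keyword line through the next end_keyword line."""
    lines = []
    flag = False
    for raw_line in stream:
        if start_keyword in raw_line:
            flag = True
        if flag:
            lines.append(raw_line.strip())
        # Unset the flag only after the end line itself has been stored
        if end_keyword in raw_line:
            flag = False
    return lines

# Hypothetical input: a stand-in for test.txt containing two blocks
sample = io.StringIO("1\nstart\n4\nend\n7\nstart\n8\nend\n9\n")
blocks = extract_blocks(sample)
print(blocks)  # ['start', '4', 'end', 'start', '8', 'end']
```

Both blocks end up in one flat list. If you only want the first block, you could return as soon as the end line has been appended.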

Use a flag. Try this:

file = open('test.txt','r')

start_keyword = 'start'
end_keyword = 'end'
in_range = False
entities = []

lines = file.readlines()

for line in lines:

    line = line.strip()

    if line == start_keyword:
        in_range = True
    elif line == end_keyword:
        in_range = False

    elif in_range:
        entities.append(line)

file.close()

# If you want to include the start/end tags
#entities = [start_keyword] + entities + [end_keyword]

print(entities)

About your code: note that readlines already reads all lines in a file, so calling readline first doesn't seem to make much sense, unless you want to skip the first line. Also use strip to remove the EOL characters from the strings. Here is why your code does not do what you expect:

# Reads ALL lines in the file as an array
line = file.readlines() 

# You are not iterating words in a line, but rather all lines one by one
for words_in_line in line:

    # If a given line contains 'start', append it. This is why you only get ['start\n'], it's the only line you are adding as no other line contains that string
    if start_keyword in words_in_line:
        lines.append(words_in_line)
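A small sketch of how readline and readlines interact may help here. It uses io.StringIO as an in-memory stand-in for the file (an assumption for the sake of a runnable example; a real file object behaves the same way for these calls).

```python
import io

# In-memory stand-in for a shortened test.txt
f = io.StringIO("1\n2\nstart\n4\nend\n")

first = f.readline()    # consumes only the first line
rest = f.readlines()    # consumes everything that is left
again = f.readlines()   # nothing remains, so this is an empty list

print(repr(first))  # '1\n'
print(rest)         # ['2\n', 'start\n', '4\n', 'end\n']
print(again)        # []
```

This is also why the while loop in the question stops after its second pass: the second readlines call returns an empty list, which is falsy.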

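You need a state variable that decides whether you are storing lines. Here is a simple example that always stores the line first, then changes its mind and discards it if it turns out you don't want it: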

start_keyword = 'start'
end_keyword = 'end'

lines = []
reading = False
with open('test.txt', 'r') as f:
    for line in f:
        lines.append(line)
        if start_keyword in line:
            reading = True
        elif end_keyword in line:
            reading = False
        elif not reading:
            lines.pop()

print(''.join(lines))

If the file isn't too big (relative to how much RAM your computer has):

start = 'start'
end = 'end'

with open('test.txt','r') as f:
    content = f.read()
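    start_idx = content.index(start)
    # Search for end only after start, and include the end keyword itself
    end_idx = content.index(end, start_idx) + len(end)
    result = content[start_idx:end_idx]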

You can then print it with print(result), create a list with result.split(), and so on.
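One thing to watch with result.split(): called without arguments it splits on any run of whitespace, not just on newlines. A quick sketch, using a hypothetical result string whose lines contain spaces, shows the difference from splitlines():

```python
# Hypothetical slice result whose lines contain spaces
result = "start\nline 4\nline 5\nend"

by_whitespace = result.split()   # splits on every run of whitespace
by_line = result.splitlines()    # splits on line boundaries only

print(by_whitespace)  # ['start', 'line', '4', 'line', '5', 'end']
print(by_line)        # ['start', 'line 4', 'line 5', 'end']
```

For the test.txt in the question both calls give the same list, since no line there contains a space.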

If there are multiple start/stop points, and/or the file is very large:

start = 'start'
end = 'end'
running = False
result = []

with open('test.txt','r') as f:
    for line in f:
        if start in line:
            running = True
            result.append(line)
        elif end in line:
            running = False
            result.append(line)
        elif running:
            result.append(line)

This leaves you with a list, which you can join(), print(), write to a file, etc.
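As a rough sketch of those follow-up steps: the result list below is written out by hand as what the loop would produce for the question's test.txt, and io.StringIO stands in for an output file opened in write mode (both are assumptions to keep the example self-contained).

```python
import io

# What the loop above would leave in result for the question's test.txt
result = ['start\n', '4\n', '5\n', '6\n', 'end\n']

text = ''.join(result)     # each item already ends with its own newline
print(text, end='')        # end='' avoids printing an extra blank line

out = io.StringIO()        # stand-in for open('out.txt', 'w')
out.writelines(result)     # writelines adds no separators of its own
```

Because the lines were never stripped, no separator is needed when joining or writing them back out.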

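A file object is its own iterator, so you don't need a while loop to read the file line by line; you can iterate over the file object itself. To capture the sections, start an inner loop when you encounter a line containing start, and break out of the inner loop when you hit end:
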
start = 'start'
end = 'end'

with open("in.txt") as f:
    out = []
    for line in f:
        if start in line:
            out.append(line)
            # The inner loop keeps reading from the same file iterator
            for _line in f:
                out.append(_line)
                if end in _line:
                    break

Output:

['start\n', '4\n', '5\n', '6\n', 'end\n']
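If you would rather keep each section separate, the same nested-loop idea can build one list per block. This is only a sketch: collect_blocks and the sample data are hypothetical names and input, and io.StringIO stands in for the file (like a file object, it is its own iterator).

```python
import io

def collect_blocks(stream, start='start', end='end'):
    """Return one list per start..end section found in stream."""
    blocks = []
    for line in stream:
        if start in line:
            block = [line]
            # The inner loop pulls from the same iterator, so the outer
            # loop resumes on the line right after the end marker.
            for inner in stream:
                block.append(inner)
                if end in inner:
                    break
            blocks.append(block)
    return blocks

# Hypothetical input with two sections
sample = io.StringIO("1\nstart\n4\nend\n7\nstart\n8\nend\n")
sections = collect_blocks(sample)
print(sections)  # [['start\n', '4\n', 'end\n'], ['start\n', '8\n', 'end\n']]
```

Note that if an end line never appears, the inner loop simply runs to the end of the input and the last block contains everything after start.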