How to iterate over strings after newline?

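I am trying to compare these sentences against each other. For example, I want to check whether BEFORE is the same as BEFORE THE, which it obviously isn't. The problem is that I need to loop across line breaks, so that BEFORE THE PARLIAMENT ON BRITAIN'S RELATIONS ends up in a single string. A sample file is shown below.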

BEFORE

BEFORE THE

BEFORE THE PARLIAMENT

BEFORE THE PARLIAMENT ON

BEFORE THE PARLIAMENT ON
BRITAIN'S

BEFORE THE PARLIAMENT ON
BRITAIN'S RELATIONS

BEFORE THE PARLIAMENT ON
BRITAIN'S RELATIONS WITH

My current approach loops over every line, so whenever a sentence spans more than one line it gets split apart.

with open("test.txt") as f:
    data = f.readlines()
    data = [d.strip().split('\n') for d in data]

How can I iterate over this file and get each sentence one at a time, instead of iterating over each line?

You can split on double newlines:

data = f.read().split('\n\n')

However, you have to make sure the blank lines contain no characters at all (not even spaces).
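
If the blank lines might contain stray spaces or tabs, a regular-expression split is one way around that restriction. The following is only a sketch: `split_paragraphs` and `sample` are hypothetical names made up for illustration, and the example assumes that runs of whitespace inside a sentence can safely be collapsed to a single space.

```python
import re

def split_paragraphs(text):
    """Split text into sentences separated by one or more blank lines."""
    # A separator is a newline, optional whitespace (spaces, tabs, more newlines),
    # and another newline, so "blank" lines holding only spaces still count.
    blocks = re.split(r"\n\s*\n", text.strip())
    # Collapse the line breaks (and any repeated whitespace) inside each block.
    return [" ".join(block.split()) for block in blocks if block.strip()]

# Hypothetical sample: the first separator line contains a single space.
sample = "BEFORE\n \nBEFORE THE\n\n\nBEFORE THE PARLIAMENT ON\nBRITAIN'S\n"
print(split_paragraphs(sample))
```

With a real file you would pass `f.read()` instead of `sample`.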

Split on double newlines, for example:

with open("test.txt") as f:
    # Split the whole text on blank lines first, then strip each piece.
    data = [d.strip() for d in f.read().split('\n\n')]

with open("test.txt") as f:
    text = f.read()
    for line in text.split("\n\n"):
        line = line.replace("\n", " ")
        print(line)

I think this is what you want. You can split on double newlines and then replace the remaining newlines with spaces.

Output:

BEFORE
BEFORE THE
BEFORE THE PARLIAMENT
BEFORE THE PARLIAMENT ON
BEFORE THE PARLIAMENT ON BRITAIN'S
BEFORE THE PARLIAMENT ON BRITAIN'S RELATIONS
BEFORE THE PARLIAMENT ON BRITAIN'S RELATIONS WITH
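
If you would rather not read the whole file into memory, the same idea can be written as a generator that collects lines until it reaches a blank one. This is a rough sketch rather than a definitive implementation: `iter_sentences` and `sample_lines` are hypothetical names, and it treats whitespace-only lines as blank.

```python
def iter_sentences(lines):
    """Yield one joined sentence per block of consecutive non-blank lines."""
    current = []
    for line in lines:
        stripped = line.strip()
        if stripped:
            # Still inside a sentence: remember this line.
            current.append(stripped)
        elif current:
            # A blank line ends the sentence collected so far.
            yield " ".join(current)
            current = []
    if current:
        # The input may end without a trailing blank line.
        yield " ".join(current)

# Hypothetical sample standing in for the lines of a file.
sample_lines = ["BEFORE\n", "\n", "BEFORE THE PARLIAMENT ON\n", "BRITAIN'S\n", "  \n", "\n", "BEFORE THE\n"]
sentences = list(iter_sentences(sample_lines))
print(sentences)
```

Because a file object yields its lines one at a time, you could also call `iter_sentences(f)` inside the `with open("test.txt") as f:` block and loop over the result directly.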

A version using itertools.groupby. This works with any number of newlines between sentences:

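from itertools import groupby
from pprint import pprint

with open('file.txt', 'r') as f_in:
    txt = f_in.read()

out = []
# Group consecutive lines by whether they are non-empty.
for v, g in groupby(txt.splitlines(), lambda k: k != ''):
    if v:
        # Join the lines of one sentence with single spaces.
        out.append(' '.join(g))

pprint(out)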

Prints:

['BEFORE',
 'BEFORE THE',
 'BEFORE THE PARLIAMENT',
 'BEFORE THE PARLIAMENT ON',
 "BEFORE THE PARLIAMENT ON BRITAIN'S",
 "BEFORE THE PARLIAMENT ON BRITAIN'S RELATIONS",
 "BEFORE THE PARLIAMENT ON BRITAIN'S RELATIONS WITH"]