Splitting lines in Python into sentences and labels

I have a sample of a file with sentences and labels. How can it be split into sentences and labels?

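A very, very, very slow-moving, aimless movie about a distressed, drifting young man. 0

Not sure who was more lost - the flat characters or the audience, nearly half of whom walked out. 0

Attempting artiness with black & white and clever camera angles, the movie disappointed - became even more ridiculous - as the acting was poor and the plot and lines almost non-existent. 0

Very little music or anything to speak of. 0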

Desired output
List of sentences:
['A very, very, very slow-moving, aimless movie about a distressed, drifting young man','Not sure who was more lost - the flat characters or the audience, nearly half of whom walked out']

Corresponding labels:
['0','0']

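Is '0' the label? If each line holds only one sentence, you can use the period as the delimiter and call string.split('.'). This can go wrong if a sentence contains something like 'Mr.' or 'Mrs.', so you may need to add some if statements to handle those cases.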
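One way to sidestep the 'Mr.' / 'Mrs.' problem is to not split on periods at all and instead peel the label off the end of the line. The sketch below assumes every line ends with whitespace followed by an integer label; `split_line` and `LINE_RE` are names made up for this illustration, not something from the original post.

```python
import re

# Assumption: each line looks like "<sentence><whitespace><integer label>".
LINE_RE = re.compile(r'^(.*\S)\s+(\d+)\s*$')

def split_line(line):
    """Return (sentence, label), or None if the line has no trailing label."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    sentence, label = match.groups()
    return sentence, label

print(split_line("Mr. Smith liked it. He stayed to the end.  1"))
# ('Mr. Smith liked it. He stayed to the end.', '1')
```

Note that this keeps the final period as part of the sentence, which may or may not be what you want.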

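Assuming the number after the last '.' (dot) is the label:

For the given sample stored in the file 'yourdata.txt', the following code should produce two lists, sentence_list and label_list. You can then write the data in these lists to separate files, or use it however your requirements dictate.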

sentence_list = []
label_list = []
# Open the file in a context manager so it is closed automatically.
with open('yourdata.txt', 'r') as fmov:
    for line in fmov:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last '.' only: sentence on the left, label on the right.
        sentence, _, label = line.rpartition('.')
        sentence_list.append(sentence)
        # Strip the whitespace between the period and the label.
        label_list.append(label.strip())
print(sentence_list)
print(label_list)

Output:
['A very, very, very slow-moving, aimless movie about a distressed, drifting young man', 'Not sure who was more lost - the flat characters or the audience, nearly half of whom walked out', 'Attempting artiness with black & white and clever camera angles, the movie disappointed - became even more ridiculous - as the acting was poor and the plot and lines almost non-existent', 'Very little music or anything to speak of']
['0', '0', '0', '0']
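
If the two lists then need to go into separate files, as mentioned above, something along these lines should work. This is only a sketch: the helper `write_lines`, the file names `sentences.txt` and `labels.txt`, and the temporary output directory are all placeholders chosen for illustration.

```python
import tempfile
from pathlib import Path

def write_lines(path, items):
    """Write one item per line to a UTF-8 text file."""
    with open(path, 'w', encoding='utf-8') as out:
        for item in items:
            out.write(item + '\n')

# Example data in the same shape as sentence_list and label_list.
sentence_list = ['Very little music or anything to speak of']
label_list = ['0']

# Placeholder location; point this at your own directory instead.
out_dir = Path(tempfile.mkdtemp())
write_lines(out_dir / 'sentences.txt', sentence_list)
write_lines(out_dir / 'labels.txt', label_list)
```

Because both files are written in the same order, line N of `labels.txt` is the label for line N of `sentences.txt`.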