Tokenize my CSV into one list rather than separate lists using Python

I want to tokenize my CSV into a single list rather than separate lists. How can I do that?

from nltk.tokenize import sent_tokenize

with open('train.csv') as file_object:
    for trainline in file_object:
        tokens_train = sent_tokenize(trainline)
        print(tokens_train)

This is the output I get:

['2.1 Separated of trains']
['Principle: The method to make the signal is different.']
['2.2 Context']

I want all of these in one list:

['2.1 Separated of trains','Principle: The method to make the signal is different.','2.2 Context']

Since sent_tokenize() returns a list, you can simply extend a starting list on each iteration.

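from nltk.tokenize import sent_tokenize

alltokens = []  # collects the sentences from every line

with open('train.csv') as file_object:
    for trainline in file_object:
        tokens_train = sent_tokenize(trainline)
        # extend() adds each sentence individually instead of nesting the list
        alltokens.extend(tokens_train)
    print(alltokens)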

Or use a list comprehension:

with open('train.csv') as file_object:
    alltokens = [token for trainline in file_object for token in sent_tokenize(trainline)]
print(alltokens)

Both solutions work even when sent_tokenize() returns a list with more than one element.
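To see that the two approaches agree when a line contains several sentences, here is a minimal sketch. It assumes a hypothetical regex-based split_sentences() as a stand-in for sent_tokenize() and an in-memory io.StringIO in place of train.csv, so it runs without NLTK or a file on disk; the sample text is made up for illustration.

```python
import io
import re

# Hypothetical stand-in for sent_tokenize(), so the sketch runs without NLTK data.
def split_sentences(text):
    # Split after '.', '!' or '?' when followed by whitespace; drop empty pieces.
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

# In-memory stand-in for open('train.csv'); the second line holds two sentences.
sample = io.StringIO(
    "2.1 Separated of trains\n"
    "Principle: The method differs. The signal differs too.\n"
    "2.2 Context\n"
)

# Approach 1: extend a starting list on each iteration.
extended = []
for line in sample:
    extended.extend(split_sentences(line))

# Approach 2: nested list comprehension over the same lines.
sample.seek(0)  # rewind so the lines can be read again
comprehended = [tok for line in sample for tok in split_sentences(line)]

print(extended)
print(extended == comprehended)
```

Both variants should produce one flat list of strings. A real tokenizer may split the sample differently from this simple regex.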

Initialize an empty list:

out = []

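and add the tokens to it inside the loop. Because tokens_train is itself a list, use extend() rather than append(), which would give you a list of lists:

out.extend(tokens_train)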
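The difference between append() and extend() matters here, since the tokenizer returns a list for every line. A small sketch, using hard-coded per-line results (taken from the output in the question) instead of a real tokenizer call:

```python
# Mimics what the tokenizer returns for each line: one list per line.
tokens_per_line = [
    ['2.1 Separated of trains'],
    ['Principle: The method to make the signal is different.'],
    ['2.2 Context'],
]

nested = []
flat = []
for tokens_train in tokens_per_line:
    nested.append(tokens_train)  # adds the list itself as a single element
    flat.extend(tokens_train)    # adds each string from the list

print(nested)  # a list of lists
print(flat)    # a single flat list of strings
```

Only the extend() variant yields the single flat list asked for in the question.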

You may also need to adjust the tokenizer.