Tokenize my CSV in one list rather than separate using Python
How can I tokenize my CSV into a single list rather than separate lists?
from nltk.tokenize import sent_tokenize

with open('train.csv') as file_object:
    for trainline in file_object:
        # A new list is created and printed for every line
        tokens_train = sent_tokenize(trainline)
        print(tokens_train)
This is the output I get:
['2.1 Separated of trains']
['Principle: The method to make the signal is different.']
['2.2 Context']
I want all of these in a single list:
['2.1 Separated of trains','Principle: The method to make the signal is different.','2.2 Context']
Since sent_tokenize() returns a list, you can simply extend a starting list on each iteration.
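from nltk.tokenize import sent_tokenize

alltokens = []
with open('train.csv') as file_object:
    for trainline in file_object:
        tokens_train = sent_tokenize(trainline)
        # extend() adds each sentence individually, keeping the list flat
        alltokens.extend(tokens_train)
print(alltokens)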
Or with a list comprehension:
with open('train.csv') as file_object:
    alltokens = [token for trainline in file_object for token in sent_tokenize(trainline)]
print(alltokens)
Both solutions work even if sent_tokenize() returns a list with more than one element.
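To check the multi-sentence case without needing NLTK or the original train.csv, here is a minimal sketch. The function simple_sent_tokenize is a hypothetical regex-based stand-in for sent_tokenize (not NLTK's actual algorithm), and the in-memory sample text is made up for illustration.

```python
import io
import re

def simple_sent_tokenize(text):
    """Hypothetical stand-in for nltk's sent_tokenize.

    Splits after '.', '!' or '?' when followed by whitespace.
    """
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

# In-memory stand-in for train.csv (assumed sample data)
sample = io.StringIO(
    "2.1 Separated of trains\n"
    "First sentence. Second sentence.\n"
    "2.2 Context\n"
)
lines = sample.read().splitlines()

def tokens_with_extend(lines):
    # Grow one flat list, one line at a time
    alltokens = []
    for line in lines:
        alltokens.extend(simple_sent_tokenize(line))
    return alltokens

def tokens_with_comprehension(lines):
    # Same result with a nested list comprehension
    return [tok for line in lines for tok in simple_sent_tokenize(line)]

flat_a = tokens_with_extend(lines)
flat_b = tokens_with_comprehension(lines)
print(flat_a)
```

The second line yields two sentences, and both end up as separate elements of the same flat list in either variant.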
Initialize an empty list
out = []
and append items to it inside the loop.
out.append(tokens_train)
You may also need to modify the tokenizer.
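One thing worth checking with this approach: because tokens_train is itself a list, append() stores each per-line list as a single element, which gives a list of lists rather than the flat list asked for. The sketch below uses hard-coded per-line results (taken from the question's output) in place of real tokenizer calls to show the difference, and one way to flatten afterwards.

```python
from itertools import chain

# Per-line tokenizer results, hard-coded from the question's output
per_line = [
    ["2.1 Separated of trains"],
    ["Principle: The method to make the signal is different."],
    ["2.2 Context"],
]

nested = []
flat = []
for tokens_train in per_line:
    nested.append(tokens_train)  # adds the whole list as one element
    flat.extend(tokens_train)    # adds each sentence individually

# A nested result can still be flattened after the loop
flattened_later = list(chain.from_iterable(nested))

print(nested)
print(flat)
```

If append() is preferred inside the loop, flattening once at the end with chain.from_iterable gives the same flat list as extend().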