Converting a generator into a list, but getting Error: '_io.TextIOWrapper' object has no attribute 'decode' (python 3.6.4)
I am working with UTF-8 encoded text.
I want to tokenize it and then convert the result into a list.
However, I get the following error.
import nltk, jieba, re, os
with open('file.txt') as f:
    tokenized_text = jieba.cut(f, cut_all=True)
type(tokenized_text)
generator
word_list = list(tokenized_text)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-5-16b25477c71d> in <module>()
----> 1 list(new)
~/anaconda3/lib/python3.6/site-packages/jieba/__init__.py in cut(self, sentence, cut_all, HMM)
280 - HMM: Whether to use the Hidden Markov Model.
281 '''
--> 282 sentence = strdecode(sentence)
283
284 if cut_all:
~/anaconda3/lib/python3.6/site-packages/jieba/_compat.py in strdecode(sentence)
35 if not isinstance(sentence, text_type):
36 try:
---> 37 sentence = sentence.decode('utf-8')
38 except UnicodeDecodeError:
39 sentence = sentence.decode('gbk', 'ignore')
AttributeError: '_io.TextIOWrapper' object has no attribute 'decode'
I know the problem lies somewhere in the jieba package.
I also tried changing the code to
with open('file.txt') as f:
    new = jieba.cut(f, cut_all=False)
but got the same result.
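jieba.cut accepts a string, not a file object. This is explained in the readme.
As the traceback shows, strdecode sees that the argument is not a str and tries to call .decode() on it, which a file object does not have. Because jieba.cut returns a generator, that code does not run until list() consumes it, which is why the error appears at the list() call rather than at the jieba.cut call.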
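A minimal sketch of the fix, assuming the whole file fits in memory: read the file into a string first, then pass that string to jieba.cut. The helper name tokenize_file and its cut parameter are hypothetical, introduced here only for illustration; cut defaults to jieba.cut and can be swapped for any function that maps a string to an iterable of tokens.

```python
from typing import Callable, Iterable, List, Optional


def tokenize_file(path: str,
                  cut: Optional[Callable[[str], Iterable[str]]] = None,
                  encoding: str = "utf-8") -> List[str]:
    """Read a text file and return its tokens as a list.

    `tokenize_file` and `cut` are hypothetical names for this sketch.
    """
    if cut is None:
        # Imported lazily so the helper also works with another tokenizer.
        import jieba
        cut = jieba.cut  # default (accurate) mode, i.e. cut_all=False

    # Pass the file's *contents* (a str) to the tokenizer, not the file object.
    with open(path, encoding=encoding) as f:
        text = f.read()

    # The tokenizer may return a generator; list() consumes it here.
    return list(cut(text))
```

To get the full-mode segmentation from the question, something like tokenize_file('file.txt', cut=lambda s: jieba.cut(s, cut_all=True)) should work. Passing encoding explicitly avoids depending on the platform default when the file is UTF-8.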