Python tokenizing words
summaries = []
texts = []
with open("C:\Users\apandey\Documents\Reviews.csv","r",encoding="utf8") as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        clean_text = clean(row['Text'])
        clean_summary = clean(row['Summary'])
        summaries.append(word_tokenize(clean_summary))
        texts.append(word_tokenize(clean_text))
I just want to tokenize the rows of a csv file, but I get this error:
"list indices must be integers or slices, not str"
I assume your csv file looks like this:
Id,ProductId,UserId,ProfileName,HelpfulnessNumerator,HelpfulnessDenominator,Score,Time,Summary,Text
1,'B001E4KFG0','A3SGXH7AUHU8GW','delmartian',1,1,5,1303862400,'Good Quality Dog
Food','I have bought several of the Vitality canned dog food products and have
found them all to be of good quality...'
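Then you should use DictReader, as Peter Wood suggested in the comments. csv.reader returns each row as a list, so row['Text'] indexes a list with a string and raises the error you see. csv.DictReader reads the header line and returns each row as a dict keyed by column name.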
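The difference between the two readers can be seen in a small self-contained sketch. The two-column sample below is made up for illustration and only stands in for the real Reviews.csv; io.StringIO is used so that no file is needed.

```python
import csv
import io

# Hypothetical sample data standing in for Reviews.csv.
sample = "Summary,Text\nGood food,My dog likes it\n"

# csv.reader yields every row (header included) as a plain list.
list_rows = list(csv.reader(io.StringIO(sample)))
print(list_rows[1])

# Indexing a list with a column name raises the TypeError from the question.
message = None
try:
    list_rows[1]["Text"]
except TypeError as err:
    message = str(err)
print(message)

# csv.DictReader consumes the header and yields each data row as a dict.
dict_rows = list(csv.DictReader(io.StringIO(sample)))
print(dict_rows[0]["Text"])
```

Applied to the file from the question, the same change looks like this: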
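import csv
from nltk.tokenize import word_tokenize  # needs NLTK's tokenizer data installed

summaries = []
texts = []
with open("foo.csv", encoding="utf8", newline='') as csvfile:
    # DictReader uses the header row as keys, so columns can be looked up by name
    reader = csv.DictReader(csvfile)
    for row in reader:
        clean_text = row["Text"]
        clean_summary = row["Summary"]
        summaries.append(word_tokenize(clean_summary))
        texts.append(word_tokenize(clean_text))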
Output:
# texts
[["'I", 'have', 'bought', 'several', 'of', 'the', 'Vitality', 'canned', 'dog', 'food', 'products', 'and', 'have', 'found', 'them', 'all', 'to', 'be', 'of', 'good', 'quality', '.', 'The', 'product', 'looks', 'more', 'like', 'a', 'stew', 'than', 'a', 'processed', 'meat', 'and', 'it', 'smells', 'better', '.', 'My', 'Labrador', 'is', 'finicky', 'and', 'she', 'appreciates', 'this', 'product', 'better', 'than', 'most', '.', "'"]]
# summaries
[["'Good", 'Quality', 'Dog', 'Food', "'"]]
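The stray "'I", "'Good" and "'" tokens in that output come from the single quotes in the file: the csv module treats only the double quote as a quote character by default, so the single quotes are kept as part of the field. If your file really is single-quoted, as I assumed above, passing quotechar="'" should strip them. The sketch below uses a made-up one-row sample rather than your data, and it assumes the text contains no unescaped apostrophes.

```python
import csv
import io

# Hypothetical single-quoted sample in the style assumed above.
sample = (
    "Id,Summary,Text\n"
    "1,'Good Quality Dog Food','I like it, and so does my dog.'\n"
)

# Default quotechar is '"': the single quotes stay in the data and the
# comma inside the Text field is treated as a field separator.
default_row = next(csv.DictReader(io.StringIO(sample)))
print(default_row["Summary"])
print(default_row["Text"])

# With quotechar="'" the quotes are removed and the comma is kept in the field.
quoted_row = next(csv.DictReader(io.StringIO(sample), quotechar="'"))
print(quoted_row["Summary"])
print(quoted_row["Text"])
```

The values from quoted_row can then be passed to word_tokenize as before, without the leading and trailing quote tokens.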