Given a text, count the occurrences of every pair of consecutive words
Input:
Once upon a time a time this upon a
Output:
dictionary {
'Once upon': 1,
'upon a': 2,
'a time': 2,
'time a': 1,
'time this': 1,
'this upon': 1
}
Code:
def countTuples(path):
    dic = dict()
    with codecs.open(path, 'r', 'utf-8') as f:
        for line in f:
            s = line.split()
            for i in range(0, len(s)-1):
                dic[str(s[i]) + ' ' + str(s[i+1])] += 1
    return dic
I get this error:
File "C:/Users/user/Anaconda3/hw2.py", line 100, in countTuples
dic[str(s[i]) + ' ' + str(s[i+1])] += 1
TypeError: list indices must be integers or slices, not str
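If I remove the += and just use = 1, everything works. I guess the problem is that I am trying to read the value of an entry that does not exist yet?
How can I fix this?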
One solution that requires minimal changes to your code is to simply use defaultdict:
from collections import defaultdict

line = 'Once upon a time a time this upon a'
dic = defaultdict(int)
s = line.split()
for i in range(0, len(s)-1):
    dic[str(s[i]) + ' ' + str(s[i+1])] += 1
This yields:
dic
defaultdict(int,
{'Once upon': 1,
'a time': 2,
'this upon': 1,
'time a': 1,
'time this': 1,
'upon a': 2})
Your function then becomes:
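import codecs
from collections import defaultdict

def countTuples(path):
    # Missing keys start at int() == 0, so += 1 works on the first occurrence
    dic = defaultdict(int)
    with codecs.open(path, 'r', 'utf-8') as f:
        for line in f:
            s = line.split()
            for i in range(0, len(s)-1):
                dic[str(s[i]) + ' ' + str(s[i+1])] += 1
    return dic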
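For comparison, here is a minimal sketch of why the original += fails and how a plain dict can be made to work without defaultdict. On a plain dict, += on a missing key raises KeyError; the TypeError about list indices in the question suggests that dic was bound to a list in the actual file, which is an inference from the message rather than something the post states. The helper name count_pairs is hypothetical and not part of the original code.

```python
def count_pairs(words):
    """Count adjacent word pairs with a plain dict (hypothetical helper)."""
    counts = {}
    for first, second in zip(words, words[1:]):
        key = first + ' ' + second
        # get() falls back to 0 for a pair not seen yet, so nothing is raised
        counts[key] = counts.get(key, 0) + 1
    return counts


# += must read the old value first, which fails for a missing key
plain = {}
try:
    plain['Once upon'] += 1
    raised = False
except KeyError:
    raised = True

sample = 'Once upon a time a time this upon a'.split()
pair_counts = count_pairs(sample)
print(raised)
print(pair_counts)
```

Using dict.get keeps the return type a plain dict, which may matter if the caller compares or serializes the result.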
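You can use defaultdict to make your solution work. With defaultdict you specify a default type for the values of the key-value pairs. This lets you apply operations such as += 1 to keys that have not been explicitly created yet: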
import codecs
from collections import defaultdict

def countTuples(path):
    dic = defaultdict(int)
    with codecs.open(path, 'r', 'utf-8') as f:
        for line in f:
            s = line.split()
            for i in range(0, len(s)-1):
                dic[str(s[i]) + ' ' + str(s[i+1])] += 1
    return dic

For the sample text, the resulting counts are:
{'Once upon': 1,
 'a time': 2,
 'this upon': 1,
 'time a': 1,
 'time this': 1,
 'upon a': 2}
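The default-value behaviour described above can be observed in isolation. The following is a small sketch on an in-memory defaultdict(int); the key names are only illustrative. It also shows a side effect worth knowing about: merely reading a missing key inserts it with the default value.

```python
from collections import defaultdict

counts = defaultdict(int)

# The key does not exist yet
was_present = 'upon a' in counts

# On first access the factory int() supplies 0, then 1 is added and stored
counts['upon a'] += 1

# Reading a missing key also creates it, with the default value 0
_ = counts['never seen']

print(was_present)
print(dict(counts))
```

If the silent insertion on reads is unwanted, converting the result with dict(counts) before returning it gives back ordinary KeyError behaviour.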
There is no need to make it that hard: just use Counter, and use zip to feed the bigrams into the counter, for example:
import codecs
from collections import Counter

def countTuples(path):
    dic = Counter()
    with codecs.open(path, 'r', 'utf-8') as f:
        for line in f:
            s = line.split()
            # zip(s, s[1:]) pairs every word with its successor
            dic.update('%s %s' % t for t in zip(s, s[1:]))
    return dic
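The Counter and zip approach can be tried without a file. This sketch assumes the text is already available as a list of lines; the function name count_bigrams and the two-line split of the sample are illustrative choices, not part of the original answer. Because pairs are built per line, a pair that spans a line break is not counted.

```python
from collections import Counter


def count_bigrams(lines):
    """Count adjacent word pairs line by line (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        words = line.split()
        # zip stops at the shorter sequence, so the last word gets no partner
        counts.update(' '.join(pair) for pair in zip(words, words[1:]))
    return counts


# The sample sentence, split over two lines for illustration
text = ['Once upon a time', 'a time this upon a']
bigrams = count_bigrams(text)
print(bigrams)
```

Here 'time a' does not appear in the result, since "time" ends the first line and "a" starts the second. Counter also returns 0 for a pair that was never seen, without inserting it.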