Tracking letters in a text file with dictionaries
Hi all!
Please consider the following code.
Exercise:
Read through a text file, line by line. Use a dict to keep track of
how many times each vowel (a, e, i, o, and u) appears in the file.
Print the resulting tabulation.
My code:
from io import StringIO

filename = StringIO('''For the last 2 years I managed to read more than
20 books. 3 of them were in Sci-Fi genre, almost 10 of them are self-help,
the rest of 15 are fiction.''')

def sum_int(filename):
    VOWELS = {}
    occasions = 0
    filename = filename.read().split()
    for word in filename:
        for letter in word:
            if letter in 'aeiou':
                occasions += 1
                VOWELS[letter] = occasions
    return VOWELS

print(sum_int(filename))  # returns {'o': 41, 'e': 38, 'a': 37, 'i': 40}
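The problem is obvious: the resulting count for each vowel in the text is completely wrong.
What is wrong with my code?
Try this. When you encounter a vowel, just add 1 to its dictionary value. That can raise a KeyError, which means the key-value pair does not exist yet, so in that case you initialize the key instead.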
from io import StringIO

filename = StringIO('''For the last 2 years I managed to read more than
20 books. 3 of them were in Sci-Fi genre, almost 10 of them are self-help,
the rest of 15 are fiction.''')

def sum_int(filename):
    VOWELS = {}
    filename = filename.read().split()
    for word in filename:
        for letter in word:
            if letter in 'aeiou':
                if letter in VOWELS:
                    # Key already present: increment its own counter.
                    VOWELS[letter] += 1
                else:
                    # First occurrence: initialize the key.
                    VOWELS[letter] = 1
    return VOWELS

print(sum_int(filename))
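The KeyError mentioned above can also be handled directly instead of being avoided with a membership test. The following is only a sketch under the same sample text; the function names count_with_except and count_with_get are made up for illustration and do not appear in the original answer.

```python
from io import StringIO

TEXT = '''For the last 2 years I managed to read more than
20 books. 3 of them were in Sci-Fi genre, almost 10 of them are self-help,
the rest of 15 are fiction.'''

def count_with_except(stream):
    """Increment first and initialize the key only when KeyError is raised."""
    counts = {}
    for letter in stream.read():
        if letter in 'aeiou':
            try:
                counts[letter] += 1
            except KeyError:
                counts[letter] = 1
    return counts

def count_with_get(stream):
    """Use dict.get with a default of 0 so a missing key never raises."""
    counts = {}
    for letter in stream.read():
        if letter in 'aeiou':
            counts[letter] = counts.get(letter, 0) + 1
    return counts

print(count_with_except(StringIO(TEXT)))  # {'o': 10, 'e': 17, 'a': 9, 'i': 5}
print(count_with_get(StringIO(TEXT)))     # {'o': 10, 'e': 17, 'a': 9, 'i': 5}
```

Note that with either variant a vowel that never occurs (here 'u') is simply absent from the result.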
You can set all the vowels to 0 up front:
from io import StringIO

filename = StringIO('''For the last 2 years I managed to read more than
20 books. 3 of them were in Sci-Fi genre, almost 10 of them are self-help,
the rest of 15 are fiction.''')

def sum_int(filename):
    vowels = 'aeiou'
    # Start every vowel at 0 so no key is ever missing.
    result = {v: 0 for v in vowels}
    filename = filename.read().split()
    for word in filename:
        for letter in word:
            if letter in vowels:
                result[letter] += 1
    return result

print(sum_int(filename))
Output:
{'a': 9, 'e': 17, 'i': 5, 'o': 10, 'u': 0}
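The comment by @YevhenKuzmovych is exactly right. But let me expand on it and offer a few suggestions.
In your loop you have:
occasions += 1
This is incremented every time any vowel occurs, so it keeps a running total of all vowels. Using it as the count for one specific vowel is clearly wrong. I would also rename it to vowel_count.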
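To see the effect of the shared counter on a small input, here is a stripped-down sketch of the same logic. The function name buggy_count and the sample string are made up for illustration.

```python
def buggy_count(text):
    """Reproduce the bug: one running total is stored under every vowel key."""
    counts = {}
    occasions = 0
    for letter in text:
        if letter in 'aeiou':
            occasions += 1              # total of ALL vowels seen so far
            counts[letter] = occasions  # overwrites with the running total
    return counts

# 'banana tree' contains 3 a's and 2 e's.
print(buggy_count('banana tree'))  # {'a': 3, 'e': 5}
```

The value stored for 'e' is 5 because five vowels in total had been seen when the last 'e' was reached, not because 'e' occurs five times.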
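There is also no need to split the input into words and iterate first over the words and then over the letters of each word. You can iterate over all the characters of the whole input string. Also, what is passed to the function sum_int (what is that name supposed to mean?) is not a filename that still has to be opened but an already open stream. So we have: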
from io import StringIO

stream = StringIO('''For the last 2 years I managed to read more than
20 books. 3 of them were in Sci-Fi genre, almost 10 of them are self-help,
the rest of 15 are fiction.''')

def count_vowels(stream):
    vowels = dict(a=0, e=0, i=0, o=0, u=0)
    # vowel_count = 0
    s = stream.read()
    for ch in s:
        if ch in 'aeiou':
            # vowel_count += 1
            vowels[ch] += 1
    return vowels

print(count_vowels(stream))
Prints:
{'a': 9, 'e': 17, 'i': 5, 'o': 10, 'u': 0}
Or you can use the collections.Counter class:
from io import StringIO
from collections import Counter

stream = StringIO('''For the last 2 years I managed to read more than
20 books. 3 of them were in Sci-Fi genre, almost 10 of them are self-help,
the rest of 15 are fiction.''')

def count_vowels(stream):
    vowels = Counter()
    # vowel_count = 0
    s = stream.read()
    for ch in s:
        if ch in 'aeiou':
            # vowel_count += 1
            vowels[ch] += 1
    return vowels

counts = count_vowels(stream)
for vowel in 'aeiou':
    # A Counter returns 0 for a key it has never seen.
    print(vowel, '->', counts[vowel])
Prints:
a -> 9
e -> 17
i -> 5
o -> 10
u -> 0
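Since Counter accepts any iterable, the explicit loop can arguably be folded into the constructor. This is just one possible shortening of the version above, not something the original answer proposes:

```python
from collections import Counter
from io import StringIO

stream = StringIO('''For the last 2 years I managed to read more than
20 books. 3 of them were in Sci-Fi genre, almost 10 of them are self-help,
the rest of 15 are fiction.''')

def count_vowels(stream):
    # Feed only the vowel characters to Counter; it does the tallying.
    return Counter(ch for ch in stream.read() if ch in 'aeiou')

counts = count_vowels(stream)
for vowel in 'aeiou':
    # Missing keys such as 'u' read as 0 without being stored.
    print(vowel, '->', counts[vowel])
```

The printed tabulation is the same as above; 'u' is reported as 0 even though it is not a key of the Counter.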
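Notes
s is the whole string, and ch takes on each character of that string in turn, whether it is a letter, a space, or punctuation such as a period. So you are examining every character and selecting only the vowels.
Breaking the string into quasi-words with split first is inefficient. I say quasi-words because after the whitespace is removed what you end up with are not real words: some of them still have punctuation attached. Moreover, split only strips the whitespace and builds a list of these quasi-words, which takes extra time and space (not a big problem if your input string is small, but unnecessary overhead all the same, especially for large inputs). You are then forced into a double loop, first over each quasi-word and then over each character of the quasi-word. That is less efficient than a single loop over each character of the original string.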
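The exercise asks to read the file line by line, whereas the versions above call read() on the whole stream. For large inputs, iterating over the stream keeps only one line in memory at a time. A minimal sketch, assuming the stream is any iterable of lines (an open text file or a StringIO); the name count_vowels_by_line is made up for illustration:

```python
from io import StringIO

def count_vowels_by_line(stream):
    """Tally vowels while reading the stream one line at a time."""
    vowels = dict.fromkeys('aeiou', 0)
    for line in stream:          # text streams yield one line per iteration
        for ch in line:
            if ch in vowels:     # membership test against the dict keys
                vowels[ch] += 1
    return vowels

stream = StringIO('''For the last 2 years I managed to read more than
20 books. 3 of them were in Sci-Fi genre, almost 10 of them are self-help,
the rest of 15 are fiction.''')

print(count_vowels_by_line(stream))
# {'a': 9, 'e': 17, 'i': 5, 'o': 10, 'u': 0}
```

This is again a nested loop, but unlike split it builds no intermediate list of words, and the lines are produced lazily by the stream.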