Tracking letters in text file with dictionaries

Question

Hi everyone!

Please consider the following.

Exercise:

Read through a text file, line by line. Use a dict to keep track of how many times each vowel (a, e, i, o, and u) appears in the file. Print the resulting tabulation.

My code:

from io import StringIO

filename = StringIO('''For the last 2 years I managed to read more than
20 books. 3 of them were in Sci-Fi genre, almost 10 of them are self-help,
the rest of 15 are fiction.''')

def sum_int(filename):
    VOWELS = {}
    occasions = 0
    filename = filename.read().split()
    for word in filename:
        for letter in word:
            if letter in 'aeiou':
                occasions += 1
                VOWELS[letter] = occasions
    return VOWELS

print(sum_int(filename))  # returns {'o': 41, 'e': 38, 'a': 37, 'i': 40}

The problem is obvious: the count it reports for each vowel in the text is simply wrong.

What is wrong with my code?

Answer 1

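Try this. When you encounter a vowel, just add 1 to its dictionary value. Incrementing a key that is not in the dictionary yet would raise a KeyError, because that key-value pair does not exist, so the first time you see a vowel you initialize its key to 1.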
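The same initialize-or-increment logic can also be written without the explicit if/else by using dict.get with a default of 0, which returns 0 for a missing key instead of raising KeyError. This is a minimal sketch; the function name count_vowels_get is hypothetical:

```python
from io import StringIO

def count_vowels_get(stream):
    """Count lowercase vowels; dict.get(key, 0) supplies 0 for unseen keys."""
    counts = {}
    for ch in stream.read():
        if ch in 'aeiou':
            # get() returns 0 the first time a vowel is seen, so no KeyError
            counts[ch] = counts.get(ch, 0) + 1
    return counts

stream = StringIO('''For the last 2 years I managed to read more than
20 books. 3 of them were in Sci-Fi genre, almost 10 of them are self-help,
the rest of 15 are fiction.''')
print(count_vowels_get(stream))  # {'o': 10, 'e': 17, 'a': 9, 'i': 5}
```

As with the if/else version, a vowel that never occurs (here u) gets no key at all, and the keys appear in the order the vowels are first seen.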

from io import StringIO

filename = StringIO('''For the last 2 years I managed to read more than
20 books. 3 of them were in Sci-Fi genre, almost 10 of them are self-help,
the rest of 15 are fiction.''')

def sum_int(filename):
    VOWELS = {}
    
    filename = filename.read().split()
    for word in filename:
        for letter in word:
            if letter in 'aeiou':
                if letter in VOWELS:
                    VOWELS[letter] += 1
                else:
                    VOWELS[letter] = 1
    return VOWELS

print(sum_int(filename))

Answer 2

You can initialize all the vowels to 0 up front:

from io import StringIO

filename = StringIO('''For the last 2 years I managed to read more than
20 books. 3 of them were in Sci-Fi genre, almost 10 of them are self-help,
the rest of 15 are fiction.''')


def sum_int(filename):
    vowels = 'aeiou'
    result = {v: 0 for v in vowels}

    filename = filename.read().split()
    for word in filename:
        for letter in word:
            if letter in vowels:
                result[letter] += 1
    return result


print(sum_int(filename))

Output:

{'a': 9, 'e': 17, 'i': 5, 'o': 10, 'u': 0}
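The zero-initialized dictionary can also be built with dict.fromkeys('aeiou', 0). Because every vowel is present from the start, u shows up with a count of 0 instead of being missing. A short sketch (the name count_vowels_prefilled is hypothetical):

```python
from io import StringIO

def count_vowels_prefilled(stream):
    # Every vowel starts at 0, so result[ch] += 1 never raises KeyError
    result = dict.fromkeys('aeiou', 0)
    for ch in stream.read():
        if ch in result:  # dict membership test: only the five vowel keys match
            result[ch] += 1
    return result

print(count_vowels_prefilled(StringIO('Sci-Fi genre')))
# {'a': 0, 'e': 2, 'i': 2, 'o': 0, 'u': 0}
```

Using 0 as the shared default is safe here. dict.fromkeys reuses one value object for all keys, which only becomes a pitfall with mutable defaults such as lists.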

Answer 3

@YevhenKuzmovych's comment is exactly right, but let me expand on it and add a few suggestions.

In your loop you have:

occasions += 1

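This is incremented for every vowel encountered, so it holds the running total of all vowels. Using it as the count for a particular vowel is clearly wrong. I would also rename it to vowel_count.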
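To see the difference concretely, the sketch below keeps both: a single vowel_count for the running total (which is all that occasions ever measured, so in the original code each key ends up holding the total at that vowel's last occurrence) and a separate counter per vowel. The per-vowel values always add up to the total. The function name count_with_total is hypothetical:

```python
from io import StringIO

def count_with_total(stream):
    vowel_count = 0                        # one shared counter: total of ALL vowels
    per_vowel = {v: 0 for v in 'aeiou'}    # one independent counter per vowel
    for ch in stream.read():
        if ch in per_vowel:
            vowel_count += 1
            per_vowel[ch] += 1
    return vowel_count, per_vowel

total, per_vowel = count_with_total(StringIO('For the last 2 years'))
print(total, per_vowel)  # 5 {'a': 2, 'e': 2, 'i': 0, 'o': 1, 'u': 0}
```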

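There is also no need to split the input into words, iterate over the words, and then iterate over the letters in each word. You can simply iterate over every character in the whole input string. Also, what you pass to the function sum_int (what does that name mean?) is not a filename that needs to be opened but an already-open stream. So we have: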

from io import StringIO

stream = StringIO('''For the last 2 years I managed to read more than
20 books. 3 of them were in Sci-Fi genre, almost 10 of them are self-help,
the rest of 15 are fiction.''')

def count_vowels(stream):
    vowels = dict(a=0, e=0, i=0, o=0, u=0)
    #vowel_count = 0
    s = stream.read()
    for ch in s:
        if ch in 'aeiou':
            #vowel_count += 1
            vowels[ch] += 1
    return vowels
print(count_vowels(stream))

Prints:

{'a': 9, 'e': 17, 'i': 5, 'o': 10, 'u': 0}

Or you can use the collections.Counter class:

from io import StringIO
from collections import Counter

stream = StringIO('''For the last 2 years I managed to read more than
20 books. 3 of them were in Sci-Fi genre, almost 10 of them are self-help,
the rest of 15 are fiction.''')

def count_vowels(stream):
    vowels = Counter()
    #vowel_count = 0
    s = stream.read()
    for ch in s:
        if ch in 'aeiou':
            #vowel_count += 1
            vowels[ch] += 1
    return vowels

counts = count_vowels(stream)
for vowel in 'aeiou':
    print(vowel, '->', counts[vowel])

Prints:

a -> 9
e -> 17
i -> 5
o -> 10
u -> 0
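Counter can also do the tallying itself if you hand it an iterable of the vowels, which removes the explicit loop and the manual increment. A minimal variant (count_vowels_counter is a hypothetical name); as above, indexing a Counter with a key it has never seen, such as 'u', returns 0 rather than raising KeyError:

```python
from collections import Counter
from io import StringIO

def count_vowels_counter(stream):
    # Counter consumes the generator and tallies each vowel it yields
    return Counter(ch for ch in stream.read() if ch in 'aeiou')

counts = count_vowels_counter(StringIO('the rest of 15 are fiction.'))
for vowel in 'aeiou':
    print(vowel, '->', counts[vowel])
```

For this input it prints 1, 3, 2, 2 and 0 for a, e, i, o and u respectively. Note that a missing-key lookup on a Counter does not insert the key, so 'u' is still absent from the Counter afterwards.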

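Notes

s is the entire string, and ch stands for each character in it, whether a letter, a space, or punctuation such as a period. So you are examining every character and picking out only the vowels.

Using split to first break the string into quasi-words (pseudo-words) is inefficient. I say quasi-words because what you end up with after the whitespace is stripped are not really words: some of them still have punctuation attached. Moreover, split only removes the whitespace, and in doing so it builds a list of these quasi-words, which takes extra time and space. That is not a big problem if your input string is not too large, but it is unnecessary overhead, especially for large inputs. You are then forced into a double loop, first over each quasi-word and then over each character within it. This is not as efficient as a single loop over every character of the original string.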
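The exercise itself asks to read the file line by line. File objects (and StringIO) can be iterated line by line directly, so you can combine that with a per-character loop without ever building a list of quasi-words. There are still two loops, but the outer one runs over lines the stream already yields, and for a real file this avoids reading the whole file into memory at once. A sketch (count_vowels_by_line and the path 'input.txt' are hypothetical), run here on a StringIO so it is self-contained:

```python
from io import StringIO

def count_vowels_by_line(stream):
    counts = {v: 0 for v in 'aeiou'}
    for line in stream:       # yields one line at a time, no split() needed
        for ch in line:       # every character: letters, spaces, punctuation
            if ch in counts:  # only the five vowel keys match
                counts[ch] += 1
    return counts

stream = StringIO('''For the last 2 years I managed to read more than
20 books. 3 of them were in Sci-Fi genre, almost 10 of them are self-help,
the rest of 15 are fiction.''')
print(count_vowels_by_line(stream))  # {'a': 9, 'e': 17, 'i': 5, 'o': 10, 'u': 0}

# For a real file (hypothetical path):
# with open('input.txt', encoding='utf-8') as f:
#     print(count_vowels_by_line(f))
```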
