How do I count the words in a string using only loops and string.strip()?

Question

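This is the function I have so far. It is supposed to read a text file and return the total number of words. I am only allowed to use for loops, while loops, and string.strip(). For some reason it seems to be counting some extra characters in the text file, including newlines. This is one of the text files: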

Words make up other words.
This is a line.
  Sequences of words make sentences.
I like words but I don't like MS Word.
    There's another word for how I feel about MSWord: @#%&

This text file contains 33 words in total, but my program counts 34. Each sentence is on its own line. The third line has two leading spaces; the fifth line has 4 tabs.

def countWords(textFileName):
    words = 0
    for char in textFileName:
        if char == " " or char == ".":
            words = words + 1
        if char != " " and char != ".":
            pass
    return words


def main():
    textFileName = input("Enter textFileName: ")
    total = 0
    for line in open(textFileName):
        total = total + countWords(line)
    print(total, "words")
main()

Answer 1

Since your words are separated by whitespace, split() will work for you. Check this:

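#!/usr/bin/python
# -*- coding: utf-8 -*-

def main():
    # Raw string, so that \t and \001 are not read as escape sequences.
    textFileName = r'C:\temp\001.txt'
    total = 0
    for line in open(textFileName):
        # split() with no arguments splits on any run of whitespace.
        total += len(line.split())
    print(total, "words")
main()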

Output (run under Python 2, where print(total, "words") prints a tuple):

(33, 'words')

Edit:

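#!/usr/bin/python
# -*- coding: utf-8 -*-

def main():
    # Raw string, so that \t and \001 are not read as escape sequences.
    textFileName = r'C:\temp\001.txt'
    total = 0
    for line in open(textFileName):
        # Drop leading and trailing whitespace (spaces, tabs, the newline).
        line = line.strip()
        # Each space separates two words, so words = spaces + 1.
        # This assumes single spaces between words and no blank lines.
        for char in line:
            if char == ' ':
                total += 1
        total += 1
    print(total, "words")
main()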

Output: (33, 'words')

str.strip() also removes leading and trailing tab characters:

In[2]: a='\tabc'
In[3]: print a
    abc
In[4]: str.strip(a)
Out[4]: 'abc'
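
To see where the extra count in the question comes from, here is a minimal sketch that reruns the question's logic one line at a time. The sample lines are retyped from the question, with the four tabs on the fifth line written as \t (an assumption based on the description), and count_separators is only an illustrative name for the asker's loop, not a fix.

```python
# The five lines of the sample file, retyped from the question.
# Assumption: the fifth line starts with four tab characters, as described.
SAMPLE_LINES = [
    "Words make up other words.\n",
    "This is a line.\n",
    "  Sequences of words make sentences.\n",
    "I like words but I don't like MS Word.\n",
    "\t\t\t\tThere's another word for how I feel about MSWord: @#%&\n",
]


def count_separators(line):
    """Same logic as the question's countWords: count spaces and periods."""
    count = 0
    for char in line:
        if char == " " or char == ".":
            count += 1
    return count


# Collect the count for each line, then add them up with a plain loop.
per_line = []
for line in SAMPLE_LINES:
    per_line += [count_separators(line)]

total = 0
for n in per_line:
    total += n

print(per_line)   # [5, 4, 7, 9, 9]
print(total)      # 34
```

With these lines, the real word counts are 5, 4, 5, 9 and 10. The two leading spaces on the third line add 2, and the last line, which has no period, comes up 1 short, so 33 + 2 - 1 = 34. Newlines and tabs are not counted by that loop at all.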

Answer 2

If you are allowed to use split(), it is simple:

def count_words(s):
    return len(s.split())

So implement your own version of split(), like this:

import string

def splitter(s, sep=string.whitespace):
    words = []
    word = []
    for c in s:
        if c not in sep:
            word.append(c)
        else:
            if word:
                words.append(''.join(word))
                word = []
    if word:    # handle case of no sep at end of string
        words.append(''.join(word))
    return words

Now you can rewrite count_words():

def count_words(s):
    return len(splitter(s))

Running it on your sample input:

>>> s = '''Words make up other words. 
This is a line.
  Sequences of words make sentences.
I like words but I don't like MS Word.
    There's another word for how I feel about MSWord: @#%&'''

>>> splitter(s)
['Words', 'make', 'up', 'other', 'words.', 'This', 'is', 'a', 'line.', 'Sequences', 'of', 'words', 'make', 'sentences.', 'I', 'like', 'words', 'but', 'I', "don't", 'like', 'MS', 'Word.', "There's", 'another', 'word', 'for', 'how', 'I', 'feel', 'about', 'MSWord:', '@#%&']
>>> count_words(s)
33
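
As a quick sanity check, the sketch below compares splitter() with the built-in str.split() on a few edge cases. The test strings are made up for illustration, and splitter() is repeated (lightly condensed) only so that the snippet runs on its own.

```python
import string


def splitter(s, sep=string.whitespace):
    """The splitter() from above, repeated so this snippet is self-contained."""
    words = []
    word = []
    for c in s:
        if c not in sep:
            word.append(c)
        elif word:
            # A separator ends the current word, if there is one.
            words.append(''.join(word))
            word = []
    if word:    # handle case of no sep at end of string
        words.append(''.join(word))
    return words


# Hypothetical edge cases: empty input, only whitespace, repeated separators.
cases = ["", "   ", "one", "  two  spaces ", "tabs\t\tand\nnewlines\n", "\n\n"]
for text in cases:
    assert splitter(text) == text.split(), repr(text)
print("splitter() agrees with str.split() on", len(cases), "cases")
```

Empty and whitespace-only lines give an empty list, so they contribute 0 to the word count.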

Edit: if append() and join() are not allowed:

def splitter(s, sep=string.whitespace):
    words = []
    word = ''
    for c in s:
        if c not in sep:
            word += c
        else:
            if word:
                words += [word]
                word = ''
    if word:    # handle case of no sep at end of string
        words += [word]
    return words

def count_words(s):
    count = 0
    for word in splitter(s):
        count += 1
    return count

>>> splitter(s)
['Words', 'make', 'up', 'other', 'words.', 'This', 'is', 'a', 'line.', 'Sequences', 'of', 'words', 'make', 'sentences.', 'I', 'like', 'words', 'but', 'I', "don't", 'like', 'MS', 'Word.', "There's", 'another', 'word', 'for', 'how', 'I', 'feel', 'about', 'MSWord:', '@#%&']
>>> count_words(s)
33

There is also a more direct approach:

def count_words(s, sep=string.whitespace):
    count = 0 
    in_word = False
    for c in s:
        if c not in sep:
            if not in_word:
                count += 1
                in_word = True
        else:
            in_word = False
    return count
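
If even the `in` test against string.whitespace is off limits, a single character can be tested with strip() alone: c.strip() is the empty string exactly when c is whitespace. Below is a sketch of the same idea under that assumption. The names count_words_strip_only and count_words_in_file are illustrative, and the file path is whatever the caller supplies.

```python
def count_words_strip_only(s):
    """Count whitespace-separated words using only a for loop and strip()."""
    count = 0
    in_word = False
    for c in s:
        if c.strip() == "":      # c is a whitespace character
            in_word = False
        elif not in_word:        # first character of a new word
            count += 1
            in_word = True
    return count


def count_words_in_file(path):
    """Sum the per-line counts for a text file (path is chosen by the caller)."""
    total = 0
    with open(path) as f:
        for line in f:
            total += count_words_strip_only(line)
    return total


print(count_words_strip_only("  Sequences of words make sentences.\n"))  # 5
```

Because a word is counted only at its first character, leading spaces, tabs, blank lines and repeated separators do not change the result.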

python

string

count