How do I count the words in a string using only loops and string.strip()?
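This is the function I have so far. It should read a text file and return the total number of words. I can only use for loops, while loops, and string.strip(). For some strange reason, it is counting some extra characters in the text file, including newlines. Here is one of the text files: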
Words make up other words.
This is a line.
Sequences of words make sentences.
I like words but I don't like MS Word.
There's another word for how I feel about MSWord: @#%&
This text file has 33 words in total, but my program counts 34. Each sentence is on its own line. The third line has two leading spaces; the fifth line has 4 tabs.
def countWords(textFileName):
    words = 0
    for char in textFileName:
        if char == " " or char == ".":
            words = words + 1
        if char != " " and char != ".":
            pass
    return words

def main():
    textFileName = input("Enter textFileName: ")
    total = 0
    for line in open(textFileName):
        total = total + countWords(line)
    print(total, "words")

main()
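To see where the extra word comes from, it helps to call countWords on single lines. The function counts separator characters (spaces and periods), not words, so leading spaces push the count up, and a line that does not end in a period comes up one short. The two test strings below are made up for illustration and only modeled on the sample file:

```python
def countWords(textFileName):
    # The asker's function, re-indented with its no-op `pass` branch dropped.
    # Despite the parameter name it receives one line of text.
    words = 0
    for char in textFileName:
        if char == " " or char == ".":
            words = words + 1
    return words

# Hypothetical test lines modeled on the sample file.
leading_spaces = "  Sequences of words make sentences."  # 5 words
no_period = "There's another word"                      # 3 words

print(countWords(leading_spaces))  # 7: 2 leading + 4 inner spaces + 1 period
print(countWords(no_period))       # 2: only the 2 inner spaces are counted
```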
Since your words are separated by whitespace, split() will work for you.
Check this:
#!/usr/bin/python
# -*- coding: utf-8 -*-
def main():
    # Raw string: in a normal literal, \t is a tab and \001 is an octal escape.
    textFileName = r'C:\temp\001.txt'
    total = 0
    for line in open(textFileName):
        total += len(line.split())
    print(total, "words")

main()
Output (from Python 2, where print with two arguments prints a tuple; Python 3 prints 33 words):
(33, 'words')
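The reason len(line.split()) gets the right count is that split() called with no argument treats any run of whitespace (spaces, tabs, newlines) as a single separator and ignores leading and trailing whitespace, so leading spaces, tabs and the trailing newline never produce empty "words". Passing an explicit separator loses that behavior. The line below is a made-up example:

```python
# A made-up line with two leading spaces, a tab, a doubled space and a newline.
line = "  Sequences of\twords  make sentences.\n"

# No argument: split on runs of any whitespace, drop empty pieces.
print(line.split())       # ['Sequences', 'of', 'words', 'make', 'sentences.']
print(len(line.split()))  # 5

# Explicit " " separator: every single space splits, tabs are not split on,
# and empty strings appear for the leading and doubled spaces.
print(line.split(" "))    # ['', '', 'Sequences', 'of\twords', '', 'make', 'sentences.\n']
```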
Edit: a version that avoids split() and uses only loops and str.strip():
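#!/usr/bin/python
# -*- coding: utf-8 -*-
def main():
    textFileName = r'C:\temp\001.txt'  # raw string so the backslashes stay literal
    total = 0
    for line in open(textFileName):
        line = str.strip(line)  # drop leading/trailing whitespace, including '\n'
        for char in line:
            if char == ' ':     # assumes words are separated by single spaces
                total += 1
        total += 1              # n spaces on a line means n + 1 words
    print(total, "words")

main()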
Output:
(33, 'words')
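The strip-and-count version relies on every pair of words on a line being separated by exactly one space character. Under that assumption it works, but a doubled space, a tab between words, or a blank line each throw it off. The helper name count_by_spaces and the test lines are hypothetical, chosen only to show those cases:

```python
def count_by_spaces(line):
    # Hypothetical helper reproducing the edited answer's per-line logic:
    # strip the line, count ' ' characters, then add 1.
    line = line.strip()
    total = 0
    for char in line:
        if char == ' ':
            total += 1
    return total + 1

print(count_by_spaces("This is a line.\n"))     # 4, correct
print(count_by_spaces("This  is a line.\n"))    # 5, doubled space counted twice
print(count_by_spaces("This\tis\ta\tline.\n"))  # 1, tabs are never counted
print(count_by_spaces("\n"))                    # 1, blank line counted as a word
```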
str.strip() also removes tabs (and any other leading or trailing whitespace):
In[2]: a='\tabc'
In[3]: print a
abc
In[4]: str.strip(a)
Out[4]: 'abc'
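Note that strip() only trims whitespace from the two ends of the string; whitespace between words is left untouched, so strip() alone cannot separate words. lstrip() and rstrip() trim one side only. A made-up example:

```python
s = "\t  two\twords  \n"

print(repr(s.strip()))   # 'two\twords': inner tab is kept
print(repr(s.lstrip()))  # 'two\twords  \n': only the left end is trimmed
print(repr(s.rstrip()))  # '\t  two\twords': only the right end is trimmed
```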
If you could use split(), it would be simple:
def count_words(s):
    return len(s.split())
Since you can't, implement your own version of split(), like this:
import string

def splitter(s, sep=string.whitespace):
    words = []
    word = []
    for c in s:
        if c not in sep:
            word.append(c)
        else:
            if word:
                words.append(''.join(word))
                word = []
    if word:  # handle case of no sep at end of string
        words.append(''.join(word))
    return words
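Because splitter() only flushes a word when it has collected at least one character, runs of separators, leading or trailing whitespace, and an empty input all behave as they do with str.split(). The sep parameter also accepts any string of separator characters. A few made-up checks, repeating the definition so the snippet runs on its own:

```python
import string

def splitter(s, sep=string.whitespace):
    # Same logic as the splitter() above: collect non-separator characters
    # into the current word and flush it whenever a separator is seen.
    words = []
    word = []
    for c in s:
        if c not in sep:
            word.append(c)
        else:
            if word:
                words.append(''.join(word))
                word = []
    if word:  # last word when the string doesn't end with a separator
        words.append(''.join(word))
    return words

print(splitter("  leading and trailing  "))  # ['leading', 'and', 'trailing']
print(splitter("tab\tand\nnewline"))          # ['tab', 'and', 'newline']
print(splitter(""))                           # []
print(splitter("a,b;;c", sep=",;"))           # ['a', 'b', 'c']
```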
Now you can rewrite count_words():
def count_words(s):
    return len(splitter(s))
Running it on your sample input:
>>> s = '''Words make up other words.
This is a line.
Sequences of words make sentences.
I like words but I don't like MS Word.
There's another word for how I feel about MSWord: @#%&'''
>>> splitter(s)
['Words', 'make', 'up', 'other', 'words.', 'This', 'is', 'a', 'line.', 'Sequences', 'of', 'words', 'make', 'sentences.', 'I', 'like', 'words', 'but', 'I', "don't", 'like', 'MS', 'Word.', "There's", 'another', 'word', 'for', 'how', 'I', 'feel', 'about', 'MSWord:', '@#%&']
>>> count_words(s)
33
Edit: if append() and join() are not allowed:
import string

def splitter(s, sep=string.whitespace):
    words = []
    word = ''
    for c in s:
        if c not in sep:
            word += c
        else:
            if word:
                words += [word]
                word = ''
    if word:  # handle case of no sep at end of string
        words += [word]
    return words

def count_words(s):
    count = 0
    for word in splitter(s):
        count += 1
    return count
>>> splitter(s)
['Words', 'make', 'up', 'other', 'words.', 'This', 'is', 'a', 'line.', 'Sequences', 'of', 'words', 'make', 'sentences.', 'I', 'like', 'words', 'but', 'I', "don't", 'like', 'MS', 'Word.', "There's", 'another', 'word', 'for', 'how', 'I', 'feel', 'about', 'MSWord:', '@#%&']
>>> count_words(s)
33
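Since the question also allows while loops, the same count can be written with an index instead of building a list of words. This is a variant sketch, not part of the answer above; the name count_words_while is hypothetical:

```python
import string

def count_words_while(s, sep=string.whitespace):
    # Hypothetical variant using an index and while loops: skip a run of
    # separators, then, if characters remain, count one word and skip it.
    count = 0
    i = 0
    n = len(s)
    while i < n:
        while i < n and s[i] in sep:      # skip separators
            i += 1
        if i < n:                         # a word starts here
            count += 1
        while i < n and s[i] not in sep:  # skip the word itself
            i += 1
    return count

print(count_words_while("  a\tb  c\n"))  # 3
print(count_words_while(""))             # 0
```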
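There is also a more direct approach, which counts a word each time a non-separator character follows a separator (or starts the string), without building a list: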
import string

def count_words(s, sep=string.whitespace):
    count = 0
    in_word = False
    for c in s:
        if c not in sep:
            if not in_word:
                count += 1
                in_word = True
        else:
            in_word = False
    return count
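Plugging this counter back into a file-reading loop like the one in the question gives a complete program. The sketch below writes the sample text to a temporary file first; the helper name count_file_words is hypothetical, and the exact position of the tabs on line 5 is a guess, since the question only says there are four:

```python
import os
import string
import tempfile

def count_words(s, sep=string.whitespace):
    # The state-machine counter above: count a word each time a
    # non-separator character follows a separator (or starts the text).
    count = 0
    in_word = False
    for c in s:
        if c not in sep:
            if not in_word:
                count += 1
                in_word = True
        else:
            in_word = False
    return count

def count_file_words(path):
    # Hypothetical helper: sum the per-line counts over the file.
    total = 0
    with open(path) as f:
        for line in f:
            total += count_words(line)
    return total

# Sample text from the question: two leading spaces on line 3 and
# four tabs on line 5 (tab position assumed).
sample = ("Words make up other words.\n"
          "This is a line.\n"
          "  Sequences of words make sentences.\n"
          "I like words but I don't like MS Word.\n"
          "There's another word for how I feel about MSWord:\t\t\t\t@#%&\n")

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
print(count_file_words(tmp.name), "words")  # 33 words
os.remove(tmp.name)
```

Reading the whole file with f.read() and calling count_words once gives the same total, because '\n' is in string.whitespace and so no word can span two lines.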