Count of characters in a string excluding special characters
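I need to count the characters in a given file. The problem is that I'm not splitting the file correctly. If my input file contains "The! dog-ate #####the,cat", I don't want the special characters in the output.
o/p: t:4 h:2 e:3 !:1 d:1 o:1 g:1 -:1 #:5.... Also, I need to remove the "-" sign and make sure the words don't get joined together.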
from collections import Counter
import sys

filename = sys.argv[1]
reg = '[^a-zA-Z+]'
f = open(filename, 'r')
x = f.read().strip()
lines = []
for line in x:
    line = line.strip().upper()
    if line:
        lines.append(line)
print(Counter(lines))
Can someone help me with this?
Just delete the values you don't want:
c = Counter(lines)
del c['#']
del c['-']
del c[',']
print(c)
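Deleting keys one at a time only works when you know in advance which symbols occur. A minimal sketch, assuming "special character" means anything that is not an ASCII letter, that filters the keys generically instead (the helper name drop_non_letters is hypothetical, not part of collections):

```python
from collections import Counter
import string

def drop_non_letters(counter):
    """Return a new Counter keeping only single ASCII-letter keys (hypothetical helper)."""
    return Counter({ch: n for ch, n in counter.items() if ch in string.ascii_letters})

# Same per-character, upper-cased counting as in the question.
c = Counter("The! dog-ate #####the,cat".upper())
letters_only = drop_non_letters(c)
print(letters_only)
```

The original Counter is left untouched, so the counts of the dropped symbols are still available in c if they are needed later.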
Use re.sub and remove the special characters.
import re

with open(filename) as f:
    content = re.sub('[^a-zA-Z]', '', f.read(), flags=re.M)
counts = Counter(content)
Demo:
In [1]: re.sub('[^a-zA-Z]', '', "The! dog-ate #####the,cat")
Out[1]: 'Thedogatethecat'
In [2]: Counter(_)
Out[2]:
Counter({'T': 1,
         'a': 2,
         'c': 1,
         'd': 1,
         'e': 3,
         'g': 1,
         'h': 2,
         'o': 1,
         't': 3})
Note that if you want to count uppercase and lowercase letters together, you can convert content to lowercase:
counts = Counter(content.lower())
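The question also asks that words not run together once "-" is removed. Substituting an empty string turns "dog-ate" into "dogate", which is harmless for per-character counts but loses the word boundaries. A sketch, assuming a single space is an acceptable separator, that replaces each run of non-letters with a space (the variable names are illustrative):

```python
import re
from collections import Counter

text = "The! dog-ate #####the,cat"

# Replace each run of non-letters with one space so the words stay separate.
cleaned = re.sub(r'[^a-zA-Z]+', ' ', text).strip()
words = cleaned.split()

# Count letters only, case-insensitively, skipping the separator spaces.
counts = Counter(cleaned.lower().replace(' ', ''))
print(words)
print(counts)
```

Here cleaned becomes "The dog ate the cat", so the words can still be split apart while the counts match the expected t:4 h:2 e:3.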
foo.txt
asdas
!@#!@
asdljh
12j3l1k23j
Using string.ascii_letters, documented at:
https://docs.python.org/3/library/string.html#string.ascii_letters
import string
from collections import Counter

with open('foo.txt') as f:
    text = f.read()
# Keep only ASCII letters; digits, punctuation and newlines are dropped.
filtered_text = [char for char in text if char in string.ascii_letters]
counted = Counter(filtered_text)
print(counted.most_common())
Output (entries with equal counts may appear in a different order depending on the Python version):
[('a', 3), ('j', 3), ('s', 3), ('d', 2), ('l', 2), ('h', 1), ('k', 1)]
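For larger files, the same filter can be applied one line at a time so the whole file never has to sit in memory. A rough sketch, assuming ASCII letters are all that should be counted; count_letters is a hypothetical helper, and the temporary file only stands in for foo.txt so that the snippet runs on its own:

```python
import os
import string
import tempfile
from collections import Counter

LETTERS = frozenset(string.ascii_letters)

def count_letters(path):
    """Count ASCII letters in a file, reading it line by line (hypothetical helper)."""
    counted = Counter()
    with open(path) as f:
        for line in f:
            # Counter.update adds to the existing counts for each character seen.
            counted.update(ch for ch in line if ch in LETTERS)
    return counted

# Recreate the sample foo.txt contents in a temporary file for the demonstration.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("asdas\n!@#!@\nasdljh\n12j3l1k23j\n")
    sample_path = tmp.name

counted = count_letters(sample_path)
os.remove(sample_path)
print(sorted(counted.items()))
```

Swapping the membership test for ch.isalpha() would also count non-ASCII letters, which may or may not be what you want.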