Python - Counting Letter Frequency in a String
I want to write out the letter frequencies for each word in a string. My inputs and expected outputs look like this.
"aaaa" -> "a4"
"abb" -> "a1b2"
"abbb cc a" -> "a1b3 c2 a1"
"bbbaaacddddee" -> "b3a3c1d4e2"
"a b" -> "a1 b1"
I found this solution, but it gives the frequencies in random order. How can I do this?
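For anyone comparing the answers below, the expected outputs from the question can be written down as a small table of test cases. This is only a sketch: the names `CASES` and `failing_inputs` are made up for illustration and do not come from the question.

```python
# Expected input -> output pairs, copied from the question.
CASES = {
    "aaaa": "a4",
    "abb": "a1b2",
    "abbb cc a": "a1b3 c2 a1",
    "bbbaaacddddee": "b3a3c1d4e2",
    "a b": "a1 b1",
}

def failing_inputs(func):
    """Return the inputs for which func does not produce the expected string."""
    return [s for s, expected in CASES.items() if func(s) != expected]

# A candidate that just echoes its input fails every case.
print(failing_inputs(lambda s: s))
```

An empty list from `failing_inputs` means the candidate function matches all five examples.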
Does this meet your needs?
from itertools import groupby
import re

s = "bbbaaac ddddee aa"
# groupby yields one (label, run) pair per run of identical consecutive characters
groups = groupby(s)
result = [(label, sum(1 for _ in group)) for label, group in groups]
res1 = "".join("{}{}".format(label, count) for label, count in result)
# 'b3a3c1 1d4e2 1a2'
# spaces just as spaces, do not include their count
print(re.sub(' [0-9]+', ' ', res1))
# b3a3c1 d4e2 a2
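One thing to keep in mind with groupby is that it counts runs of consecutive identical characters, not totals per word, so the two only agree when equal letters sit next to each other. Here is a minimal sketch that wraps the same idea in a function and copies spaces through directly instead of cleaning them up with a regex afterwards. The name `run_lengths` is my own and is not part of the answer above.

```python
from itertools import groupby

def run_lengths(s):
    """Encode each run of identical characters as letter + run length.

    Runs of spaces are copied through unchanged, without a count.
    """
    parts = []
    for label, group in groupby(s):
        n = sum(1 for _ in group)
        parts.append(label * n if label == " " else f"{label}{n}")
    return "".join(parts)

print(run_lengths("bbbaaac ddddee aa"))  # b3a3c1 d4e2 a2
print(run_lengths("aba"))                # a1b1a1 (two separate runs of "a")
```

If a word like "aba" should come out as a2b1, a per-word count is needed rather than a run-length encoding.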
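At first glance this looked a bit tricky to me. For example, it looks like "bbbaaacddddee" -> "b3a3c1d4e2" really does require the counts to be output in the order in which the letters appear in the input string: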
import re

def unique_elements(t):
    l = []
    for w in t:
        if w not in l:
            l.append(w)
    return l

def splitter(s):
    res = []
    tokens = re.split("[ ]+", s)
    for token in tokens:
        s1 = unique_elements(token)  # or s1 = sorted(set(token))
        this_count = "".join([k + str(v) for k, v in list(zip(s1, [token.count(x) for x in s1]))])
        res.append(this_count)
    return " ".join(res)

print(splitter("aaaa"))
print(splitter("abb"))
print(splitter("abbb cc a"))
print(splitter("bbbaaacddddee"))
print(splitter("a b"))
Output:
a4
a1b2
a1b3 c2 a1
b3a3c1d4e2
a1 b1
If the order of appearance does not really matter, you can drop the unique_elements function and simply use something like s1 = sorted(set(token)) inside splitter, as indicated in the comment.
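As a possible shortcut for the first-appearance ordering, `collections.Counter` is a dict subclass, so on Python 3.7+ it keeps keys in insertion order and can stand in for the helper plus the repeated `count` calls. A minimal sketch, assuming words are separated by whitespace; the function name `letter_counts` is hypothetical.

```python
from collections import Counter

def letter_counts(s):
    """Per-word letter totals, each word's letters in order of first appearance."""
    return " ".join(
        "".join(f"{ch}{n}" for ch, n in Counter(word).items())
        for word in s.split()
    )

print(letter_counts("abbb cc a"))      # a1b3 c2 a1
print(letter_counts("bbbaaacddddee"))  # b3a3c1d4e2
```

Note that `s.split()` collapses repeated whitespace, so the output always has single spaces between words.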
Here is your answer:
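test_str = "here is your answer"
# renamed from "list" so the built-in list type is not shadowed
words = test_str.split()
for word in words:
    res = {}  # fresh counts for each word
    for ch in word:
        res[ch] = res.get(ch, 0) + 1
    # dicts keep insertion order (Python 3.7+), so letters print in first-seen order
    for key, value in res.items():
        print(f"{key}{value}", end="")
    print(end=" ")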
There is no need to iterate over every character in every word.
Here is an alternative solution. (It looks neat if you don't want to use itertools.)
def word_stats(data: str = ""):
    stats = []  # renamed from "all" so the built-in all() is not shadowed
    for word in data.split(" "):
        res = []
        while len(word) > 0:
            # count the first letter, then remove every occurrence of it
            res.append(word[:1] + str(word.count(word[:1])))
            word = word.replace(word[:1], "")
        res.sort()  # alphabetical order, not order of first appearance
        stats.append("".join(res))
    return " ".join(stats)

print(word_stats("asjssjbjbbhsiaiic ifiaficjxzjooro qoprlllkskrmsnm mmvvllvlxjxj jfnnfcncnnccnncsllsdfi"))
print(word_stats("abbb cc a"))
print(word_stats("bbbaaacddddee"))
This will output:
a2b3c1h1i3j3s4 a1c1f2i3j2o3r1x1z1 k2l3m2n1o1p1q1r2s2 j2l3m2v3x2 c5d1f3i1j1l2n7s2
a1b3 c2 a1
a3b3c1d4e2
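Because of the sort, the last line comes out as a3b3c1d4e2 rather than the b3a3c1d4e2 asked for in the question. If first-appearance order is wanted, one option is to skip the sort and take the distinct letters from `dict.fromkeys`, which keeps them in first-seen order. This is only a sketch of that variation, and `word_stats_ordered` is a made-up name.

```python
def word_stats_ordered(data=""):
    """Like word_stats, but letters stay in order of first appearance."""
    words = []
    for word in data.split(" "):
        # dict.fromkeys keeps each letter once, in first-seen order
        letters = dict.fromkeys(word)
        words.append("".join(f"{ch}{word.count(ch)}" for ch in letters))
    return " ".join(words)

print(word_stats_ordered("abbb cc a"))      # a1b3 c2 a1
print(word_stats_ordered("bbbaaacddddee"))  # b3a3c1d4e2
```

Calling `word.count` once per distinct letter rescans the word each time, which is fine for short words like these.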