Building a Word Counter for Analysis
I am trying to build a Python program similar to wordcounter.net (https://wordcounter.net/).
I have an Excel file with one column of text to analyse. Using pandas and a few other functions, I have created a word frequency counter.
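For context, a minimal sketch of that kind of setup could look like the following; the file name data.xlsx and the column name Text are placeholders, not the real workbook:

import pandas as pd
from collections import Counter

# Placeholder file and column names; adjust to the actual workbook
df = pd.read_excel('data.xlsx')
words = " ".join(df['Text'].astype(str)).split()

word_counts = Counter(words)
print(word_counts.most_common(10))  # the ten most frequent words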
But now, I need to modify it further so it can find patterns.
For example, say a text has "Happy face sad face mellow little baby sweet Happy face mellow sad face mellow".
So here, it should be able to trace patterns such as:
2 word density
Pattern   Count
"Happy face"   2
"sad face"   2
"face mellow"   3
.....
3 word density
Pattern   Count
"Happy face sad"   1
"face sad face"   1
.....
I also tried:
for match in re.finditer(pattern, line):
but that again has to be done manually, and I want it to find the patterns automatically.
Can anyone help with this?
text = 'Happy face sad face mellow little baby sweet Happy face face mellow sad face mellow'

# Count how many times each individual word appears
d = {}
for s in text.split():
    d.setdefault(s, 0)
    d[s] += 1

# Group the words by their frequency
out = {}
for k, v in d.items():
    out.setdefault(v, []).append(k)

# Print the groups, most frequent first
for i in sorted(out.keys(), reverse=True):
    print(f'{i} word density:')
    print(f'\t{out[i]}')
Output
5 word density:
['face']
3 word density:
['mellow']
2 word density:
['Happy', 'sad']
1 word density:
['little', 'baby', 'sweet']
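As a side note, the dictionary bookkeeping above can be done in one step with collections.Counter (the same class used in Edit 2 below); a minimal equivalent:

from collections import Counter

text = 'Happy face sad face mellow little baby sweet Happy face face mellow sad face mellow'
counts = Counter(text.split())   # word -> number of occurrences
print(counts.most_common())      # (word, count) pairs, most frequent first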
Edit 2
from collections import Counter

def freq(lst, n):
    # Collect every run of n consecutive words as one space-joined string
    lstn = []
    for i in range(len(lst) - (n - 1)):
        lstn.append(" ".join([lst[i + x] for x in range(n)]))
    # Count how often each n-word pattern occurs
    out = Counter(lstn)
    print(f'{n} word density:')
    for k, v in out.items():
        print(f'\t"{k}" {v}')
text = 'Happy face sad face mellow little baby sweet Happy face face mellow sad face mellow'
lst = text.split()
freq(lst, 2)
freq(lst, 3)
Output
2 word density:
"Happy face" 2
"face sad" 1
"sad face" 2
"face mellow" 3
"mellow little" 1
"little baby" 1
"baby sweet" 1
"sweet Happy" 1
"face face" 1
"mellow sad" 1
3 word density:
"Happy face sad" 1
"face sad face" 1
"sad face mellow" 2
"face mellow little" 1
"mellow little baby" 1
"little baby sweet" 1
"baby sweet Happy" 1
"sweet Happy face" 1
"Happy face face" 1
"face face mellow" 1
"face mellow sad" 1
"mellow sad face" 1