How to count the frequency of words and add the associated weight of the words in a list of lists
I have the following data:
[[4, 'ABC'], [4, 'BCD'], [3, 'CDE'], [3, 'ABC'], [3, 'DEF'], [3, 'BCD'], [3, 'BCD'], [3, 'BCD']]
I need the following output:
[ABC, 2, 7]
[BCD, 4, 13]
[CDE, 1, 3]
[DEF, 1, 3]
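I need to count how many times each word at position [1] occurs, and then sum the values at position [0] for that word. The result should have the form: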
[Word, freq, sum of weight]
I checked related questions, but they did not solve my problem.
I tried the following, but it did not work:
res = [[4, 'ABC'], [4, 'BCD'], [3, 'CDE'], [3, 'ABC'], [3, 'DEF'], [3, 'BCD'], [3, 'BCD'], [3, 'BCD']]
d = {}
for freq, label in res:
    if label not in d:
        d[label] = {}
    inner_dict = d[label]
    if freq not in inner_dict:
        inner_dict[freq] = 0
    inner_dict[freq] += freq
print(inner_dict)
Try this:
data = [[4, 'ABC'], [4, 'BCD'], [3, 'CDE'], [3, 'ABC'], [3, 'DEF'], [3, 'BCD'], [3, 'BCD'], [3, 'BCD']]
result = {}
for weight, value in data:
    if value not in result:
        # First occurrence: frequency 1, running weight total starts at this weight
        result[value] = [1, weight]
    else:
        result[value][0] += 1
        result[value][1] += weight
print(result)
Result:
{'ABC': [2, 7], 'BCD': [4, 13], 'CDE': [1, 3], 'DEF': [1, 3]}
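The dictionary above maps each word to [freq, sum of weight]. If the list-of-lists layout from the question is needed, it can be flattened afterwards. This is only a minimal sketch: it rebuilds the same result dictionary so that it runs on its own, and the name rows is illustrative rather than part of the answer.

```python
data = [[4, 'ABC'], [4, 'BCD'], [3, 'CDE'], [3, 'ABC'], [3, 'DEF'], [3, 'BCD'], [3, 'BCD'], [3, 'BCD']]

# Rebuild the word -> [freq, sum of weight] dictionary.
result = {}
for weight, word in data:
    if word not in result:
        result[word] = [1, weight]
    else:
        result[word][0] += 1
        result[word][1] += weight

# Flatten into [word, freq, sum of weight] rows, sorted by word.
# "rows" is an illustrative name, not taken from the answer.
rows = [[word, freq, total] for word, (freq, total) in sorted(result.items())]
for row in rows:
    print(row)
```

Sorting by word gives a stable order regardless of how the dictionary was filled.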
You can simply use defaultdict and a list comprehension:
from collections import defaultdict

a = [[4, 'ABC'], [4, 'BCD'], [3, 'CDE'], [3, 'ABC'], [3, 'DEF'], [3, 'BCD'], [3, 'BCD'], [3, 'BCD']]
d = defaultdict(lambda: 0)   # word -> frequency
d2 = defaultdict(lambda: 0)  # word -> sum of weights
for i in a:
    d[i[1]] += 1
for i in a:
    d2[i[1]] += i[0]
res = [[i, d[i], d2[i]] for i in d.keys()]
print(res)
Output (entry order may differ across Python versions):
[['CDE', 1, 3], ['DEF', 1, 3], ['BCD', 4, 13], ['ABC', 2, 7]]
Edit: As @chthonicdaemon pointed out, a simpler way to initialize the defaultdict is to pass int, which makes missing keys default to 0, or str if you want an empty string instead.
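Following that suggestion, both tallies can be kept in defaultdict(int) objects and filled in a single pass over the data. This is a sketch and not the answer's original code; the names counts and totals are illustrative.

```python
from collections import defaultdict

a = [[4, 'ABC'], [4, 'BCD'], [3, 'CDE'], [3, 'ABC'], [3, 'DEF'], [3, 'BCD'], [3, 'BCD'], [3, 'BCD']]

# int() returns 0, so missing keys start at 0 without a lambda.
counts = defaultdict(int)  # word -> number of occurrences (illustrative name)
totals = defaultdict(int)  # word -> sum of weights (illustrative name)

# One pass updates both tallies.
for weight, word in a:
    counts[word] += 1
    totals[word] += weight

res = [[word, counts[word], totals[word]] for word in counts]
print(res)
```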
With pandas:
import pandas
data = [[4, 'ABC'], [4, 'BCD'], [3, 'CDE'], [3, 'ABC'], [3, 'DEF'], [3, 'BCD'], [3, 'BCD'], [3, 'BCD']]
df = pandas.DataFrame(data, columns=['count', 'word'])
result = df.groupby('word')['count'].agg((len, sum))
Result:
      len  sum
word
ABC     2    7
BCD     4   13
CDE     1    3
DEF     1    3
To sort the result, use sort_values:
result.sort_values(['sum', 'len'])
Result:
      len  sum
word
CDE     1    3
DEF     1    3
ABC     2    7
BCD     4   13
Here is a functional approach:
import itertools

l = [[4, 'ABC'], [4, 'BCD'], [3, 'CDE'], [3, 'ABC'], [3, 'DEF'], [3, 'BCD'], [3, 'BCD'], [3, 'BCD']]
# groupby only merges adjacent items, so sort by word first
data = itertools.groupby(sorted(l, key=lambda x: x[1]), key=lambda x: x[1])
weights = ((k, [x[0] for x in g]) for k, g in data)
print([(k, len(w), sum(w)) for k, w in weights])
Output:
[('ABC', 2, 7), ('BCD', 4, 13), ('CDE', 1, 3), ('DEF', 1, 3)]
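If a key has multiple values, use your_dictionary.setdefault(key, []).append(value) to append them to a list.
a = [[4, 'ABC'], [4, 'BCD'], [3, 'CDE'], [3, 'ABC'], [3, 'DEF'], [3, 'BCD'], [3, 'BCD'], [3, 'BCD']]
my_dict = {}
for item in a:
    key, value = item[1], item[0]
    # Create an empty list for a new key, then append the weight
    my_dict.setdefault(key, []).append(value)
print(my_dict)
my_list = []
for k, v in my_dict.items():
    # len(v) is the frequency, sum(v) is the total weight
    my_list.append([k, len(v), sum(v)])
print(my_list)
Output (entry order may differ across Python versions):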
{'BCD': [4, 3, 3, 3], 'DEF': [3], 'CDE': [3], 'ABC': [4, 3]}
[['BCD', 4, 13], ['DEF', 1, 3], ['CDE', 1, 3], ['ABC', 2, 7]]