How to count specific elements inside tuples after using Counter() - Python

Question

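I would like your help clearing up a question about data analysis in Python.

I am building a DataFrame that shows sales data for a fictional company. To find out which pairs of products are most often sold together, I created a column named 'Grouped' that groups the products registered under the same 'Order ID' (items are separated by commas).

After that, I used Counter() from collections and combinations from itertools to count how many times each pair appears in the 'Grouped' column. Please check the code below: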

from itertools import combinations
from collections import Counter

count_pairs=Counter()

for row in df['Grouped']:
    
    row_list=row.split(',')
    count_pairs.update(Counter(combinations(row_list,2))) # I want to count pairs of two
    
count_pairs.most_common(5)

This is the resulting output:

[(('iPhone', 'Lightning Charging Cable'), 1004),
 (('Google Phone', 'USB-C Charging Cable'), 987),
 (('iPhone', 'Wired Headphones'), 447),
 (('Google Phone', 'Wired Headphones'), 414),
 (('Vareebadd Phone', 'USB-C Charging Cable'), 361)]

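Now I would like to determine which product appears most often across these pairs, i.e. how many of the available pairs each item appears in.

I tried analyzing the items one at a time with the code below: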

['Lightning Charging Cable' in x for x in count_pairs].count(True) #output = 37

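However, I would like to build a list or Series and sort the results by the number of available pairs each item appears in.

Do you know a way to solve this?

Thank you very much for your help!

Additional information:

Since some of you used the top five pairs (count_pairs.most_common(5)) to solve this, I should point out that I need a solution based on the Counter object itself, i.e. 'count_pairs':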

count_pairs

The abbreviated output of 'count_pairs' is:

Counter({('Google Phone', 'Wired Headphones'): 414,
         ('Google Phone', 'USB-C Charging Cable'): 987,
         ('Bose SoundSport Headphones', 'Bose SoundSport Headphones'): 27,
         ('AAA Batteries (4-pack)', 'Google Phone'): 11,
         ('Lightning Charging Cable', 'USB-C Charging Cable'): 58,....}) 
#the original output has 313 pairs

Answer 1

I think this might be what you are after; if not, please let me know.

import pandas as pd
datas = [(('iPhone', 'Lightning Charging Cable'), 1004),
        (('Google Phone', 'USB-C Charging Cable'), 987),
        (('iPhone', 'Wired Headphones'), 447),
        (('Google Phone', 'Wired Headphones'), 414),
        (('Vareebadd Phone', 'USB-C Charging Cable'), 361)]
count = {}
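for data in datas:
    product = data[0][1]  # second item of the pair; the first item is not counted here
    if product in count:
        # keep the value as a one-element list so a third occurrence does not fail
        count[product] = [count[product][0] + 1]
    else:
        count[product] = [1]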
pd.DataFrame(count).transpose().sort_values(0, ascending = True)


Answer 2

I would do it like this:

from collections import Counter
import pandas as pd

data = [(('iPhone', 'Lightning Charging Cable'), 1004),
 (('Google Phone', 'USB-C Charging Cable'), 987),
 (('iPhone', 'Wired Headphones'), 447),
 (('Google Phone', 'Wired Headphones'), 414),
 (('Vareebadd Phone', 'USB-C Charging Cable'), 361)]

df = pd.DataFrame(sorted(Counter([x[0][1] for x in data]).items(), key= lambda x : x[1]))

To apply the same transformation to the original count_pairs data structure, just drop the [0] from x[0][1], as follows:

df = pd.DataFrame(sorted(Counter([x[1] for x in count_pairs]).items(), key= lambda x : x[1]))

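The effective difference between the two structures is that data has an extra outer layer (each element is a (pair, count) tuple inside a list), whereas count_pairs does not: iterating over the Counter yields the pair tuples directly.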
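Note that x[1] only looks at the second item of each pair, while the 'Lightning Charging Cable' in x test from the question matches either position. If the goal is to count every pair a product appears in, something along these lines might work. This is only a minimal sketch: the small count_pairs below is a stand-in built from the abbreviated output in the question, and pairs_per_product and ranked are names I made up for illustration.

```python
from collections import Counter
from itertools import chain

# Stand-in for the real count_pairs (the question says the original has 313 pairs).
count_pairs = Counter({
    ('Google Phone', 'Wired Headphones'): 414,
    ('Google Phone', 'USB-C Charging Cable'): 987,
    ('Bose SoundSport Headphones', 'Bose SoundSport Headphones'): 27,
    ('AAA Batteries (4-pack)', 'Google Phone'): 11,
    ('Lightning Charging Cable', 'USB-C Charging Cable'): 58,
})

# Iterating over a Counter yields its keys, i.e. the pair tuples.
# set(pair) collapses a pair made of the same product twice into one entry,
# which mirrors the `'name' in pair` test used in the question.
pairs_per_product = Counter(
    chain.from_iterable(set(pair) for pair in count_pairs)
)

# most_common() returns (product, number_of_pairs) tuples, highest count first.
ranked = pairs_per_product.most_common()
print(ranked)
```

If a Series is preferred over a list, pd.Series(pairs_per_product).sort_values(ascending=False) should give the same ranking, assuming pandas is imported as pd.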

