How to calculate the frequency of items sold per client?

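I'm trying to calculate the frequency of items sold for each client in my dataset. I don't want the frequency relative to the length of the whole dataset, but relative to the total number of items each client bought.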

My dataframe looks like this:

data = {'ClientId': ['1','2','3','4','2','2','1','4'],
        'QuantitySold': ['5','10','6','7','5','10','8','7']
       }

Expected output:

ClientId      QuantitySold     FrequencySold
1             5                0.385
2             10               0.4
3             6                1
4             7                0.5
2             5                0.2
2             10               0.4
1             8                0.615
4             7                0.5

How it is calculated: client 1 = 5/(5+8) = 0.385

How can I do this in Python?

First, build a dictionary holding the total for each client, then divide each quantity by its client's total:

import collections
totals = collections.defaultdict(int)
for c, q in zip(data["ClientId"], data["QuantitySold"]):
    totals[c] += int(q)
# defaultdict(int, {'1': 13, '2': 25, '3': 6, '4': 14})

for c, q in zip(data["ClientId"], data["QuantitySold"]):
    print(c, q, int(q)/totals[c])

Output:

1 5 0.38461538461538464
2 10 0.4
3 6 1.0
4 7 0.5
2 5 0.2
2 10 0.4
1 8 0.6153846153846154
4 7 0.5
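
If the result is needed as data rather than printed, the same two passes could be wrapped in a function. This is only a sketch: the name `frequency_per_client` and the rounding to three decimals (to match the table in the question) are my own choices, not part of the code above.

```python
import collections

def frequency_per_client(client_ids, quantities, ndigits=3):
    """Return (client, quantity, frequency) tuples, one per input row."""
    quantities = [int(q) for q in quantities]  # accepts '5' as well as 5

    # First pass: total quantity per client.
    totals = collections.defaultdict(int)
    for c, q in zip(client_ids, quantities):
        totals[c] += q

    # Second pass: each row's share of its client's total.
    return [(c, q, round(q / totals[c], ndigits))
            for c, q in zip(client_ids, quantities)]

data = {'ClientId': ['1', '2', '3', '4', '2', '2', '1', '4'],
        'QuantitySold': ['5', '10', '6', '7', '5', '10', '8', '7']}

rows = frequency_per_client(data['ClientId'], data['QuantitySold'])
for row in rows:
    print(*row)
```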

Using pandas:

Get the totals and convert them to a dict:

# Assumes df = pd.DataFrame(data) with QuantitySold already converted to int
summed = df.groupby('ClientId')['QuantitySold'].sum()
sums = summed.to_dict()

Compute the frequency for each row:

def get_freqs(row):
    # Look values up by column label rather than by position
    return row['QuantitySold'] / sums[row['ClientId']]

Apply it to every row:

df['FrequencySold'] = df.apply(get_freqs, axis=1)
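
Put together, the steps above might look like the following. It's a sketch under a couple of assumptions: `df` is built from the question's `data`, and `QuantitySold` is converted to integers first, since summing the original strings would concatenate them instead of adding them.

```python
import pandas as pd

data = {'ClientId': ['1', '2', '3', '4', '2', '2', '1', '4'],
        'QuantitySold': ['5', '10', '6', '7', '5', '10', '8', '7']}
df = pd.DataFrame(data)
df['QuantitySold'] = df['QuantitySold'].astype(int)  # strings -> integers

# Total quantity per client, as a plain dict keyed by ClientId.
sums = df.groupby('ClientId')['QuantitySold'].sum().to_dict()

def get_freqs(row):
    # Each row's quantity divided by that client's total.
    return row['QuantitySold'] / sums[row['ClientId']]

df['FrequencySold'] = df.apply(get_freqs, axis=1)
print(df)
```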

import pandas as pd

# Be careful with the data: '5', '10' (str) and 5, 10 (int) are different!

data = {'ClientId': ['1', '2', '3', '4', '2', '2', '1', '4'],
        'QuantitySold': [5, 10, 6, 7, 5, 10, 8, 7]}  # integers, no quotes

df = pd.DataFrame.from_dict(data)

df['total sales'] = df['QuantitySold'].groupby(df['ClientId']).transform('sum')

df['frequency'] = df['QuantitySold']/df['total sales']

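The first line creates a new column 'total sales' using groupby with the transform method, which I find easier to understand. The second line adds a 'frequency' column to the existing dataframe, computed with a simple division.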

Output (tested, pasted here straight from the terminal):

   ClientId  QuantitySold  total sales  frequency
0         1             5           13   0.384615
1         2            10           25   0.400000
2         3             6            6   1.000000
3         4             7           14   0.500000
4         2             5           25   0.200000
5         2            10           25   0.400000
6         1             8           13   0.615385
7         4             7           14   0.500000
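
If the quantities arrive as strings, as in the question, they could be converted before grouping rather than retyped by hand. A sketch, assuming every value parses as a number (`pd.to_numeric` raises an error otherwise); the rounding to three decimals is only there to match the table in the question, and the column name `FrequencySold` is taken from there as well.

```python
import pandas as pd

data = {'ClientId': ['1', '2', '3', '4', '2', '2', '1', '4'],
        'QuantitySold': ['5', '10', '6', '7', '5', '10', '8', '7']}
df = pd.DataFrame(data)
df['QuantitySold'] = pd.to_numeric(df['QuantitySold'])  # '5' -> 5

# transform('sum') returns one total per row, aligned with df's index,
# so the division works row by row without an intermediate column.
totals = df.groupby('ClientId')['QuantitySold'].transform('sum')
df['FrequencySold'] = (df['QuantitySold'] / totals).round(3)
print(df)
```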