如何获得每个客户销售商品频率的列?

How to get the a column for the frequency of items sold per Clients?

如何计算按每个 ClientID 分组的 ItemSold 的频率并为该频率创建新列?

我的数据框如下所示:

data = {'ClientId': ['1','2','3','4','2','2','1','4'],
        'QuantitySold': ['5','10','6','7','5','10','8','7']
       }

预期输出:

Client Id     QuantitySold     FrequencySold
1             5                0.385
2             10               0.4
3             6                1
4             7                0.5
2             5                0.2
2             10               0.4
1             8                0.615
4             7                0.5

我找到了计算的解决方案,但无法创建新列 FrequencySold

工作代码:

import collections
totals = collections.defaultdict(int)
for c, q in zip(data["ClientId"], data["QuantitySold"]):
    totals[c] += int(q)
# defaultdict(int, {'1': 13, '2': 25, '3': 6, '4': 14})

for c, q in zip(data["ClientId"], data["QuantitySold"]):
    print(c, q, int(q)/totals[c])

如何基于循环创建新列?

谢谢

你可以试试:

>>> data = {'ClientId': ['1','2','3','4','2','2','1','4'],
...         'QuantitySold': ['5','10','6','7','5','10','8','7']
...        }
>>> df = pd.DataFrame(data)
>>>
>>> df['FrequencySold'] = df.groupby('ClientId')['QuantitySold'].apply(lambda x: x.astype(int) / x.astype(int).sum())
>>> df
  ClientId QuantitySold  FrequencySold
0        1            5       0.384615
1        2           10       0.400000
2        3            6       1.000000
3        4            7       0.500000
4        2            5       0.200000
5        2           10       0.400000
6        1            8       0.615385
7        4            7       0.500000