How to calculate the frequency of items sold per client?
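I'm trying to calculate the frequency of items sold for each client in my dataset. Instead of computing it relative to the length of the whole dataset, I want it relative to the total number of items each client bought.
My dataframe looks like this: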
data = {'ClientId': ['1','2','3','4','2','2','1','4'],
'QuantitySold': ['5','10','6','7','5','10','8','7']
}
Expected output:
ClientId  QuantitySold  FrequencySold
1         5             0.385
2         10            0.4
3         6             1
4         7             0.5
2         5             0.2
2         10            0.4
1         8             0.615
4         7             0.5
Calculation example: client 1 = 5/(5+8) = 0.385
How can I do this in Python?
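Put as a formula: a row's frequency is its quantity divided by the total quantity of the same client. Here $q_i$ and $c_i$ (notation introduced only for illustration) are the quantity and the client of row $i$:

```latex
\text{FrequencySold}_i = \frac{q_i}{\sum_{j \,:\, c_j = c_i} q_j},
\qquad \text{e.g. client 1: } \frac{5}{5+8} = \frac{5}{13} \approx 0.385,
\quad \frac{8}{13} \approx 0.615
```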
First, build a dictionary with each client's total, then divide each row's quantity by its client's total:
import collections
totals = collections.defaultdict(int)
for c, q in zip(data["ClientId"], data["QuantitySold"]):
totals[c] += int(q)
# defaultdict(int, {'1': 13, '2': 25, '3': 6, '4': 14})
for c, q in zip(data["ClientId"], data["QuantitySold"]):
print(c, q, int(q)/totals[c])
Output:
1 5 0.38461538461538464
2 10 0.4
3 6 1.0
4 7 0.5
2 5 0.2
2 10 0.4
1 8 0.6153846153846154
4 7 0.5
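The same two-pass approach can be packaged as a reusable function that returns the rows instead of printing them. This is a minimal sketch; the name `frequency_per_client` is hypothetical, and it assumes the quantities may arrive as strings, as in the question's data.

```python
from collections import defaultdict


def frequency_per_client(client_ids, quantities):
    """Hypothetical helper: return (client, quantity, frequency) tuples, where
    frequency is the row's quantity divided by that client's total quantity."""
    qty = [int(q) for q in quantities]  # the question stores quantities as strings

    # First pass: total quantity per client.
    totals = defaultdict(int)
    for c, q in zip(client_ids, qty):
        totals[c] += q

    # Second pass: divide each row's quantity by its client's total.
    return [(c, q, q / totals[c]) for c, q in zip(client_ids, qty)]


data = {'ClientId': ['1', '2', '3', '4', '2', '2', '1', '4'],
        'QuantitySold': ['5', '10', '6', '7', '5', '10', '8', '7']}

rows = frequency_per_client(data['ClientId'], data['QuantitySold'])
for c, q, f in rows:
    print(c, q, round(f, 3))
```

Rounding to three decimals here only mirrors the question's expected output; the returned frequencies keep full precision.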
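With pandas, get the totals and convert them to a dict:
import pandas as pd
df = pd.DataFrame(data)
df['QuantitySold'] = df['QuantitySold'].astype(int)  # the sample quantities are strings
summed = df.groupby('ClientId')['QuantitySold'].sum()
sums = summed.to_dict()  # {'1': 13, '2': 25, '3': 6, '4': 14}
Compute the frequency for each row:
def get_freqs(row):
    return row['QuantitySold'] / sums[row['ClientId']]
Apply it to every row:
df['FrequencySold'] = df.apply(get_freqs, axis=1)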
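`df.apply(..., axis=1)` calls a Python function once per row, which can get slow on large frames. A possible alternative, sketched here starting from the question's string data, is to look up each row's client total with `Series.map` and divide the two columns directly:

```python
import pandas as pd

data = {'ClientId': ['1', '2', '3', '4', '2', '2', '1', '4'],
        'QuantitySold': ['5', '10', '6', '7', '5', '10', '8', '7']}
df = pd.DataFrame(data)
df['QuantitySold'] = df['QuantitySold'].astype(int)  # strings -> integers

# Total quantity per client, as a dict like {'1': 13, '2': 25, ...}
sums = df.groupby('ClientId')['QuantitySold'].sum().to_dict()

# map() replaces each ClientId with that client's total; the division is column-wise.
df['FrequencySold'] = df['QuantitySold'] / df['ClientId'].map(sums)
print(df)
```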
import pandas as pd
# Be careful with the data: '5', '10' (str) and 5, 10 (int) are different!
data = {'ClientId': ['1', '2', '3', '4', '2', '2', '1', '4'],
        'QuantitySold': [5, 10, 6, 7, 5, 10, 8, 7]}  # integers, no quotes
df = pd.DataFrame.from_dict(data)
df['total sales'] = df['QuantitySold'].groupby(df['ClientId']).transform('sum')
df['frequency'] = df['QuantitySold']/df['total sales']
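The first line uses groupby with the transform method to create a new column, 'total sales', holding each client's total; I find this easier to understand. The second line adds a 'frequency' column to the existing dataframe, computed with a simple division.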
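The difference between an ordinary aggregation and `transform` is the shape of the result: `sum()` returns one value per client, while `transform('sum')` broadcasts each client's total back onto every original row, so it can be divided element-wise by `QuantitySold`. A small comparison on the same data:

```python
import pandas as pd

df = pd.DataFrame({'ClientId': ['1', '2', '3', '4', '2', '2', '1', '4'],
                   'QuantitySold': [5, 10, 6, 7, 5, 10, 8, 7]})

grouped = df.groupby('ClientId')['QuantitySold']

per_client = grouped.sum()          # one row per client (index: '1'..'4')
per_row = grouped.transform('sum')  # one row per original row, same index as df

print(per_client)
print(per_row)
```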
Output (tested, pasted directly from the terminal):
   ClientId  QuantitySold  total sales  frequency
0         1             5           13   0.384615
1         2            10           25   0.400000
2         3             6            6   1.000000
3         4             7           14   0.500000
4         2             5           25   0.200000
5         2            10           25   0.400000
6         1             8           13   0.615385
7         4             7           14   0.500000
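To reproduce the exact `FrequencySold` column from the question, starting from its string data, the quantities can be converted first and the result rounded to three decimals. A sketch, assuming every quantity is a whole-number string so that `astype(int)` succeeds:

```python
import pandas as pd

# Data exactly as in the question: quantities are strings.
data = {'ClientId': ['1', '2', '3', '4', '2', '2', '1', '4'],
        'QuantitySold': ['5', '10', '6', '7', '5', '10', '8', '7']}
df = pd.DataFrame(data)

# Summing strings would concatenate them instead of adding, so convert first.
df['QuantitySold'] = df['QuantitySold'].astype(int)

totals = df.groupby('ClientId')['QuantitySold'].transform('sum')
df['FrequencySold'] = (df['QuantitySold'] / totals).round(3)

# Sanity check: each client's frequencies should add up to (about) 1.
print(df)
print(df.groupby('ClientId')['FrequencySold'].sum())
```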