Find X items that a user has not bought
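I have a list of many users who have bought items. I want to create negative samples, that is, for each user, the items they have not bought.
I select each user in my dataframe and check which items they have not bought yet.
I have this working, but unfortunately the code is very slow.
How can I find the items a user has not bought?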
import pandas as pd

d = {'userid': [0, 0, 0, 1, 2, 2, 3, 3, 4, 4, 4],
     'itemid': [715, 845, 98, 12324, 85, 715, 2112, 85, 2112, 852, 102]}
df = pd.DataFrame(data=d)
print(df.head())
# this is the test dataframe
   userid  itemid
0       0     715
1       0     845
2       0      98
3       1   12324
4       2      85
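import random

testRatings = df  # assumption: `test` was not defined in the post, so the example frame is reused
item_max = df['itemid'].max() + 1  # assumption: `item_max` was not defined in the post
print(testRatings.head())
test_negative = [[] for i in range(len(testRatings))]
for index, row in testRatings.iterrows():
    for n in range(20):  # here I find 20 items that a user has not bought
        ra = random.randrange(item_max)
        # look up the current row's user (the original compared against the loop counter n)
        IsRating = df.loc[(df['userid'] == row['userid']) & (df['itemid'] == ra)]
        if IsRating.empty:
            test_negative[index].append(ra)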
# Desired output. This example list contains only 3 values per user, not 20.
# As you can see, 3 items that the user has not bought
#                     user 0            user 1
>>> test_negative = [[2112, 85, 852], [845, 98, 715], ...]
# This is the complete example list
>>> test_negative = [[2112, 85, 852], [845, 98, 715], [2112, 98, 715],
                     [85, 12324, 102], [715, 12324, 98]]
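If you want the final result in a list, you can use the code below.
First create a set of all possible items in the column itemid; then, for each userid, subtract the items that user bought from all_items, which leaves the list of items they have not bought.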
all_items = set(df['itemid'])
df.groupby('userid').apply(lambda x: list(all_items.difference(set(x['itemid']))))
# userid
# 0 [2112, 12324, 102, 852, 85]
# 1 [2112, 98, 102, 715, 845, 852, 85]
# 2 [2112, 98, 12324, 102, 845, 852]
# 3 [98, 12324, 102, 715, 845, 852]
# 4 [98, 12324, 715, 845, 85]
# dtype: object
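The set-difference idea above can also be sketched by grouping only the itemid column, so the lambda receives a Series of item ids rather than the whole sub-frame. The helper name `unbought_items` and the sorted output are assumptions added for illustration; they are not part of the original answer.

```python
import pandas as pd


def unbought_items(df):
    """Map each userid to a sorted list of itemids the user has not bought.

    Hypothetical helper: the name and the sorted output are illustrative choices.
    """
    # tolist() yields plain Python ints instead of NumPy scalars.
    all_items = set(df['itemid'].tolist())
    # Group only the itemid column; each group is a Series of that user's items.
    per_user = df.groupby('userid')['itemid'].apply(
        lambda items: sorted(all_items - set(items.tolist()))
    )
    return per_user.to_dict()


d = {'userid': [0, 0, 0, 1, 2, 2, 3, 3, 4, 4, 4],
     'itemid': [715, 845, 98, 12324, 85, 715, 2112, 85, 2112, 852, 102]}
df = pd.DataFrame(data=d)
negatives = unbought_items(df)
print(negatives[0])  # [85, 102, 852, 2112, 12324]
```

Sorting makes the result independent of set iteration order, which is otherwise arbitrary.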
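Edit
To get the result in the form [[], [], ...] with 3 random unbought items per userid, use the code below instead. random.sample raises a ValueError when a user has fewer than 3 unbought items, so each list holds exactly 3.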
random.seed(123) # for reproducibility
df.groupby('userid').apply(lambda x: random.sample(list(all_items.difference(set(x['itemid']))), 3)).tolist()
# [[2112, 102, 85],
# [85, 715, 102],
# [2112, 852, 102],
# [845, 852, 102],
# [715, 98, 845]]
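If some users may have fewer than k unbought items, one possible variant caps the sample size per user. This is a minimal sketch, not the original answer's code: the helper name `sample_negatives` and its parameters `k` and `seed` are hypothetical, and it uses a local `random.Random` instance instead of the global seed.

```python
import random

import pandas as pd


def sample_negatives(df, k=3, seed=123):
    """Return one list per userid with up to k random unbought itemids.

    Hypothetical helper; k and seed are illustrative parameters.
    """
    rng = random.Random(seed)  # local generator, leaves the global state alone
    all_items = set(df['itemid'].tolist())
    result = []
    # Iterating the grouped column yields (userid, Series of itemids) pairs.
    for _, items in df.groupby('userid')['itemid']:
        # Sort the candidates so the draw does not depend on set ordering.
        candidates = sorted(all_items - set(items.tolist()))
        # min() avoids the ValueError random.sample raises when k is too large.
        result.append(rng.sample(candidates, min(k, len(candidates))))
    return result


d = {'userid': [0, 0, 0, 1, 2, 2, 3, 3, 4, 4, 4],
     'itemid': [715, 845, 98, 12324, 85, 715, 2112, 85, 2112, 852, 102]}
df = pd.DataFrame(data=d)
test_negative = sample_negatives(df, k=3)
print(len(test_negative))  # 5, one list per user
```

Asking for k=20 on this small frame returns every unbought item for each user instead of raising an error.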