Find X items that a user has not bought

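I have a list containing many users who have bought items. I want to create negative samples, meaning I want to show which items a user has not bought. I select each user in my dataframe and look for items they have not bought yet. I have done this, but unfortunately the code is very slow. How can I find the items a user has not bought?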

import pandas as pd

d = {'userid': [0, 0, 0, 1, 2, 2, 3, 3, 4, 4, 4],
     'itemid': [715, 845, 98, 12324, 85, 715, 2112, 85, 2112, 852, 102]}
df = pd.DataFrame(data=d)
print(df.head())

# this is the test dataframe
   userid  itemid
0       0     715
1       0     845
2       0      98
3       1   12324
4       2      85

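import random

testRatings = df                     # the test dataframe shown above (named `test` in my code)
item_max = df['itemid'].max() + 1    # assumption: item ids lie in [0, item_max)
print(testRatings.head())
test_negative = [[] for _ in range(len(testRatings))]
for index, row in testRatings.iterrows():
    for n in range(20):  # here I try 20 random items for this user
        ra = random.randrange(item_max)
        # compare against the current row's user, not the loop counter n
        IsRating = df.loc[(df['userid'] == row['userid']) & (df['itemid'] == ra)]
        if IsRating.empty:
            test_negative[index].append(ra)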
# Desired result. This example list contains only 3 values per user, not 20.
# As you can see, each inner list holds 3 items that the user has not bought.
#                  user 0            user 1
# test_negative = [[2112, 85, 852], [845, 98, 715], ...]
# This is the complete example list:
# test_negative = [[2112, 85, 852], [845, 98, 715], [2112, 98, 715],
#                  [85, 12324, 102], [715, 12324, 98]]

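If you want the final result in a list, you can use the code below.
First create the set of all possible items that appear in the itemid column; then, for each userid, subtract the items he bought from all_items, which gives the list of items he has not bought.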

all_items = set(df['itemid'])
df.groupby('userid').apply(lambda x: list(all_items.difference(set(x['itemid']))))
# userid
# 0           [2112, 12324, 102, 852, 85]
# 1    [2112, 98, 102, 715, 845, 852, 85]
# 2      [2112, 98, 12324, 102, 845, 852]
# 3       [98, 12324, 102, 715, 845, 852]
# 4             [98, 12324, 715, 845, 85]
# dtype: object
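
The order inside each list above comes from set iteration, so it may differ between runs or Python versions. As a self-contained sketch of the same set-difference idea that sorts each list to make the result stable (the helper name `unbought_by_user` and its parameters are my own, not part of pandas):

```python
import pandas as pd

def unbought_by_user(df, user_col="userid", item_col="itemid"):
    """Map each user to the sorted list of known items they have not bought.

    Hypothetical helper for illustration; only items that appear somewhere
    in `item_col` are considered candidates.
    """
    all_items = set(df[item_col])
    # One set of bought items per user.
    bought = df.groupby(user_col)[item_col].apply(set)
    return {user: sorted(all_items - items) for user, items in bought.items()}

d = {'userid': [0, 0, 0, 1, 2, 2, 3, 3, 4, 4, 4],
     'itemid': [715, 845, 98, 12324, 85, 715, 2112, 85, 2112, 852, 102]}
df = pd.DataFrame(data=d)

negatives = unbought_by_user(df)
print(negatives[0])   # [85, 102, 852, 2112, 12324]
```

A dict keyed by userid keeps the link between user and items explicit; use list(negatives.values()) if you need the plain list-of-lists form.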

Edit

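To get the result in the form [[], [], ...] with 3 random non-bought items for each userid, use the code below instead. Note that random.sample raises a ValueError if a user has fewer than 3 unbought items.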

import random

random.seed(123)   # for reproducibility
df.groupby('userid').apply(lambda x: random.sample(list(all_items.difference(set(x['itemid']))), 3)).tolist()
# [[2112, 102, 85],
#  [85, 715, 102],
#  [2112, 852, 102],
#  [845, 852, 102],
#  [715, 98, 845]]
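
If some users could have fewer unbought items than the number you ask for (for example the 20 from the question), random.sample would fail. A minimal sketch that caps the sample size per user instead; `sample_negatives` and its parameters are hypothetical names of mine, and it assumes the candidate items are only those present in the itemid column:

```python
import random
import pandas as pd

def sample_negatives(df, k=3, seed=123, user_col="userid", item_col="itemid"):
    """Return one list per user with up to k random items the user has not bought."""
    rng = random.Random(seed)   # local generator, leaves the global random state alone
    all_items = set(df[item_col])
    result = []
    # Groups are visited in sorted userid order.
    for _, items in df.groupby(user_col)[item_col]:
        # Sort the candidates so the same seed always gives the same sample.
        candidates = sorted(all_items.difference(items))
        result.append(rng.sample(candidates, min(k, len(candidates))))
    return result

d = {'userid': [0, 0, 0, 1, 2, 2, 3, 3, 4, 4, 4],
     'itemid': [715, 845, 98, 12324, 85, 715, 2112, 85, 2112, 852, 102]}
df = pd.DataFrame(data=d)

three_each = sample_negatives(df, k=3)
capped = sample_negatives(df, k=20)   # more than any user has left, so every list is capped
```

With k=20 each user simply gets all of their unbought items in random order, rather than an exception.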