Multiple features in collaborative filtering - Spark

Question

I have a CSV file that looks like this:

customer_ID, location, ....other info..., item-bought, score

I am trying to build a collaborative filtering recommender system in Spark. Spark takes data in the following form:

userID, itemID, value

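But my data has more columns, and I want to use all of the user's information, not just the userID. I tried grouping the columns into a single column: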

(customerID,location,....),itemID,score

But ALS.train gives me this error:

TypeError: int() argument must be a string or a number, not 'tuple'

How can I get Spark to accept multiple key/value pairs instead of only three columns? Thanks.

Answer 1

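For each customer, decide which columns you want to use to distinguish one user entity from another. Create a table (for example, in SQL) in which each row holds the information for one user entity, and use the row number in that table as the user ID.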

If necessary, do the same for your items, then pass these integer IDs to the model (here, ALS.train) in place of the original tuples.
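
The mapping described above can be sketched in plain Python. This is only an illustration under assumptions: the sample rows, the choice of (customer_id, location) as the distinguishing columns, and the helper name build_index are all hypothetical and not taken from the question's data.

```python
# Hypothetical sample rows in the grouped form from the question:
# ((customer_id, location), item_id, score)
rows = [
    (("c1", "NYC"), "itemA", 5.0),
    (("c2", "LA"), "itemB", 3.0),
    (("c1", "NYC"), "itemB", 4.0),
]


def build_index(keys):
    """Assign a dense integer ID to each distinct key, in first-seen order."""
    index = {}
    for key in keys:
        if key not in index:
            index[key] = len(index)
    return index


# One lookup table per entity type, playing the role of the "row number" table.
user_index = build_index(user for user, _, _ in rows)
item_index = build_index(item for _, item, _ in rows)

# Replace the tuple and string keys with their integer IDs.
ratings = [(user_index[u], item_index[i], s) for u, i, s in rows]
print(ratings)  # [(0, 0, 5.0), (1, 1, 3.0), (0, 1, 4.0)]
```

The resulting (userID, itemID, value) triples have integer IDs, so they should no longer trigger the int() error on a tuple. Note that the model then treats every distinct combination of columns as a separate user; the extra columns only decide how users are told apart and are not used as features in their own right. Keep user_index and item_index around so that recommendations can be mapped back to the original keys.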
