Proper save/load of MatrixFactorizationModel
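I have a MatrixFactorizationModel object. If I recommend products to a single user right after building the model with ALS.train(...), it takes 300 ms (for my data and hardware). But if I save the model to disk and load it back, the same recommendation takes almost 2000 ms. Spark also logs these warnings: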
15/07/17 11:05:47 WARN MatrixFactorizationModel: User factor does not have a partitioner. Prediction on individual records could be slow.
15/07/17 11:05:47 WARN MatrixFactorizationModel: User factor is not cached. Prediction could be slow.
15/07/17 11:05:47 WARN MatrixFactorizationModel: Product factor does not have a partitioner. Prediction on individual records could be slow.
15/07/17 11:05:47 WARN MatrixFactorizationModel: Product factor is not cached. Prediction could be slow.
How do I create/set a partitioner and cache the user and product factors after loading the model? The following approach didn't help:
model.userFeatures().cache();
model.productFeatures().cache();
I also tried repartitioning these RDDs and creating a new model from the repartitioned versions, but that didn't help either.
In Scala you don't need the parentheses: userFeatures is an RDD of (Int, Array[Double]) and takes no arguments.
This should help:
model.userFeatures.cache
model.productFeatures.cache
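One thing worth checking is that cache() is lazy: it only marks an RDD for persistence, and nothing is actually stored until an action runs on it. The sketch below is one possible way to combine the ideas from this thread, assuming Spark 1.3 or later: it reloads the model, attaches a partitioner to both factor RDDs, caches them, and forces materialization with count() before building a new model. The object WarmModel, the helper loadWarm, and the numPartitions parameter are hypothetical names for illustration, not part of Spark. The question reports that repartitioning alone did not help, so treat this as something to measure rather than a guaranteed fix.

```scala
import org.apache.spark.{HashPartitioner, SparkContext}
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel

object WarmModel {
  // Hypothetical helper, not part of Spark: reload a saved model, then
  // repartition and cache its factor RDDs before serving recommendations.
  def loadWarm(sc: SparkContext, path: String, numPartitions: Int): MatrixFactorizationModel = {
    val loaded = MatrixFactorizationModel.load(sc, path)

    // partitionBy attaches a partitioner, so a lookup by key can target one partition.
    val partitioner = new HashPartitioner(numPartitions)
    val users = loaded.userFeatures.partitionBy(partitioner).cache()
    val products = loaded.productFeatures.partitionBy(partitioner).cache()

    // cache() is lazy; count() is an action that forces the factors into memory now.
    users.count()
    products.count()

    // Build a new model on top of the partitioned, cached factor RDDs.
    new MatrixFactorizationModel(loaded.rank, users, products)
  }
}
```

A call such as WarmModel.loadWarm(sc, path, 8).recommendProducts(userId, 10) would then run against the cached factors; the partition count of 8 is an arbitrary placeholder to tune for your cluster.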