Streaming Kmeans Spark JAVA

Hi, basically we want to use Kafka + Spark Streaming in our thesis to catch Twitter spam. I want to use StreamingKMeans, but I have a very basic yet serious question:

model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print()

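Why do I need to pass the label? I mean, have I misunderstood the whole idea? Aren't we supposed to be predicting the label? How do I predict whether my tweet is spam or not?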

Only lp.features is used for the prediction; lp.label is treated as a key that is carried through unchanged. Quoting the docs:

Use the model to make predictions on the values of a DStream and carry over its keys.

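I think in your example you just want to replace predictOnValues with predictOn, which takes only the feature vectors (testData.map(lp => lp.features)) and returns the predicted cluster index for each one.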
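
To make the difference concrete, here is a minimal plain-Java sketch of the two behaviours. It does not use the Spark API: the class KeyedPredictionSketch, its methods, the centroids, and the tweet ids are all hypothetical stand-ins, and the nearest-centroid lookup only approximates what a trained model does. It shows that the key never influences the prediction and is simply copied to the output.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; this is NOT the Spark API.
final class KeyedPredictionSketch {

    // Hypothetical stand-in for a trained model: index of the nearest centroid
    // by squared Euclidean distance.
    static int predict(double[][] centroids, double[] features) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centroids.length; i++) {
            double dist = 0.0;
            for (int j = 0; j < features.length; j++) {
                double d = features[j] - centroids[i][j];
                dist += d * d;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return best;
    }

    // Mimics predictOn: feature vectors in, cluster indices out.
    static List<Integer> predictOn(double[][] centroids, List<double[]> data) {
        List<Integer> out = new ArrayList<>();
        for (double[] features : data) {
            out.add(Integer.valueOf(predict(centroids, features)));
        }
        return out;
    }

    // Mimics predictOnValues: only the value (features) is used for the
    // prediction; the key is copied to the output untouched.
    static <K> List<Map.Entry<K, Integer>> predictOnValues(
            double[][] centroids, List<Map.Entry<K, double[]>> data) {
        List<Map.Entry<K, Integer>> out = new ArrayList<>();
        for (Map.Entry<K, double[]> e : data) {
            Integer cluster = Integer.valueOf(predict(centroids, e.getValue()));
            out.add(new SimpleEntry<K, Integer>(e.getKey(), cluster));
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up centroids and tweets, for illustration only.
        double[][] centroids = {{0.0, 0.0}, {10.0, 10.0}};
        List<Map.Entry<String, double[]>> keyed = new ArrayList<>();
        keyed.add(new SimpleEntry<String, double[]>("tweet-1", new double[]{1.0, 0.5}));
        keyed.add(new SimpleEntry<String, double[]>("tweet-2", new double[]{9.0, 11.0}));
        for (Map.Entry<String, Integer> e : predictOnValues(centroids, keyed)) {
            System.out.println(e.getKey() + " -> cluster " + e.getValue());
        }
    }
}
```

The key can be anything that identifies the record, such as a tweet id, or a known label when you want to compare it against the assigned cluster. Note that k-means is unsupervised, so the output is a cluster index rather than a spam/not-spam label; deciding which clusters correspond to spam is a separate step.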