Train a non-linear SVC model using PySpark

Question

Is there any way to train a non-linear SVC model using PySpark? I tried:

 from sklearn.svm import SVC
 svc = SVC(kernel="rbf", random_state=0, gamma=1, C=1)
 model = svc.fit(features, target)

(features and target are two DataFrames converted to lists.) The problem is that I want to train with PySpark components so that training runs faster.

Answer 1

Non-linear SVC is not (yet) available in PySpark, according to https://issues.apache.org/jira/browse/SPARK-4638. Here is the last comment on that issue from a Spark community member:

Non-linear kernels for SVMs in Spark would be great to have. The main barriers are: Kernelized SVM training is hard to distribute. Naive methods require a lot of communication. To get this feature into Spark, we'd need to do proper background research and write up a good design. Other ML algorithms are arguably more in demand and still need improvements (as of the date of this comment). Tree ensembles are first-and-foremost in my mind.
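To make the "hard to distribute" point concrete, here is a minimal sketch in plain Python (no Spark involved). It computes the RBF kernel from the question, k(x, y) = exp(-gamma * ||x - y||^2) with gamma=1, and the full n x n Gram matrix that a naive kernel SVM solver works with. The helper names (`rbf_kernel`, `gram_matrix`, `pairwise_entries`, `cross_partition_entries`) are hypothetical, and the even split of rows across partitions is a simplifying assumption, not a description of how Spark actually schedules work.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def gram_matrix(points, gamma=1.0):
    """Full n x n kernel matrix: every training row is paired with every other row."""
    return [[rbf_kernel(p, q, gamma) for q in points] for p in points]


def pairwise_entries(n):
    """Number of kernel evaluations in a full Gram matrix for n training rows."""
    return n * n


def cross_partition_entries(n, num_partitions):
    """Entries K[i][j] whose rows i and j sit in different partitions.

    Assumption (hypothetical): the n rows are split evenly across partitions,
    so n must be divisible by num_partitions. Each of these entries needs data
    from two partitions, i.e. network communication in a cluster.
    """
    if n % num_partitions != 0:
        raise ValueError("n must be divisible by num_partitions")
    per_part = n // num_partitions
    same_partition = num_partitions * per_part * per_part
    return n * n - same_partition


if __name__ == "__main__":
    K = gram_matrix([(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)], gamma=1.0)
    for row in K:
        print([round(v, 4) for v in row])
    for n in (1_000, 100_000, 10_000_000):
        print(n, "rows:", pairwise_entries(n), "kernel entries,",
              cross_partition_entries(n, 10), "of them across 10 partitions")
```

The entry count grows quadratically with the number of rows, and with p evenly sized partitions a fraction 1 - 1/p of the entries pairs rows held by different partitions. That cross-partition exchange is the communication cost the comment refers to.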


Tags: machine-learning, svm, bigdata, pyspark, apache-spark-mllib