Train a non-linear SVC model using PySpark
Is there any way to train a non-linear SVC model using PySpark?
I tried:
from sklearn.svm import SVC
svc = SVC(kernel="rbf", random_state=0, gamma=1, C=1)
model = svc.fit(features, target)
(features and target are two DataFrames converted to lists.)
The problem is that I want to train with PySpark components to speed up my training.
As of today, non-linear SVC is not (yet) available in PySpark, according to:
https://issues.apache.org/jira/browse/SPARK-4638. See the last comment there from a Spark community member:
Non-linear kernels for SVMs in Spark would be great to have. The main barriers are:
Kernelized SVM training is hard to distribute. Naive methods require a lot of communication. To get this feature into Spark, we'd need to do proper background research and write up a good design.
Other ML algorithms are arguably more in demand and still need improvements (as of the date of this comment). Tree ensembles are first-and-foremost in my mind.
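A rough illustration of why "kernelized SVM training is hard to distribute": the kernel SVM dual problem is expressed through kernel values between pairs of training points, so the full Gram matrix for n points has n² entries. The sketch below assumes the RBF kernel with gamma=1 from the question's SVC call. The helper names (rbf, gram_matrix, cross_partition_pairs) are hypothetical, and counting pairs that straddle partitions is only a simplified model of communication cost, not how Spark or libsvm actually work.

```python
import math
from typing import List, Sequence


def rbf(x: Sequence[float], y: Sequence[float], gamma: float = 1.0) -> float:
    """RBF kernel exp(-gamma * ||x - y||^2), same form as SVC(kernel="rbf", gamma=1)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def gram_matrix(points: List[Sequence[float]], gamma: float = 1.0) -> List[List[float]]:
    """Full n x n kernel matrix: every point is compared with every other point."""
    return [[rbf(p, q, gamma) for q in points] for p in points]


def cross_partition_pairs(partitions: List[List[Sequence[float]]]) -> int:
    """Count (ordered) kernel entries whose two points sit in different partitions.

    Simplified model (assumption): in a naive distributed scheme, each such entry
    needs data from two workers, i.e. network communication.
    """
    sizes = [len(p) for p in partitions]
    n = sum(sizes)
    # All n*n pairs minus the pairs that stay inside a single partition.
    return n * n - sum(s * s for s in sizes)


if __name__ == "__main__":
    points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    K = gram_matrix(points)
    print(len(K), len(K[0]))                                 # 4 4
    print(cross_partition_pairs([points[:2], points[2:]]))   # 8
    print(cross_partition_pairs([[p] for p in points]))      # 12
```

With n points split evenly across k partitions, n²(1 - 1/k) of the kernel entries cross partition boundaries, so adding workers increases rather than removes the data exchange. Linear models avoid this because each point's gradient contribution depends only on that point and the shared weight vector.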