What are the differences between the vector implementations of Spark MLlib and Spark ML?

At a high level, I know that Spark MLlib is written on top of RDDs while Spark ML is built on top of DataFrames, but my understanding doesn't go much deeper than that.

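In particular, the lack of compatibility between the different vector implementations makes me wonder how the implementations differ and why those design decisions were made.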

The motivation for keeping local linear algebra in ml is explained in SPARK-13944:

Separate out linear algebra as a standalone module without Spark dependency to simplify production deployment. We can call the new module mllib-local, which might contain local models in the future. The major issue is to remove dependencies on user-defined types.

The package name will be changed from mllib to ml. For example, Vector will be changed from org.apache.spark.mllib.linalg.Vector to org.apache.spark.ml.linalg.Vector. The return vector type in the new ML pipeline will be the one in ML package; however, the existing mllib code will not be touched. As a result, this will potentially break the API. Also, when the vector is loaded from mllib vector by Spark SQL, the vector will automatically converted into the one in ml package.

Right now the implementations are almost identical, apart from some conversion methods.

apache-spark

apache-spark-ml

apache-spark-mllib