How do I add the MLlib library to Spark?

Question

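I was assigned to run some code and show the results using Apache Spark with Python. I installed the Apache Spark server by following these steps: https://phoenixnap.com/kb/install-spark-on-windows-10. I tried my code and everything worked fine. Now I have been given another task that requires MLlib linear regression. We were provided with some code that is supposed to run as-is, and we then have to add extra code to it. When I try to run the provided code, I get some errors and warnings. Some of them also appeared in the previous assignment, but that code still worked. I think the problem is that something else related to the MLlib library has to be added for the code to run correctly. Does anyone know which files should be added to Spark so that it can run MLlib-related code? I am using Windows 10 and spark-3.0.1-bin-hadoop2.7.

Here is my code: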

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext
from pyspark.ml.regression import LinearRegression
from pyspark.ml.feature import StandardScaler

conf = SparkConf().setMaster("local").setAppName("LinearRegression")
sc = SparkContext(conf = conf)
sqlContext = SQLContext(sc)

# Load training data
df = sqlContext.read.format("libsvm").option("numFeatures", 13).load("boston_housing.txt")

# Data needs to be scaled for better results and interpretation
# Initialize the `standardScaler`
standardScaler = StandardScaler(inputCol="features", outputCol="features_scaled")

# Fit the DataFrame to the scaler
scaler = standardScaler.fit(df)

# Transform the data in `df` with the scaler
scaled_df = scaler.transform(df)

# Initialize the linear regression model
lr = LinearRegression(labelCol="label", maxIter=10, regParam=0.3, elasticNetParam=0.8)

# Fit the data to the model
linearModel = lr.fit(scaled_df)

# Print the coefficients for the model
print("Coefficients: %s" % str(linearModel.coefficients))
print("Intercept: %s" % str(linearModel.intercept))

Here is a screenshot from when I run the code:

Answer 1

Try pip install numpy (or pip3 install numpy if that fails). The traceback says that the numpy module cannot be found.
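MLlib itself ships with the Spark distribution, so no extra Spark files should be needed; the pyspark.ml package does, however, import NumPy on the Python side. The sketch below is one way to confirm which interpreter is in use and whether it can see NumPy. The helper name missing_modules is made up for illustration, and it assumes the script is run with the same Python that PySpark uses (for example, the one PYSPARK_PYTHON points to, if that variable is set).

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find.

    Hypothetical helper for illustration; it only looks the modules up,
    it does not import them.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The interpreter printed here is the one that needs NumPy installed.
    print("Python executable:", sys.executable)
    missing = missing_modules(["numpy"])
    if missing:
        print("Missing modules:", ", ".join(missing))
        # Installing through this exact interpreter avoids pip/pip3 mix-ups.
        print("Try:", sys.executable, "-m pip install", " ".join(missing))
    else:
        print("numpy is importable from this interpreter.")
```

If the script reports NumPy as missing, running the printed "-m pip install" command installs it into that same interpreter, which sidesteps the case where pip and python belong to different installations.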
