AttributeError: 'MatrixFactorizationModel' object has no attribute 'save'
I am trying to run the example from the Apache Spark MLlib website. Here is my code:
import sys
import os
os.environ['SPARK_HOME'] = "/usr/local/Cellar/apache-spark/1.2.1"
sys.path.append("/usr/local/Cellar/apache-spark/1.2.1/libexec/python")
sys.path.append("/usr/local/Cellar/apache-spark/1.2.1/libexec/python/build")
try:
    from pyspark import SparkContext, SparkConf
    from pyspark.mllib.recommendation import ALS, MatrixFactorizationModel, Rating
    print ("Apache-Spark v1.2.1 >>> All modules found and imported successfully.")
except ImportError as e:
    print ("Couldn't import Spark Modules", e)
    sys.exit(1)
# SETTING CONFIGURATION PARAMETERS
config = (SparkConf()
          .setMaster("local")
          .setAppName("Music Recommender")
          .set("spark.executor.memory", "16G")
          .set("spark.driver.memory", "16G")
          .set("spark.executor.cores", "8"))
sc = SparkContext(conf=config)
# Load and parse the data
data = sc.textFile("data/1aa")
ratings = data.map(lambda l: l.split('\t')).map(lambda l: Rating(int(l[0]), int(l[1]), float(l[2])))
# Build the recommendation model using Alternating Least Squares
rank = 10
numIterations = 10
model = ALS.train(ratings, rank, numIterations)
# Evaluate the model on training data
testdata = ratings.map(lambda p: (p[0], p[1]))
predictions = model.predictAll(testdata).map(lambda r: ((r[0], r[1]), r[2]))
ratesAndPreds = ratings.map(lambda r: ((r[0], r[1]), r[2])).join(predictions)
MSE = ratesAndPreds.map(lambda r: (r[1][0] - r[1][1])**2).mean()
print("Mean Squared Error = " + str(MSE))
# Save and load model
model.save(sc, "/Users/kunal/Developer/MusicRecommender")
sameModel = MatrixFactorizationModel.load(sc, "/Users/kunal/Developer/MusicRecommender/data")
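The code runs until the MSE is printed. The last step is to save the model to a directory, and that is where I get the error 'MatrixFactorizationModel' object has no attribute 'save'
(I have pasted the last few lines of the log):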
15/10/06 21:00:16 INFO DAGScheduler: Stage 200 (mean at /Users/kunal/Developer/MusicRecommender/collabfiltering.py:41) finished in 12.875 s
15/10/06 21:00:16 INFO DAGScheduler: Job 8 finished: mean at /Users/kunal/Developer/MusicRecommender/collabfiltering.py:41, took 53.290203 s
Mean Squared Error = 405.148403002
Traceback (most recent call last):
  File "/Users/kunal/Developer/MusicRecommender/collabfiltering.py", line 47, in <module>
    model.save(sc, path)
AttributeError: 'MatrixFactorizationModel' object has no attribute 'save'
Process finished with exit code 1
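I have reinstalled Spark and made sure I have the latest version, but that did not help.
I am running this on a 10MB file only, which is a small part of a larger file.
Operating system: OS X 10.11.1 Beta (15B22c)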
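This happens because you are using Spark 1.2.1, and the MatrixFactorizationModel.save
method was only introduced in Spark 1.3.0. On top of that, the documentation you are following covers the current version (1.5.1), not the one you have installed.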
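If you want the script to fail more gracefully on an older install, one option is to check the version before calling save. The following is only a rough sketch: `supports_model_save` and `save_if_supported` are hypothetical helper names (not part of PySpark), and the 1.3 threshold is taken from the explanation above.

```python
def supports_model_save(spark_version):
    """Return True if the version string is 1.3 or later.

    Hypothetical helper: only the major and minor components are compared,
    so a string such as "1.3.0-SNAPSHOT" is treated like "1.3.0".
    """
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    return (major, minor) >= (1, 3)


def save_if_supported(model, sc, path, spark_version):
    """Call model.save(sc, path) only when it should exist; return True if saved.

    Hypothetical helper: hasattr() is a second guard in case the installed
    Python package does not match the reported version.
    """
    if supports_model_save(spark_version) and hasattr(model, "save"):
        model.save(sc, path)
        return True
    return False
```

In your script you would pass the version string of the running installation, for example `sc.version` if your release exposes it (I am assuming it does), or simply the literal "1.2.1" you already know you have.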
Spark documentation URLs look like this:
http://spark.apache.org/docs/SPARK_VERSION/some_topic.html
So in your case you should use:
http://spark.apache.org/docs/1.2.1/mllib-collaborative-filtering.html
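To avoid picking up the wrong documentation again, the URL pattern above can be filled in from a version string. This is just an illustrative sketch: `docs_url` is a hypothetical helper, and the default topic name is the one from the link above.

```python
def docs_url(spark_version, topic="mllib-collaborative-filtering"):
    """Build a versioned Spark documentation URL (hypothetical helper).

    Follows the pattern http://spark.apache.org/docs/SPARK_VERSION/some_topic.html
    """
    return "http://spark.apache.org/docs/{0}/{1}.html".format(spark_version, topic)


print(docs_url("1.2.1"))
```

Swapping in a different version string gives the page for that release, provided the topic exists there.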