I am trying to run KMeans clustering in Spark, and I am getting an exception.
**Python Version = 2.6.6**
**numpy version = 1.3.0**
**The Python file dokmeans.py is located in /home/cloudera**
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.3.0
      /_/
Using Python version 2.6.6 (r266:84292, Feb 22 2013 00:00:18)
SparkContext available as sc, HiveContext available as sqlCtx.
>>> exec(open('dokmeans.py').read())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 5, in <module>
  File "/usr/lib/spark/python/pyspark/mllib/__init__.py", line 26, in <module>
    raise Exception("MLlib requires NumPy 1.4+")
Exception: MLlib requires NumPy 1.4+
>>> from pyspark.mllib.clustering import KMeans,KMeansModel
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/spark/python/pyspark/mllib/__init__.py", line 26, in <module>
    raise Exception("MLlib requires NumPy 1.4+")
Exception: MLlib requires NumPy 1.4+
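Well, the error message says it all. To use MLlib you need NumPy 1.4 or later, and you have 1.3 installed.
There is also a bug in the MLlib code: it does not interpret the NumPy version correctly. The check compares version strings, so 1.10 is treated as if it were 1.1, and the NumPy version check fails even when the installed NumPy is new enough.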
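The misreading comes from comparing version strings character by character rather than as numbers. A small sketch of that behaviour, using plain string literals as stand-ins for `numpy.version.version` (an assumption made so the snippet runs without NumPy installed):

```python
# Strings compare character by character, so "1.10.1" sorts before "1.4":
# the first two characters match, then "1" is less than "4".
string_check_110 = "1.10.1" < "1.4"   # a recent NumPy wrongly looks too old
string_check_130 = "1.3.0" < "1.4"    # a genuinely old NumPy is still caught

print(string_check_110)
print(string_check_130)
```

Both comparisons evaluate to true, which is why the same exception appears for NumPy 1.3 (correctly) and for NumPy 1.10 (incorrectly).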
Please change the code in the following file:
/usr/lib/spark/python/pyspark/mllib/__init__.py
From:
if numpy.version.version < '1.4':
    raise Exception("MLlib requires NumPy 1.4+")
To:
ver = [int(x) for x in numpy.version.version.split('.')[:2]]
if ver < [1, 4]:
    raise Exception("MLlib requires NumPy 1.4+")
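To see how the numeric comparison behaves before editing the Spark installation, the same logic can be tried in isolation. This is only a sketch: `numpy_is_too_old` is a hypothetical helper name, not part of PySpark or NumPy, and it takes the version as a plain string so it can be run without NumPy.

```python
def numpy_is_too_old(version_string, minimum=(1, 4)):
    """Hypothetical helper: True if version_string is older than minimum."""
    # Keep only the major and minor parts and compare them as integers,
    # so "1.10" becomes (1, 10) and is correctly newer than (1, 4).
    major_minor = tuple(int(part) for part in version_string.split(".")[:2])
    return major_minor < minimum


for candidate in ("1.3.0", "1.4.0", "1.10.1"):
    print("%s too old: %s" % (candidate, numpy_is_too_old(candidate)))
```

With this comparison only 1.3.0 is reported as too old, so upgrading NumPy is still required for the setup in the question.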