pyspark kernel with jupyter - Cannot find kernel
I am trying to use a pyspark kernel in Jupyter. I am new to both, and I have been looking around trying to get pyspark 2.1.0 working in Jupyter.
I have installed pyspark 2.1.0 and anaconda3 on 64-bit Ubuntu 16.04 LTS.
I have set the following exports in .bashrc:
export SPARK_HOME=/usr/lib/spark
export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"
export SBT_HOME=/usr/share/sbt-launcher-packaging/bin/sbt-launch.jar
PYTHONPATH=/usr/lib/spark/python/lib/py4j-0.10.4-src.zip
export PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH
export PATH=$PATH:$SBT_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin
export PATH=$PATH:/home/user1/course/research_methods/spin/Spin/Src6.4.6
export PYSPARK=/usr/lib/spark/bin
export PATH=$PATH:$PYSPARK
export PYSPARK_PYTHON=/home/user1/anaconda3/bin/python3
export PYSPARK_DRIVER_PYTHON=/home/user1/anaconda3/bin/jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
# added by Anaconda3 4.2.0 installer
export PATH="/home/user1/anaconda3/bin:$PATH"
export LD_LIBRARY_PATH=/usr/local/hadoop/lib/native/:$LD_LIBRARY_PATH
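A quick sanity check, assuming the paths above match what is actually installed, is to reload the environment and confirm that the py4j zip named on PYTHONPATH really exists under this Spark install:
source ~/.bashrc
echo "$SPARK_HOME"                            # expected: /usr/lib/spark
echo "$PYTHONPATH"
ls "$SPARK_HOME"/python/lib/py4j-*-src.zip    # the py4j source zip bundled with Spark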
I created the file "00-pyspark-setup.py" in ~/.jupyter/profile_spark/, as recommended in the guide I followed:
import os
import sys

# Put Spark's Python bindings and its bundled py4j on sys.path.
spark_home = os.environ.get('SPARK_HOME', None)
sys.path.insert(0, spark_home + "/python")
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.8.2.1-src.zip'))

# Run pyspark's interactive bootstrap, which creates the sc and spark objects.
filename = os.path.join(spark_home, 'python/pyspark/shell.py')
exec(compile(open(filename, "rb").read(), filename, 'exec'))

# For Spark 2.1.0, make sure pyspark-shell is included in PYSPARK_SUBMIT_ARGS.
spark_release_file = spark_home + "/RELEASE"
if os.path.exists(spark_release_file) and "Spark 2.1.0" in open(spark_release_file).read():
    pyspark_submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "")
    if "pyspark-shell" not in pyspark_submit_args:
        pyspark_submit_args += " pyspark-shell"
    os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args
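One aside (an assumption on my part about how the file gets picked up, not something stated in the guide): IPython runs startup scripts from an IPython profile's startup/ directory, e.g. ~/.ipython/profile_spark/startup/, so a file sitting directly under ~/.jupyter/profile_spark/ would not be executed automatically. The conventional layout would look roughly like this:
ipython profile create spark                  # creates ~/.ipython/profile_spark/ with a startup/ directory
cp ~/.jupyter/profile_spark/00-pyspark-setup.py ~/.ipython/profile_spark/startup/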
When I run the script, it produces the following output:
$ ./00-pyspark-setup.py
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/
Using Python version 3.5.2 (default, Jul 2 2016 17:53:06)
SparkSession available as 'spark'.
$
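As a further check, assuming the .bashrc exports above are active in the shell, the Anaconda interpreter that a notebook kernel would use should also be able to import pyspark:
/home/user1/anaconda3/bin/python3 -c "import pyspark; print(pyspark.__file__)"   # shows which pyspark gets imported, and from where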
When I open a .ipynb file in Jupyter that contains the following metadata:
"metadata": {
  "kernelspec": {
    "display_name": "PySpark",
    "language": "python",
    "name": "pyspark"
  },
  "language_info": {
    "codemirror_mode": {
      "name": "ipython",
      "version": 3
    },
    "file_extension": ".py",
    "mimetype": "text/x-python",
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
    "version": "3.5.2"
  }
}
I get the following error: "Could not find a kernel matching PySpark. Please select a kernel:"
Next to the error message is a "kernel" drop-down list with only the two options "Python [conda root]" and "Python [default]". There is no pyspark option.
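That drop-down is populated from the kernelspecs Jupyter has registered, so a kernel named "pyspark" only appears once a matching kernelspec is installed. A rough sketch of how one could be checked for and added (the directory and the kernel.json contents below are illustrative guesses based on the paths in .bashrc, not something taken from this machine):
jupyter kernelspec list                       # "pyspark" must appear here to be selectable

mkdir -p ~/.local/share/jupyter/kernels/pyspark
cat > ~/.local/share/jupyter/kernels/pyspark/kernel.json <<'EOF'
{
  "display_name": "PySpark",
  "language": "python",
  "argv": ["/home/user1/anaconda3/bin/python3", "-m", "ipykernel", "-f", "{connection_file}"],
  "env": {
    "SPARK_HOME": "/usr/lib/spark",
    "PYTHONPATH": "/usr/lib/spark/python:/usr/lib/spark/python/lib/py4j-0.10.4-src.zip",
    "PYSPARK_SUBMIT_ARGS": "--master local[2] pyspark-shell"
  }
}
EOF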
Can anyone suggest what I need to modify to make the pyspark kernel available?
Thanks
.bashrc -> py4j-0.10.4-src.zip
and
00-pyspark-setup.py -> py4j-0.8.2.1-src.zip
reference different py4j versions.
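A quick way to see which py4j actually ships with this Spark build, and then to use that exact zip name in both .bashrc and 00-pyspark-setup.py (the name is whatever ls reports; for a stock Spark 2.1.0 it should be py4j-0.10.4-src.zip):
ls "$SPARK_HOME"/python/lib/py4j-*-src.zip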