ImportError: cannot import name sqlContext
I am using pyspark to read some CSV data into a Spark DataFrame.
I tried to import the pyspark module as follows:
from pyspark.sql import sqlContext
Why do I get the following error, and how can I fix it?
ImportError: cannot import name sqlContext
I am using Python 2.7 with Spark 2.0.1.
This is probably because your Python path is not set up correctly. I have found the following function useful when configuring my Python environment.
import os
import sys
import warnings
from glob import glob

# Module-level default referenced by the docstring; falls back to the environment
# variable if one is already set (adjust this to your own installation).
SPARK_HOME = os.environ.get('SPARK_HOME')


def configure_spark(spark_home=None, pyspark_python=None, conf_dir=None):
    """Configures the Python path for importing pyspark

    Sets the SPARK_HOME and PYSPARK_PYTHON environment variables and modifies
    the Python PATH so the pyspark package can be imported.

    Args:
        spark_home (str): Path of SPARK_HOME. Defaults to SPARK_HOME module
            variable.
        pyspark_python (str): Path to Python binary to use in PySpark. Defaults
            to the currently executing Python binary.
        conf_dir (str): Path to configuration directory
    """
    # Set the configuration directory with some basic sanity checks:
    if conf_dir:
        if not os.path.isdir(conf_dir):
            raise OSError("Spark config directory not found: %s" % conf_dir)

        expected_conf = {'spark-env.sh', 'spark-defaults.conf'}
        missing_conf = expected_conf - set(os.listdir(conf_dir))
        if missing_conf:
            warnings.warn("Some configuration files were not found: %s" % missing_conf)

        os.environ['SPARK_CONF_DIR'] = conf_dir

    spark_home = spark_home or SPARK_HOME
    os.environ['SPARK_HOME'] = spark_home

    if not os.path.isdir(spark_home):
        raise OSError("Specified SPARK_HOME is not a valid directory: %s" % spark_home)

    # Add the PySpark directories to the Python path:
    libs = glob(os.path.join(spark_home, 'python', 'lib', '*.zip'))
    if len(libs) < 2:
        raise OSError("Pyspark libraries not found in %s" % spark_home)

    for lib in libs:
        sys.path.insert(1, lib)

    # If PYSPARK_PYTHON isn't specified, use currently running Python binary:
    pyspark_python = pyspark_python or sys.executable
    os.environ['PYSPARK_PYTHON'] = pyspark_python
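
As a usage sketch (assuming a hypothetical Spark installation at /opt/spark and an input file data.csv), call the function once before importing pyspark. Note also that the name exported by pyspark.sql is the class SQLContext, so you create an instance from a SparkContext rather than importing a ready-made sqlContext:

# Minimal sketch; /opt/spark and data.csv are placeholder paths.
configure_spark(spark_home='/opt/spark')

from pyspark import SparkContext
from pyspark.sql import SQLContext   # class name is SQLContext, not sqlContext

sc = SparkContext(appName='csv-example')
sqlContext = SQLContext(sc)

# Read the CSV into a DataFrame and inspect it.
df = sqlContext.read.csv('data.csv', header=True)
df.show()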