"libclntsh.so: cannot open shared object file" in Ubuntu when running a Python program on a Spark cluster
My Python program runs locally without any problems, but when I try to run it on a Spark cluster that has two nodes, I get an error about libclntsh.so.
To explain further: to run the program on the cluster, I first set the master IP address in spark-env.sh like this:
export SPARK_MASTER_HOST=x.x.x.x
Then I wrote the IP of each worker (slave) node into $SPARK_HOME/conf/workers.
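For reference, the workers file is just one worker hostname or IP per line; a minimal sketch with hypothetical addresses:
# $SPARK_HOME/conf/workers -- one worker per line (hypothetical IPs)
x.x.x.1
x.x.x.2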
After that, I first started the master with this line:
/opt/spark/sbin/start-master.sh
Then I started the worker (slave):
/opt/spark/sbin/start-worker.sh spark://x.x.x.x:7077
Next, I checked the Spark UI to confirm everything was up. Then I ran the program from the master node like this:
/opt/spark/bin/spark-submit --master spark://x.x.x.x:7077 --files sparkConfig.json --py-files cst_utils.py,grouping.py,group_state.py,g_utils.py,csts.py,oracle_connection.py,config.py,brn_utils.py,emp_utils.py main.py
When I run the command above, I get this error:
File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 604, in main
process()
File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 594, in process
out_iter = func(split_index, iterator)
File "/opt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2916, in pipeline_func
File "/opt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2916, in pipeline_func
File "/opt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 418, in func
File "/opt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2144, in combineLocally
File "/opt/spark/python/lib/pyspark.zip/pyspark/shuffle.py", line 240, in mergeValues
for k, v in iterator:
File "/opt/spark/python/lib/pyspark.zip/pyspark/util.py", line 73, in wrapper
return f(*args, **kwargs)
File "/opt/spark/work/app-20220221165611-0005/0/customer_utils.py", line 340, in read_cst
df_group = connection.read_sql(query_cnt)
File "/opt/spark/work/app-20220221165611-0005/0/oracle_connection.py", line 109, in read_sql
self.connect()
File "/opt/spark/work/app-20220221165611-0005/0/oracle_connection.py", line 40, in connect
self.conn = cx_Oracle.connect(db_url)
cx_Oracle.DatabaseError: DPI-1047: Cannot locate a 64-bit Oracle Client library:
"libclntsh.so: cannot open shared object file: No such file or directory".
I have set these environment variables in ~/.bashrc:
export ORACLE_HOME=/usr/share/oracle/instantclient_19_8
export LD_LIBRARY_PATH=$ORACLE_HOME:$LD_LIBRARY_PATH
export PATH=$ORACLE_HOME:$PATH
export JAVA_HOME=/usr/lib/jvm/java/jdk1.8.0_271
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
export PATH=$PATH:$JAVA_HOME/bin
export PYSPARK_PYTHON=/usr/bin/python3
export PYSPARK_HOME=/usr/bin/python3.8
export PYSPARK_DRIVER_PYTHON=python3.8
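Worth noting (this is an assumption about the setup, not something stated above): ~/.bashrc is only read by interactive shells, so the Spark worker daemons launched by start-worker.sh may never see the LD_LIBRARY_PATH exported there. A hedged alternative is to export it in $SPARK_HOME/conf/spark-env.sh on every node, which the Spark start scripts do source:

# $SPARK_HOME/conf/spark-env.sh -- sourced when the Spark daemons start
export LD_LIBRARY_PATH=/usr/share/oracle/instantclient_19_8:$LD_LIBRARY_PATH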
Can you guide me on where I went wrong?
Any help would be appreciated.
The problem is solved. Following the troubleshooting link, I first created a file named InstantClient.conf in the /etc/ld.so.conf.d/ directory and wrote the path of the Instant Client directory into it:
# instant client Path
/usr/share/oracle/instantclient_19_8
Finally, I ran this command:
sudo ldconfig
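To confirm that the library is now registered in the loader cache, a quick check (not part of the original answer, just a standard diagnostic):

ldconfig -p | grep libclntsh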
Then I ran spark-submit again, and it worked with no errors about the Instant Client.
I hope this helps someone else.