运行 Pyspark 脚本时出现 Zeppelin 错误
Zeppelin Error When Running Pyspark Script
我正在尝试 运行 使用开发端点的 AWS 粘合作业,但 运行 遇到此错误:
org.apache.thrift.transport.TTransportException
这里只指定了我开箱即用的东西:https://docs.aws.amazon.com/glue/latest/dg/dev-endpoint-tutorial-local-notebook.html
启动命令:
$sudo bash
$./zeppelin-daemon.sh start
java 版本:openjdk 版本“1.8.0_222”
飞艇版本:版本 0.7.3
MacOS 版本:10.14.6
代码:
%pyspark
import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.transforms import *
# Create a Glue context
glueContext = GlueContext(SparkContext.getOrCreate())
错误日志:
INFO [2019-09-24 11:12:26,138] ({pool-2-thread-2} SchedulerFactory.java[jobStarted]:131) - Job paragraph_1569298185348_-1564147927 started by scheduler org.apache.zeppelin.interpreter.remote.RemoteInterpreterexisting_process458983990
INFO [2019-09-24 11:12:26,138] ({pool-2-thread-2} Paragraph.java[jobRun]:362) - run paragraph 20190924-000945_1425987694 using pyspark org.apache.zeppelin.interpreter.LazyOpenInterpreter@67a01c6d
ERROR [2019-09-24 11:12:26,154] ({pool-2-thread-2} Job.java[run]:188) - Job failed
org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:401)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:97)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:406)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:329)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access1(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:266)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:250)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:373)
... 11 more
ERROR [2019-09-24 11:12:26,239] ({pool-2-thread-2} RemoteScheduler.java[getStatus]:281) - Unknown status
java.lang.IllegalArgumentException: No enum constant org.apache.zeppelin.scheduler.Job.Status.UNKNOWN
at java.lang.Enum.valueOf(Enum.java:238)
at org.apache.zeppelin.scheduler.Job$Status.valueOf(Job.java:51)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobStatusPoller.getStatus(RemoteScheduler.java:271)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:342)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access1(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
ERROR [2019-09-24 11:12:26,240] ({pool-2-thread-2} NotebookServer.java[afterStatusChange]:2056) - Error
org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:401)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:97)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:406)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:329)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access1(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:266)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:250)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:373)
... 11 more
WARN [2019-09-24 11:12:26,240] ({pool-2-thread-2} NotebookServer.java[afterStatusChange]:2064) - Job 20190924-000945_1425987694 is finished, status: ERROR, exception: org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException, result: org.apache.thrift.transport.TTransportException
INFO [2019-09-24 11:12:26,263] ({pool-2-thread-2} SchedulerFactory.java[jobFinished]:137) - Job paragraph_1569298185348_-1564147927 finished by scheduler org.apache.zeppelin.interpreter.remote.RemoteInterpreterexisting_process458983990
Spark 解释器配置:
什么没用:
- 正在重新安装 zeppelin
- 正在重新启动飞艇
- 正在安装 apache spark 并将 SPARK_HOME 设置为
/usr/local/Cellar/apache-spark/2.4.4/
- 添加 yarn.. 参数到 spark 解释器配置
事实证明,这是 AWS Glue 团队的问题。这个问题应该得到解决。
我正在尝试 运行 使用开发端点的 AWS 粘合作业,但 运行 遇到此错误:
org.apache.thrift.transport.TTransportException
这里只指定了我开箱即用的东西:https://docs.aws.amazon.com/glue/latest/dg/dev-endpoint-tutorial-local-notebook.html
启动命令:
$sudo bash
$./zeppelin-daemon.sh start
java 版本:openjdk 版本“1.8.0_222” 飞艇版本:版本 0.7.3 MacOS 版本:10.14.6
代码:
%pyspark
import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.transforms import *
# Create a Glue context
glueContext = GlueContext(SparkContext.getOrCreate())
错误日志:
INFO [2019-09-24 11:12:26,138] ({pool-2-thread-2} SchedulerFactory.java[jobStarted]:131) - Job paragraph_1569298185348_-1564147927 started by scheduler org.apache.zeppelin.interpreter.remote.RemoteInterpreterexisting_process458983990
INFO [2019-09-24 11:12:26,138] ({pool-2-thread-2} Paragraph.java[jobRun]:362) - run paragraph 20190924-000945_1425987694 using pyspark org.apache.zeppelin.interpreter.LazyOpenInterpreter@67a01c6d
ERROR [2019-09-24 11:12:26,154] ({pool-2-thread-2} Job.java[run]:188) - Job failed
org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:401)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:97)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:406)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:329)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access1(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:266)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:250)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:373)
... 11 more
ERROR [2019-09-24 11:12:26,239] ({pool-2-thread-2} RemoteScheduler.java[getStatus]:281) - Unknown status
java.lang.IllegalArgumentException: No enum constant org.apache.zeppelin.scheduler.Job.Status.UNKNOWN
at java.lang.Enum.valueOf(Enum.java:238)
at org.apache.zeppelin.scheduler.Job$Status.valueOf(Job.java:51)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobStatusPoller.getStatus(RemoteScheduler.java:271)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:342)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access1(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
ERROR [2019-09-24 11:12:26,240] ({pool-2-thread-2} NotebookServer.java[afterStatusChange]:2056) - Error
org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:401)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:97)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:406)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:329)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access1(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:266)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:250)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:373)
... 11 more
WARN [2019-09-24 11:12:26,240] ({pool-2-thread-2} NotebookServer.java[afterStatusChange]:2064) - Job 20190924-000945_1425987694 is finished, status: ERROR, exception: org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException, result: org.apache.thrift.transport.TTransportException
INFO [2019-09-24 11:12:26,263] ({pool-2-thread-2} SchedulerFactory.java[jobFinished]:137) - Job paragraph_1569298185348_-1564147927 finished by scheduler org.apache.zeppelin.interpreter.remote.RemoteInterpreterexisting_process458983990
Spark 解释器配置:
什么没用:
- 正在重新安装 zeppelin
- 正在重新启动飞艇
- 正在安装 apache spark 并将 SPARK_HOME 设置为 /usr/local/Cellar/apache-spark/2.4.4/
- 添加 yarn.. 参数到 spark 解释器配置
事实证明,这是 AWS Glue 团队的问题。这个问题应该得到解决。