Spark YARN client on Windows 7 issue
I am trying to execute

spark-submit --master yarn-client

on a Windows 7 client machine against a CDH 5.4.5 cluster. I downloaded the Spark 1.5 assembly from spark.apache.org, then downloaded the YARN config from Cloudera Manager on the cluster and set the client's env variable YARN_CONF to its path.

The YARN application itself runs fine, but the client throws an exception:
15/10/16 10:54:59 WARN net.ScriptBasedMapping: Exception running /etc/hadoop/conf.cloudera.yarn/topology.py 10.20.52.104
java.io.IOException: Cannot run program "/etc/hadoop/conf.cloudera.yarn/topology.py" (in directory "C:\workspace\development\"): CreateProcess error=2, The system cannot find the file specified
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:482)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.runResolveCommand(ScriptBasedMapping.java:251)
at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.resolve(ScriptBasedMapping.java:188)
at org.apache.hadoop.net.CachedDNSToSwitchMapping.resolve(CachedDNSToSwitchMapping.java:119)
at org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:101)
at org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:81)
at org.apache.spark.scheduler.cluster.YarnScheduler.getRackForHost(YarnScheduler.scala:38)
at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers.apply(TaskSchedulerImpl.scala:270)
at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers.apply(TaskSchedulerImpl.scala:262)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.TaskSchedulerImpl.resourceOffers(TaskSchedulerImpl.scala:262)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.makeOffers(CoarseGrainedSchedulerBackend.scala:167)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anonfun$receive.applyOrElse(CoarseGrainedSchedulerBackend.scala:106)
at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$processMessage(AkkaRpcEnv.scala:178)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$$anon$$anonfun$receiveWithLogging$$anonfun$applyOrElse.apply$mcV$sp(AkkaRpcEnv.scala:127)
at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$safelyCall(AkkaRpcEnv.scala:198)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$$anon$$anonfun$receiveWithLogging.applyOrElse(AkkaRpcEnv.scala:126)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at org.apache.spark.util.ActorLogReceive$$anon.apply(ActorLogReceive.scala:59)
at org.apache.spark.util.ActorLogReceive$$anon.apply(ActorLogReceive.scala:42)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at org.apache.spark.util.ActorLogReceive$$anon.applyOrElse(ActorLogReceive.scala:42)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$$anon.aroundReceive(AkkaRpcEnv.scala:93)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified
at java.lang.ProcessImpl.create(Native Method)
at java.lang.ProcessImpl.<init>(ProcessImpl.java:386)
at java.lang.ProcessImpl.start(ProcessImpl.java:137)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 38 more
I then edited the client-side yarn-site.xml and set the "net.topology.script.file.name" parameter to the corrected local path, and now the client gets this exception:
15/10/16 10:48:57 WARN net.ScriptBasedMapping: Exception running C:\packages\hadoop-client\yarn-conf\topology.py 10.20.52.105
java.io.IOException: Cannot run program "C:\packages\hadoop-client\yarn-conf\topology.py" (in directory "C:\workspace\development\"): CreateProcess error=193, %1 is not a valid Win32 application
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:482)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.runResolveCommand(ScriptBasedMapping.java:251)
at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.resolve(ScriptBasedMapping.java:188)
at org.apache.hadoop.net.CachedDNSToSwitchMapping.resolve(CachedDNSToSwitchMapping.java:119)
at org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:101)
at org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:81)
at org.apache.spark.scheduler.cluster.YarnScheduler.getRackForHost(YarnScheduler.scala:38)
at org.apache.spark.scheduler.TaskSetManager$$anonfun$org$apache$spark$scheduler$TaskSetManager$$addPendingTask.apply(TaskSetManager.scala:213)
at org.apache.spark.scheduler.TaskSetManager$$anonfun$org$apache$spark$scheduler$TaskSetManager$$addPendingTask.apply(TaskSetManager.scala:192)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.TaskSetManager.org$apache$spark$scheduler$TaskSetManager$$addPendingTask(TaskSetManager.scala:192)
at org.apache.spark.scheduler.TaskSetManager$$anonfun.apply$mcVI$sp(TaskSetManager.scala:161)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
As far as I understand, Spark on Windows cannot correctly invoke the topology.py script via python.exe. How can I fix this?
Just comment out the "net.topology.script.file.name" parameter in yarn-site.xml.
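For completeness, the comment-out might look like this in the client-side yarn-site.xml (the script path shown is the one from the error above; use whatever path your downloaded config actually contains). With the property disabled, Hadoop's ScriptBasedMapping no longer tries to execute the Python script and instead resolves every host to the default rack, which is harmless for a client machine:

```xml
<!-- Disabled on the Windows client: the .py script cannot be exec'd directly
     here, and without this property hosts simply resolve to /default-rack. -->
<!--
<property>
  <name>net.topology.script.file.name</name>
  <value>/etc/hadoop/conf.cloudera.yarn/topology.py</value>
</property>
-->
```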
I ran into exactly the same problem as above while trying to access Hortonworks HDP 2.4 from an IPython notebook with Spark, and I solved it with @mikhail-kramer's suggestion above.
On the Windows client I had to comment out the value of the net.topology.script.file.name property in the core-site.xml file I had downloaded via Ambari. The commented-out value now looks like this:
<property>
<name>net.topology.script.file.name</name>
<value><!--/etc/hadoop/conf/topology_script.py--></value>
</property>
Hope this helps the next person who runs into the same problem.