Worker failed to connect to master in Apache Spark
I am deploying an Apache Spark application using the standalone cluster manager. My architecture uses two Windows machines: one as the master, and the other as a slave (worker).
master: on it I run: \bin>spark-class org.apache.spark.deploy.master.Master
This is what the web UI shows:
slave: on it I run: \bin>spark-class org.apache.spark.deploy.worker.Worker spark://192.*.*.186:7077
This is what the web UI shows:
The problem is that the worker node cannot connect to the master and shows the following error:
17/09/26 16:05:17 INFO Worker: Connecting to master 192.*.*.186:7077...
17/09/26 16:05:22 WARN Worker: Failed to connect to master 192.*.*.186:7077
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:100)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:108)
at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$$anon.run(Worker.scala:241)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Failed to connect to /192.*.*.186:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:232)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:182)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
at org.apache.spark.rpc.netty.Outbox$$anon.call(Outbox.scala:194)
at org.apache.spark.rpc.netty.Outbox$$anon.call(Outbox.scala:190)
... 4 more
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection timed out: no further information: /192.*.*.186:7077
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:257)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:291)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:631)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:566)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)
at io.netty.util.concurrent.SingleThreadEventExecutor.run(SingleThreadEventExecutor.java:131)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
... 1 more
Note that the firewall is disabled on both machines, and I tested the connection between them (with nmap) and everything is fine! But with telnet I get this error: Connecting To 192.*.*.186...Could not open connection to the host, on port 23: Connect failed
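(For reference: telnet with no port argument connects to port 23, the classic telnet service, which is unrelated to Spark. A sketch of probing the actual master port from the worker machine, reusing the masked IP from the logs:)

    rem Test the Spark master RPC port explicitly; port 23 is telnet's default, not Spark's
    telnet 192.*.*.186 7077

    rem Or, if the telnet client is not installed, PowerShell can probe the port:
    powershell -Command "Test-NetConnection -ComputerName 192.*.*.186 -Port 7077"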
Could you show us your spark-env.sh configuration file? That would help pinpoint your problem.
My first thought is that you need to export SPARK_MASTER_HOST=(master ip) instead of SPARK_MASTER_IP in the spark-env.sh file. You need to do this on both the master and the slave (worker). Also export SPARK_LOCAL_IP on both machines.
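A minimal sketch of what conf/spark-env.sh might look like under that suggestion; the master IP is the masked address from the question, and the worker IP 192.*.*.XXX is a hypothetical placeholder for your actual worker address:

    # conf/spark-env.sh on the master (assumption: 192.*.*.186 is the master's IP)
    export SPARK_MASTER_HOST=192.*.*.186   # replaces the deprecated SPARK_MASTER_IP
    export SPARK_LOCAL_IP=192.*.*.186      # bind this machine's services to its own IP

    # conf/spark-env.sh on the worker (192.*.*.XXX = this worker's own IP, placeholder)
    export SPARK_MASTER_HOST=192.*.*.186
    export SPARK_LOCAL_IP=192.*.*.XXX

Since both of your machines run Windows, note that Spark reads conf\spark-env.cmd there instead, with set VAR=value in place of export (e.g. set SPARK_MASTER_HOST=192.*.*.186).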