spark 1.3 workers accepting jobs but console says resources not available
I am trying to run Apache Spark 1.3 in standalone mode on Amazon EMR, with Amazon's Hadoop 2.4 and two workers. But when I do, I get the following message:
[TaskSchedulerImpl] - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
I am setting the following parameters:
conf = new SparkConf();
conf.setAppName("SVM Classifier Example");
conf.set("spark.executor.memory", "1024m");
conf.set("spark.cores.max", "1");
But when I run the same thing locally (with Apache Hadoop 2.4 and Spark 1.3), it finishes within seconds.
I checked that each worker machine has plenty of free memory, around 1.6 GB, in both cases, so that is not the problem.
Here are the worker logs:
15/03/26 20:54:27 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
15/03/26 20:54:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/03/26 20:54:29 INFO spark.SecurityManager: Changing view acls to: root
15/03/26 20:54:29 INFO spark.SecurityManager: Changing modify acls to: root
15/03/26 20:54:29 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/03/26 20:54:30 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/03/26 20:54:31 INFO Remoting: Starting remoting
15/03/26 20:54:31 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://driverPropsFetcher@ip-XXXX.ec2.internal:50899]
15/03/26 20:54:31 INFO util.Utils: Successfully started service 'driverPropsFetcher' on port 50899.
15/03/26 20:54:32 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkDriver@ip-XXXX.ec2.internal:49161] has failed, address is now gated for [5000] ms. Reason is: [Association failed with [akka.tcp://sparkDriver@ip-XXXX.ec2.internal:49161]].
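The last warning says the executor could not associate with the driver at `sparkDriver@ip-XXXX.ec2.internal:49161` (the host and port are placeholders from the log). One way to confirm that symptom is a plain TCP probe from a worker node back to the driver's host and port. The class below is a hypothetical diagnostic helper, not part of Spark:

```java
import java.net.InetSocketAddress;
import java.net.Socket;

public class DriverPortCheck {
    // Returns true if a plain TCP connection to host:port succeeds
    // within timeoutMs; false on refusal, timeout, or resolution failure.
    public static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholders taken from the failing log line above;
        // substitute the real driver host and port.
        boolean ok = isReachable("ip-XXXX.ec2.internal", 49161, 3000);
        System.out.println(ok ? "driver port reachable" : "driver port NOT reachable");
    }
}
```

Run it from a worker node; if it prints "driver port NOT reachable" while the same check succeeds from the master, the traffic from workers to the driver is being blocked somewhere.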
I have no idea what is going wrong here. Any comments or suggestions are welcome.
Edit: I am unable to upload a screenshot of the console, but here are the details:
> Worker Id   Cores        Memory
> 1           8 (8 Used)   1172.0 MB (1024.0 MB Used)
> 2           8 (8 Used)   1536.0 MB (1024.0 MB Used)
>
> Running Applications
> ID   Cores   Memory per Node   User   State     Duration
> 1    16      1024.0 MB         root   Running   1.5 h
It turned out the problem was the firewall on my system. The firewall policy was such that the workers could communicate with the master, but not with the driver. Opening the ports for two-way communication solved my problem.
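Since Spark 1.x picks random ports for the driver-side services by default, pinning them makes the firewall rules manageable: only a few fixed ports need to be opened for worker-to-driver traffic. A sketch of the relevant settings (the port numbers here are arbitrary examples, not recommendations):

```java
// Pin the driver-side ports that Spark 1.x otherwise chooses at random,
// so the firewall can allow worker -> driver traffic on known ports.
conf.set("spark.driver.port", "51000");
conf.set("spark.fileserver.port", "51001");
conf.set("spark.broadcast.port", "51002");
conf.set("spark.blockManager.port", "51004");
```

With these fixed, opening just those ports (plus the standalone master/worker ports) in both directions is enough for the executors to associate with the driver.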