Spark job is stuck on method collect
When I run my Spark job, it appears to get stuck on collect.
I launch the jar with the following command:
./spark-1.3.0-bin-hadoop2.4/bin/spark-submit \
--class com.MyObject \
--master spark://192.168.192.22:7077 \
--executor-memory 512M \
--driver-memory 512M \
--deploy-mode cluster \
--total-executor-cores 4 \
/home/pi/spark-job-jars/spark-job-0.0.1-SNAPSHOT.jar
Jar source:
package com

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

object MyObject {
  def main(args: Array[String]) {
    println("here")
    val sc = new SparkContext(new SparkConf())
    val l = (1 to 10).toList
    val s = sc.parallelize(l)
    val out = s.map(m => m * 3)
    out.collect.foreach(println)
  }
}
Jar pom:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>spark-job</groupId>
  <artifactId>spark-job</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>
  <build>
    <sourceDirectory>src</sourceDirectory>
    <resources>
      <resource>
        <directory>src</directory>
        <excludes>
          <exclude>**/*.java</exclude>
        </excludes>
      </resource>
    </resources>
    <plugins>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.1</version>
        <configuration>
          <source>1.5</source>
          <target>1.5</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>1.2.1</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.11</artifactId>
      <version>1.2.1</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>
</project>
I can see the job running, but it never finishes.
Is there something wrong with the way I am creating/deploying the jar that prevents it from completing the work?
"Alternatively, if your application is submitted from a machine far from the worker machines (e.g. locally on your laptop), it is common to use cluster mode to minimize network latency between the drivers and the executors. Note that cluster mode is currently not supported for standalone clusters, Mesos clusters, or python applications."
Taken from: https://spark.apache.org/docs/1.2.0/submitting-applications.html
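If that restriction applies to the Spark version used here, one likely fix is to drop --deploy-mode cluster and submit in the default client mode instead. A sketch of the adjusted command, keeping the paths and master URL from the question:

./spark-1.3.0-bin-hadoop2.4/bin/spark-submit \
--class com.MyObject \
--master spark://192.168.192.22:7077 \
--executor-memory 512M \
--driver-memory 512M \
--total-executor-cores 4 \
/home/pi/spark-job-jars/spark-job-0.0.1-SNAPSHOT.jar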
Call stop() on the active SparkContext right before the end. That worked for me. For your code, try making this change:
val sc = new SparkContext(new SparkConf())
try {
  val l = (1 to 10).toList
  val s = sc.parallelize(l)
  val out = s.map(m => m * 3)
  out.collect.foreach(println)
} finally {
  sc.stop()
}
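For reference, a minimal sketch of the full driver with that change applied, using the same package, object name, and job logic as in the question; the try/finally ensures sc.stop() runs even if the job throws, so the application can terminate cleanly:

package com

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object MyObject {
  def main(args: Array[String]) {
    println("here")
    val sc = new SparkContext(new SparkConf())
    try {
      val l = (1 to 10).toList
      val s = sc.parallelize(l)      // distribute the local list as an RDD
      val out = s.map(m => m * 3)    // multiply each element by 3
      out.collect.foreach(println)   // bring the results back to the driver and print them
    } finally {
      sc.stop()                      // shut down the SparkContext so the application can finish
    }
  }
}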