Spark job is stuck on method collect
When I run my Spark job, it appears to get stuck on collect.
I launch the jar with the following command:
./spark-1.3.0-bin-hadoop2.4/bin/spark-submit \
--class com.MyObject \
--master spark://192.168.192.22:7077 \
--executor-memory 512M \
--driver-memory 512M \
--deploy-mode cluster \
--total-executor-cores 4 \
/home/pi/spark-job-jars/spark-job-0.0.1-SNAPSHOT.jar
Jar source:
package com

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

object MyObject {
  def main(args: Array[String]) {
    println("here")
    val sc = new SparkContext(new SparkConf())
    val l = (1 to 10).toList
    val s = sc.parallelize(l)
    val out = s.map(m => m * 3)
    out.collect.foreach(println)
  }
}
Jar pom:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>spark-job</groupId>
  <artifactId>spark-job</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>
  <build>
    <sourceDirectory>src</sourceDirectory>
    <resources>
      <resource>
        <directory>src</directory>
        <excludes>
          <exclude>**/*.java</exclude>
        </excludes>
      </resource>
    </resources>
    <plugins>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.1</version>
        <configuration>
          <source>1.5</source>
          <target>1.5</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>1.2.1</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.11</artifactId>
      <version>1.2.1</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>
</project>
I can see the job running, but it never finishes.
Is there something wrong with the way I am creating/deploying the jar that prevents it from completing the work?
"Alternatively, if your application is submitted from a machine far from the worker machines (e.g. locally on your laptop), it is common to use cluster mode to minimize network latency between the drivers and the executors. Note that cluster mode is currently not supported for standalone clusters, Mesos clusters, or python applications."
Taken from: https://spark.apache.org/docs/1.2.0/submitting-applications.html
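If that restriction applies to the Spark version used here, one likely fix is to drop --deploy-mode cluster and submit in the default client mode instead. A sketch of the adjusted command, keeping the paths and master URL from the question:

./spark-1.3.0-bin-hadoop2.4/bin/spark-submit \
--class com.MyObject \
--master spark://192.168.192.22:7077 \
--executor-memory 512M \
--driver-memory 512M \
--total-executor-cores 4 \
/home/pi/spark-job-jars/spark-job-0.0.1-SNAPSHOT.jar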
Call stop() on the active SparkContext right before the end. That worked for me. For your code, try making this change:
val sc = new SparkContext(new SparkConf())
try {
  val l = (1 to 10).toList
  val s = sc.parallelize(l)
  val out = s.map(m => m * 3)
  out.collect.foreach(println)
} finally {
  sc.stop()
}
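For reference, a minimal sketch of the full driver with that change applied, using the same package, object name, and job logic as in the question; the try/finally ensures sc.stop() runs even if the job throws, so the application can terminate cleanly:

package com

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object MyObject {
  def main(args: Array[String]) {
    println("here")
    val sc = new SparkContext(new SparkConf())
    try {
      val l = (1 to 10).toList
      val s = sc.parallelize(l)      // distribute the local list as an RDD
      val out = s.map(m => m * 3)    // multiply each element by 3
      out.collect.foreach(println)   // bring the results back to the driver and print them
    } finally {
      sc.stop()                      // shut down the SparkContext so the application can finish
    }
  }
}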