How to deploy a TypeSafe Activator based application to an Apache Spark cluster?

Question

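My application uses Apache Spark for background data processing and Play Framework for the front-end interface.

The best way to use Play Framework in a Scala application is to use it with TypeSafe Activator.

Now, the problem is that I want to deploy this application to a Spark cluster. There is good documentation on how to deploy an SBT application to a cluster using spark-submit, but what should be done with an Activator-based application?

Please note that I understand how to use Spark with Activator using this link; my question is specifically about deploying the application on a cluster, such as EC2.

By the way, the application is written in Scala.

I'm open to suggestions such as decoupling the two applications and letting them interact. I don't know how to do that, though, so if you suggest it, a reference would be much appreciated.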

Update:

I tried adding the dependencies to the build.sbt file in the Activator project, but I got the following error:

[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[error] impossible to get artifacts when data has not been loaded. IvyNode = org.slf4j#slf4j-api;1.6.1
[trace] Stack trace suppressed: run last *:update for the full output.
[error] (*:update) java.lang.IllegalStateException: impossible to get artifacts when data has not been loaded. IvyNode = org.slf4j#slf4j-api;1.6.1

Here is how I added the dependencies in the build.sbt file:

// All the apache spark dependencies
libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.10" % sparkVersion % "provided" withSources(),
  "org.apache.spark" % "spark-sql_2.10" % sparkVersion % "provided" withSources(),
  "org.apache.spark" % "spark-streaming_2.10" % sparkVersion % "provided" withSources(),
  "org.apache.spark" % "spark-mllib_2.10" % sparkVersion % "provided" withSources()
)

And the resolvers:

// All the Apache Spark resolvers
resolvers ++= Seq(
  "Apache repo" at     "https://repository.apache.org/content/repositories/releases",
  "Local Repo" at Path.userHome.asFile.toURI.toURL + "/.m2/repository", // Added local repository
  Resolver.mavenLocal )

Is there any workaround?

Answer 1

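Activator is just sbt with three changes:

a "new" command to create projects from templates
a "ui" command to open the tutorial UI
it tries to guess whether to open the UI if you type "activator" by itself; to force the command line, use "activator shell"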

So everything you have read about sbt applies. You can also use plain sbt with your project if you like; it is the same thing unless you are using "new" or "ui".

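The short answer to your question is probably to use the sbt-native-packager plugin and its "stage" task; the Play documentation has a deployment section that describes this.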
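As a rough illustration of the "stage" route, here is a minimal build.sbt sketch. It assumes Play 2.3 or later with the Play sbt plugin already declared in project/plugins.sbt (as Activator templates normally do); the project name is hypothetical.

```scala
// build.sbt (sketch; "play-spark-frontend" is a hypothetical project name)
name := "play-spark-frontend"

// Enabling the Play plugin also brings in sbt-native-packager's "stage" task.
lazy val root = (project in file(".")).enablePlugins(PlayScala)

// From the project directory, either launcher should work:
//   activator stage      (or: sbt stage)
// The staged application is written under target/universal/stage, with a
// start script in target/universal/stage/bin named after the project.
```

The staged directory can then be copied to the target machine and started with its script, without needing Activator or sbt installed there.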

Answer 2

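It turns out that one problem with combining Play Framework and Apache Spark is a dependency conflict (here around org.slf4j), which can easily be resolved by excluding the conflicting dependency from the Spark dependency list.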

// All the apache spark dependencies
libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.10" % sparkVersion % "provided" withSources() excludeAll(
    ExclusionRule(organization = "org.slf4j")
    ),
  "org.apache.spark" % "spark-sql_2.10" % sparkVersion % "provided" withSources(),
  "org.apache.spark" % "spark-streaming_2.10" % sparkVersion % "provided" withSources(),
  "org.apache.spark" % "spark-mllib_2.10" % sparkVersion % "provided" withSources()
)

In addition, for use in the console, the following can be added to the build.sbt file so that the basic Spark packages are imported directly.

/// console

// define the statements initially evaluated when entering 'console', 'consoleQuick', or 'consoleProject'
// but still keep the console settings in the sbt-spark-package plugin

// If you want to use yarn-client for spark cluster mode, override the environment variable
// SPARK_MODE=yarn-client <cmd>
val sparkMode = sys.env.getOrElse("SPARK_MODE", "local[2]")


initialCommands in console :=
  s"""
     |import org.apache.spark.SparkConf
     |import org.apache.spark.SparkContext
     |import org.apache.spark.SparkContext._
     |
     |@transient val sc = new SparkContext(
     |  new SparkConf()
     |    .setMaster("$sparkMode")
     |    .setAppName("Console test"))
     |implicit def sparkContext = sc
     |import sc._
     |
     |@transient val sqlc = new org.apache.spark.sql.SQLContext(sc)
     |implicit def sqlContext = sqlc
     |import sqlc._
     |
     |def time[T](f: => T): T = {
     |  import System.{currentTimeMillis => now}
     |  val start = now
     |  try { f } finally { println("Elapsed: " + (now - start)/1000.0 + " s") }
     |}
     |
     |""".stripMargin

cleanupCommands in console :=
  s"""
     |sc.stop()
   """.stripMargin

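Now, the main issue is deploying the application. Running Play Framework on multiple nodes of the cluster is troublesome, because the HTTP request handler must have one specific URL. This can be solved by starting the Play Framework instance on the master node only and pointing the URL to that node's IP.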
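A small sketch of that idea, assuming the Play instance and a standalone Spark master share one host: both URLs are derived from a single configured address. The `ClusterEndpoints` type and the `MASTER_HOST` environment variable are hypothetical names introduced for illustration; 7077 and 9000 are the usual default ports for a standalone Spark master and for Play, and may differ in a given setup.

```scala
// Hypothetical helper: ClusterEndpoints and MASTER_HOST are illustrative names,
// not part of Play or Spark.
final case class ClusterEndpoints(
    masterHost: String,
    sparkPort: Int = 7077, // default port of a standalone Spark master
    httpPort: Int = 9000   // default HTTP port of a Play application
) {
  // URL to pass to SparkConf.setMaster(...)
  def sparkMasterUrl: String = s"spark://$masterHost:$sparkPort"

  // Single URL that clients use to reach the Play front end
  def playBaseUrl: String = s"http://$masterHost:$httpPort"
}

object ClusterEndpoints {
  // Reads the master host from the environment, falling back to localhost.
  def fromEnv(env: Map[String, String] = sys.env): ClusterEndpoints =
    ClusterEndpoints(env.getOrElse("MASTER_HOST", "localhost"))
}
```

With this in place, the Play application could call `new SparkConf().setMaster(ClusterEndpoints.fromEnv().sparkMasterUrl)` when it creates its SparkContext, while browsers are pointed at `playBaseUrl`.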

scala

typesafe-activator

apache-spark