Best Practices for Multi-User Web Application on Top of Apache Spark?

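I'm working on various ML-heavy applications in Apache Spark with an eye toward production. What is the best way to build an interactive, multi-user web application directly on top of Apache Spark?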

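I've looked into doing something similar for the analysts at my company. Before you decide to roll your own, it's worth taking a look at something like Livy.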

Livy is an open source REST interface for interacting with Spark from anywhere. It supports executing snippets of code or programs in a Spark context that runs locally or in YARN.

Features include:

  • Interactive Scala, Python and R shells
  • Batch submissions in Scala, Java, Python
  • Multi users can share the same server (impersonation support)
  • Can be used for submitting jobs from anywhere with REST
  • Does not require any code change to your programs

I don't know how much stock I'd put in that last bullet, but that's what their documentation says.

You could build a web interface that makes it easier to POST code snippets to the job server, or write a Python script that uses requests to send jobs as files, à la spark-submit.
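As a rough idea of what that Python script might look like, here is a minimal sketch that posts a batch job to Livy's `/batches` endpoint. The server URL, the file path, and the helper names `build_batch_payload` and `submit_batch` are assumptions made for illustration; check the field names against the REST API docs for the Livy version you run.

```python
# Assumption: Livy is reachable on its default port on the local machine.
LIVY_URL = "http://localhost:8998"


def build_batch_payload(file_path, args=None, proxy_user=None):
    """Build the JSON body for a Livy batch submission (hypothetical helper)."""
    payload = {"file": file_path}  # application file the cluster can read
    if args:
        payload["args"] = list(args)  # command-line arguments for the job
    if proxy_user:
        payload["proxyUser"] = proxy_user  # user to impersonate, if enabled
    return payload


def submit_batch(payload, livy_url=LIVY_URL):
    """POST the payload to Livy and return the parsed JSON response."""
    # Imported here so the payload helper works without requests installed.
    import requests

    response = requests.post(f"{livy_url}/batches", json=payload)
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    # Hypothetical job file and user, for illustration only.
    body = build_batch_payload(
        "hdfs:///jobs/score.py", args=["2024-01-01"], proxy_user="alice"
    )
    print(submit_batch(body))
```

A web front end would do essentially the same thing from its request handler, with `proxyUser` set to the logged-in user so that jobs run under that person's identity.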

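If you decide to take this idea further, another thing to look into is how to share resources within a running Spark application. The Fair Scheduler makes a lot of sense in that case.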

Under fair sharing, Spark assigns tasks between jobs in a “round robin” fashion, so that all jobs get a roughly equal share of cluster resources. This means that short jobs submitted while a long job is running can start receiving resources right away and still get good response times, without waiting for the long job to finish. This mode is best for multi-user settings.
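For what it's worth, turning on fair scheduling inside a single application might look roughly like the sketch below. The helper names (`fair_scheduler_conf`, `build_session`, `run_in_pool`) and the idea of one pool per user are assumptions for illustration; `spark.scheduler.mode`, `spark.scheduler.allocation.file`, and the `spark.scheduler.pool` local property are the settings Spark's job scheduling docs describe.

```python
def fair_scheduler_conf(allocation_file=None):
    """Return Spark settings that switch the in-application scheduler to FAIR."""
    conf = {"spark.scheduler.mode": "FAIR"}
    if allocation_file:
        # Optional XML file that defines pools with their weights and minShare.
        conf["spark.scheduler.allocation.file"] = allocation_file
    return conf


def build_session(app_name, conf):
    """Create (or reuse) a SparkSession with the given settings applied."""
    # Imported here so the config helper works without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def run_in_pool(spark, pool_name, job):
    """Run job(spark) with the calling thread's jobs assigned to pool_name."""
    # The pool is a thread-local property, so call this from the thread
    # that handles the user's request.
    spark.sparkContext.setLocalProperty("spark.scheduler.pool", pool_name)
    return job(spark)


if __name__ == "__main__":
    spark = build_session("multi-user-app", fair_scheduler_conf())
    # Hypothetical pool name; one pool per user is one possible layout.
    print(run_in_pool(spark, "alice", lambda s: s.range(1000).count()))
    spark.stop()
```

Note that the pool assignment only matters when several jobs run concurrently in the same SparkContext, which is the situation a shared web application would be in.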