How to change Spark setting to allow spark.dynamicAllocation.enabled?

Question

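I am running a Python script in pyspark and got the following error: NameError: name 'spark' is not defined

I looked it up and found that the reason is that spark.dynamicAllocation.enabled is not allowed yet.

According to Spark's documentation (https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-dynamic-allocation.html#spark_dynamicAllocation_enabled): spark.dynamicAllocation.enabled (default: false) controls whether dynamic allocation is enabled. It assumes that spark.executor.instances is not set or is 0 (the default value).

Since the default setting is false, I need to change the Spark settings to enable spark.dynamicAllocation.enabled.

I installed Spark with brew and did not change its configuration/settings.

How can I change the setting and enable spark.dynamicAllocation.enabled?

Thanks a lot.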

Answer 1

There are several places where you can set it. If you would like to enable it on a per-job basis, set the following in each application:

conf.set("spark.dynamicAllocation.enabled","true")

If you want to set it for all jobs, navigate to the Spark conf directory. In the Hortonworks distribution it should be

/usr/hdp/current/spark-client/conf/

Add the setting to your spark-defaults.conf and you should be good to go.
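As a minimal sketch of the "add the setting to spark-defaults.conf" step, the helper below updates or appends one property in the text of such a file. The function name `set_spark_default` and the sample file content are hypothetical, and it assumes the common layout of one whitespace-separated `key value` pair per line; it works on a string, so you would read and write the actual file yourself.

```python
def set_spark_default(conf_text: str, key: str, value: str) -> str:
    """Return conf_text with `key` set to `value`.

    An existing uncommented entry for `key` is replaced; otherwise a new
    line is appended. Assumes whitespace-separated "key value" lines.
    """
    lines = conf_text.splitlines()
    new_line = f"{key} {value}"
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Skip blank lines and comments
        if not stripped or stripped.startswith("#"):
            continue
        # The first whitespace-separated token is the property name
        if stripped.split(None, 1)[0] == key:
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


# Hypothetical file content, for illustration only
example = (
    "# Example defaults\n"
    "spark.master yarn\n"
    "spark.dynamicAllocation.enabled false\n"
)
updated = set_spark_default(example, "spark.dynamicAllocation.enabled", "true")
print(updated)
```

Working on a string rather than the file keeps the sketch easy to try out before touching a real installation.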

Answer 2

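Question: How can I change the setting and enable spark.dynamicAllocation.enabled?

You can achieve this in 3 ways:
1) Modify the parameters mentioned below in spark-defaults.conf.
2) Pass the parameters below with --conf from your spark-submit.
3) Programmatically specify the dynamic allocation configuration, as demonstrated below.

You can do it programmatically like this: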

import org.apache.spark.SparkConf

val conf = new SparkConf()
      .setMaster("ClusterManager") // placeholder: replace with your master URL
      .setAppName("test-executor-allocation-manager")
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.dynamicAllocation.minExecutors", "1") // SparkConf.set takes String values
      .set("spark.dynamicAllocation.maxExecutors", "2")
      .set("spark.shuffle.service.enabled", "true") // for standalone
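For option 2, a rough sketch of what the spark-submit invocation could look like, built in Python so the same settings map can be reused. The helper name `build_spark_submit` and the application file `my_script.py` are hypothetical; the property names are the ones from the Scala snippet above.

```python
import shlex


def build_spark_submit(app_path, settings):
    """Build a spark-submit argument list with one --conf flag per setting."""
    args = ["spark-submit"]
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app_path)
    return args


# Same properties as in the programmatic example
dynamic_allocation = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "1",
    "spark.dynamicAllocation.maxExecutors": "2",
    "spark.shuffle.service.enabled": "true",
}

# my_script.py is a hypothetical application file
command = build_spark_submit("my_script.py", dynamic_allocation)

# Print the command as a single shell-quoted line
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command)` on a machine where spark-submit is on the PATH.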

Answer 3

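This issue also affects Spark installations made using other resources, such as the spark-ec2 script for installing on Amazon Web Services. According to the Spark documentation, two values in SPARK_HOME/conf/spark-defaults.conf need to be set: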

spark.shuffle.service.enabled   true
spark.dynamicAllocation.enabled true

See: https://spark.apache.org/docs/latest/configuration.html#dynamic-allocation

If your installation has a spark-env.sh script in SPARK_HOME/conf, make sure that it does not contain lines such as the following, or that they are commented out:

export SPARK_WORKER_INSTANCES=1 #or some other integer, or
export SPARK_EXECUTOR_INSTANCES=1 #or some other integer
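A quick way to check a spark-env.sh for such lines is sketched below. The function name `find_fixed_instance_exports` and the sample script content are hypothetical; the check only looks for uncommented assignments of the two variables named above and does not interpret the shell script in any other way.

```python
import re

# The two variables that the answer says should be absent or commented out
FIXED_COUNT_VARS = ("SPARK_WORKER_INSTANCES", "SPARK_EXECUTOR_INSTANCES")


def find_fixed_instance_exports(env_text):
    """Return (line_number, line) pairs for uncommented assignments
    of the variables listed in FIXED_COUNT_VARS."""
    pattern = re.compile(
        r"^\s*(?:export\s+)?(" + "|".join(FIXED_COUNT_VARS) + r")="
    )
    hits = []
    for number, line in enumerate(env_text.splitlines(), start=1):
        # Lines starting with '#' do not match, so commented-out entries are ignored
        if pattern.match(line):
            hits.append((number, line.strip()))
    return hits


# Hypothetical spark-env.sh content, for illustration only
sample = """\
# export SPARK_WORKER_INSTANCES=1
export SPARK_EXECUTOR_INSTANCES=2
export SPARK_LOCAL_IP=127.0.0.1
"""
print(find_fixed_instance_exports(sample))
```

An empty result means neither variable is set in the text that was checked.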

Answer 4

Configuration parameters can be set in pyspark from a notebook using a command similar to the following:

spark.conf.set("spark.sql.crossJoin.enabled", "true")

Answer 5

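In addition to the previous answers, all of the configurations mentioned may fail to take effect because of interpreter settings (if you use Zeppelin). I use Livy, and its default settings override the dynamicAllocation parameters.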
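With Livy, one place to try passing these settings is the session-creation request, whose body accepts a `conf` map of Spark properties. The sketch below only builds that JSON body; the helper name `livy_session_payload` is hypothetical, nothing is sent to a server, and whether these values take precedence over the server-side defaults depends on the Livy setup and is not verified here.

```python
import json


def livy_session_payload(settings, kind="pyspark"):
    """Build the JSON-serializable body for a Livy session-creation request."""
    return {"kind": kind, "conf": dict(settings)}


payload = livy_session_payload(
    {
        "spark.dynamicAllocation.enabled": "true",
        "spark.shuffle.service.enabled": "true",
    }
)

# Serialize the body as it would be sent in the POST request
body = json.dumps(payload, indent=2)
print(body)
```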
