Is it possible to use AutoScaling with Elastic MapReduce?

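I would like to know whether I can use AutoScaling to automatically scale Amazon EC2 capacity up or down based on the CPU utilization of Elastic MapReduce.

For example, I start a MapReduce job with only 1 instance, but if that instance reaches 50% utilization, I want the AutoScaling group I created to launch a new instance. Is this possible?

Do you know if this is possible? Or does Elastic MapReduce, because it is "elastic", launch more instances automatically when needed, without any configuration?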

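No, Auto Scaling cannot be used with Amazon Elastic MapReduce (EMR).

EMR can be scaled via API or command-line calls by adding and removing task nodes (which do not host HDFS storage). Note that core nodes cannot be removed, because they host HDFS storage and removing them could result in data loss. In fact, this is the only difference between core nodes and task nodes.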
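A minimal sketch of such an API call, assuming the boto3 SDK is installed and AWS credentials are configured (neither is mentioned in the original answer). The helper names build_resize_request and resize_task_nodes and the cluster ID are hypothetical; list_instance_groups and modify_instance_groups are the boto3 EMR client calls. Only a TASK group is resized, so core and master nodes are never touched.

```python
from typing import Any, Dict, List


def build_resize_request(cluster_id: str,
                         instance_groups: List[Dict[str, Any]],
                         target_task_count: int) -> Dict[str, Any]:
    """Build the keyword arguments for modify_instance_groups.

    Only a TASK instance group is ever resized; CORE and MASTER groups
    are ignored because core nodes hold HDFS data.
    """
    if target_task_count < 0:
        raise ValueError("target_task_count must be non-negative")
    task_groups = [g for g in instance_groups
                   if g.get("InstanceGroupType") == "TASK"]
    if not task_groups:
        raise ValueError("cluster has no TASK instance group to resize")
    # Assumption: the cluster has a single task group, so resize the first one.
    group = task_groups[0]
    return {
        "ClusterId": cluster_id,
        "InstanceGroups": [
            {"InstanceGroupId": group["Id"],
             "InstanceCount": target_task_count},
        ],
    }


def resize_task_nodes(cluster_id: str, target_task_count: int) -> None:
    """Resize the task group of a running cluster (requires AWS access)."""
    import boto3  # imported lazily so the pure helper above runs without it

    emr = boto3.client("emr")
    # Assumption: few enough groups that the first page of results is complete.
    groups = emr.list_instance_groups(ClusterId=cluster_id)["InstanceGroups"]
    request = build_resize_request(cluster_id, groups, target_task_count)
    emr.modify_instance_groups(**request)
```

Calling resize_task_nodes("j-EXAMPLE", 4) would ask EMR to run four task nodes on the hypothetical cluster j-EXAMPLE.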

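It is also possible to change the number of nodes from within an EMR "Step". Steps execute sequentially, so the cluster can be made larger before a step that needs heavy processing and made smaller again in a later step.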

You can have a different number of slave nodes for each cluster step. You can also add a step to a running cluster to modify the number of slave nodes. Because all steps are guaranteed to run sequentially by default, you can specify the number of running slave nodes for any step.

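CPU is not a good metric for scaling an EMR cluster, because Hadoop keeps all nodes as busy as possible while a job is running. A better metric is the number of jobs waiting, so that they can complete sooner.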
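As a rough illustration of scaling on waiting jobs rather than CPU, the sketch below maps a pending-job count to a task-node count. The function name, the jobs_per_node ratio, and the min/max bounds are all hypothetical tuning choices, not something the answer prescribes.

```python
import math


def desired_task_nodes(pending_jobs: int, jobs_per_node: int,
                       min_nodes: int, max_nodes: int) -> int:
    """Return a task-node count sized to the number of waiting jobs.

    jobs_per_node is an assumed ratio of how many queued jobs one task
    node is expected to absorb; the result is clamped to the given bounds.
    """
    if jobs_per_node <= 0:
        raise ValueError("jobs_per_node must be positive")
    if min_nodes > max_nodes:
        raise ValueError("min_nodes must not exceed max_nodes")
    wanted = math.ceil(max(pending_jobs, 0) / jobs_per_node)
    return max(min_nodes, min(max_nodes, wanted))
```

With no jobs waiting the result falls back to min_nodes, which is safe only because task nodes hold no HDFS data.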

See also:

Stack Overflow: Can we add more Amazon Elastic Mapreduce instances into an existing Amazon Elastic Mapreduce instances?
Stack Overflow: Can Amazon Auto Scaling Service work with Elastic Map Reduce Service?

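You need Qubole: http://www.qubole.com/blog/product/industrys-first-auto-scaling-hadoop-clusters/

We have never seen any of our users/customers successfully use vanilla auto scaling with Hadoop. Hadoop is stateful: nodes hold HDFS data and intermediate outputs. Removing nodes based on CPU/memory does not work. Adding nodes takes sophistication, because this is not a website. You need to look at the size of the jobs submitted and the rate at which they complete.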
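To make "job size and completion rate" concrete, here is a back-of-the-envelope sketch. It is not Qubole's algorithm; the function name, the choice of gigabytes as the unit of job size, and the target completion time are assumptions made purely for illustration.

```python
import math


def nodes_to_finish_in(backlog_gb: float, gb_per_node_hour: float,
                       target_hours: float) -> int:
    """Estimate how many nodes clear the backlog within target_hours.

    backlog_gb is the total size of submitted but unfinished work and
    gb_per_node_hour is the observed completion rate of a single node.
    """
    if gb_per_node_hour <= 0 or target_hours <= 0:
        raise ValueError("rate and target time must be positive")
    if backlog_gb <= 0:
        return 0
    return math.ceil(backlog_gb / (gb_per_node_hour * target_hours))
```

For example, a 1000 GB backlog at an observed 50 GB per node-hour needs 10 nodes to finish in 2 hours.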

We run the largest Hadoop clusters, easily, on AWS (for our customers). They auto-scale all the time. They use spot instances. And it costs the same as EMR.


amazon-ec2

amazon-web-services