Hadoop Capacity Scheduler - using default queue

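When using the Hadoop Capacity Scheduler with only one queue (the default queue), how does Hadoop schedule the different jobs in that queue? Is it FIFO, or does it use a different mechanism?

Does this behavior differ when using the Fair Scheduler?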

From an old Cloudera article (Hadoop 1.x):

Once a queue is selected, the Scheduler picks a job in the queue. Jobs are sorted based on when they're submitted and their priorities (if the queue supports priorities).

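Jobs are considered in order, and a job is selected if its user is within the user-quota for the queue, i.e., the user is not already using more of the queue's resources than his/her limit. The Scheduler also makes sure that there is enough free memory in the TaskTracker to run the job's task, in case the job has special memory requirements.

Once a job is selected, the Scheduler picks a task to run. This logic to pick a task remains unchanged from earlier versions.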
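The selection order described in the quote can be sketched as follows. This is only an illustrative model, not the scheduler's actual code: the `Job` fields, the `pick_job` function, the slot-based user limit, and the "higher number means higher priority" convention are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    user: str
    priority: int        # assumption: larger value = higher priority
    submit_time: int     # arbitrary ticks; smaller = submitted earlier
    task_memory_mb: int  # memory that one task of this job needs

def pick_job(jobs, used_by_user, user_limit, free_memory_mb,
             priorities_enabled=True):
    """Return the first job, in queue order, that passes both checks."""
    def sort_key(job):
        # Priority first (only if the queue supports it), then submit time.
        prio = -job.priority if priorities_enabled else 0
        return (prio, job.submit_time)

    for job in sorted(jobs, key=sort_key):
        # Skip jobs whose user has reached the per-user limit
        # (treating "at the limit" as out of quota is a choice of this sketch).
        if used_by_user.get(job.user, 0) >= user_limit:
            continue
        # Skip jobs whose task would not fit in the free memory on the node.
        if job.task_memory_mb > free_memory_mb:
            continue
        return job
    return None
```

In this model, a single queue without priorities behaves like FIFO, except that a job can be passed over when its user is at the limit or its task does not fit in memory.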

From the official Apache documentation on the CapacityScheduler:

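Resource-based Scheduling - Support for resource-intensive applications, where an application can optionally specify higher resource requirements than the default, thereby accommodating applications with differing resource requirements. Currently, memory is the resource requirement supported.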

From the official Apache documentation on the FairScheduler:

Fair scheduling is a method of assigning resources to applications such that all apps get, on average, an equal share of resources over time. Hadoop NextGen is capable of scheduling multiple resource types. By default, the Fair Scheduler bases scheduling fairness decisions only on memory. It can be configured to schedule with both memory and CPU, using the notion of Dominant Resource Fairness developed by Ghodsi et al. When there is a single app running, that app uses the entire cluster.
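To make the idea of Dominant Resource Fairness more concrete, here is a minimal sketch of the allocation loop. It is a simplified model, not the Fair Scheduler's implementation: `dominant_share` and `drf_allocate` are hypothetical names, every task of an app is assumed to have the same demand, and the loop simply stops when no app's next task fits. The numbers in the usage note follow the well-known worked example from the DRF paper (9 CPUs and 18 GB of memory).

```python
def dominant_share(usage, capacity):
    """Largest fraction of any single resource that an app currently holds."""
    return max(u / c for u, c in zip(usage, capacity))

def drf_allocate(capacity, task_demands):
    """Hand out tasks one at a time to the app with the smallest dominant share.

    capacity:     tuple of total resources, e.g. (cpus, memory_gb)
    task_demands: app name -> per-task demand, same order as capacity
                  (assumption: each demand is positive in at least one resource)
    Returns app name -> number of tasks launched.
    """
    used = [0.0] * len(capacity)
    usage = {app: [0.0] * len(capacity) for app in task_demands}
    tasks = {app: 0 for app in task_demands}
    while True:
        # Apps whose next task still fits in the remaining capacity.
        fitting = [
            app for app, demand in task_demands.items()
            if all(u + d <= c for u, d, c in zip(used, demand, capacity))
        ]
        if not fitting:
            return tasks
        # Ties are broken by insertion order of task_demands.
        app = min(fitting, key=lambda a: dominant_share(usage[a], capacity))
        for i, d in enumerate(task_demands[app]):
            used[i] += d
            usage[app][i] += d
        tasks[app] += 1
```

For example, `drf_allocate((9, 18), {"A": (1, 4), "B": (3, 1)})` gives app A three tasks and app B two, so both end up with a dominant share of two thirds (memory for A, CPU for B).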

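Within each queue, a scheduling policy is used to share resources between the running applications. The default is memory-based fair sharing, but FIFO and multi-resource with Dominant Resource Fairness can also be configured. Queues can be arranged in a hierarchy to divide resources and configured with weights to share the cluster in specific proportions.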
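As a rough illustration of memory-based fair sharing with weights, the sketch below divides a fixed amount of memory among apps in proportion to their weights and redistributes whatever an app does not need. The function name `weighted_fair_shares` and its inputs are assumptions for this example; the real Fair Scheduler also has settings (such as minimum and maximum resources) that are ignored here.

```python
def weighted_fair_shares(total, demands, weights):
    """Weighted max-min fair split of `total` among apps.

    demands: app name -> amount of memory the app could use
    weights: app name -> relative weight (assumed positive)
    Returns app name -> allocated share.
    """
    shares = {name: 0.0 for name in demands}
    active = {name for name, demand in demands.items() if demand > 0}
    remaining = float(total)
    while active and remaining > 1e-9:
        total_weight = sum(weights[name] for name in active)
        satisfied = set()
        distributed = 0.0
        for name in active:
            # Offer each app its weighted slice of what is left this round.
            offer = remaining * weights[name] / total_weight
            give = min(offer, demands[name] - shares[name])
            shares[name] += give
            distributed += give
            if shares[name] >= demands[name] - 1e-9:
                satisfied.add(name)
        remaining -= distributed
        if not satisfied:
            break  # every active app took its full slice; nothing is left over
        active -= satisfied  # leftovers go to the apps that still want more
    return shares
```

With equal weights and two apps that can each use the whole cluster, each gets half. With a single app, that app gets everything, which matches the documentation's statement that a lone app uses the entire cluster.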