AWS ECS Task Memory Hard and Soft Limits

I'm confused about the purpose of setting both hard and soft memory limits for an ECS task definition.

IIRC the soft limit is how much memory the scheduler reserves on an instance for the task to run, while the hard limit is how much memory the container can use before it gets killed.

My issue is that if the ECS scheduler allocates tasks to instances based on the soft limit, you could have a situation where a task using memory above its soft limit but below its hard limit causes the instance to exceed its maximum memory (assuming all other tasks are using memory slightly below or equal to their soft limits).

Is this correct?

Thanks

If the compute workload you want to run is primarily memory-bound rather than CPU-bound, then you should use only the hard limit, not the soft limit. From the documentation:

You must specify a non-zero integer for one or both of memory or memoryReservation in container definitions. If you specify both, memory must be greater than memoryReservation. If you specify memoryReservation, then that value is subtracted from the available memory resources for the container instance on which the container is placed; otherwise, the value of memory is used.

http://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html

By specifying only a hard memory limit for your tasks you avoid running out of memory, because ECS stops placing tasks on the instance once the hard limits account for its memory, and Docker kills any container that tries to go over its hard limit.
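A minimal sketch of this memory-bound setup using boto3 (the family name, image, and sizes are placeholders, not from the original answer):

```python
import boto3

ecs = boto3.client("ecs")

# Register a task definition with only a hard limit ("memory"): ECS subtracts
# the full amount from the instance's available memory when placing the task,
# and Docker kills the container if it tries to exceed that limit.
ecs.register_task_definition(
    family="memory-bound-worker",        # hypothetical family name
    containerDefinitions=[
        {
            "name": "worker",
            "image": "myorg/worker:latest",  # hypothetical image
            "essential": True,
            "memory": 512,               # hard limit in MiB; no memoryReservation set
        }
    ],
)
```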

The soft memory limit feature is designed for CPU-bound applications, where you want to reserve a minimum amount of memory (the soft limit) but allow occasional bursts up to the hard limit. In this kind of CPU-heavy workload you don't really care about the exact memory usage of each container, because the containers will run out of CPU long before they exhaust the instance's memory, so you can place tasks based on the CPU reservation and the soft memory limit. In this setup the hard limit is just a failsafe in case of a runaway process or a memory leak.
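For contrast, a sketch of the CPU-bound case (values are illustrative assumptions): reserve CPU and a soft memory floor for placement, and keep a larger hard limit purely as a failsafe.

```python
# Container definition fragment for a CPU-bound service.
cpu_bound_container = {
    "name": "api",
    "image": "myorg/api:latest",   # hypothetical image
    "essential": True,
    "cpu": 256,                    # CPU units reserved; drives placement
    "memoryReservation": 256,      # soft limit in MiB, reserved on the instance
    "memory": 1024,                # hard limit: failsafe ceiling for bursts/leaks
}
```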

So in summary, you should evaluate your workload with load testing to see whether it tends to run out of CPU first or memory first. If you are CPU-bound, then you can use the soft memory limit with an optional hard limit as a failsafe. If you are memory-bound, then you'll want to use only the hard limit with no soft limit.

@nathanpeck is the authority here, but I just want to address one specific scenario you raised:

My issue is that if the ECS scheduler allocates tasks to instances based on the soft limit, you could have a situation where a task that is using memory above the soft limit but below the hard limit could cause the instance to exceed its max memory (assuming all other tasks are using memory slightly below or equal to their soft limit).

This post from AWS explains what happens in that case:

If containers try to consume memory between these two values (or between the soft limit and the host capacity if a hard limit is not set), they may compete with each other. In this case, what happens depends on the heuristics used by the Linux kernel’s OOM (Out of Memory) killer. ECS and Docker are both uninvolved here; it’s the Linux kernel reacting to memory pressure. If something is above its soft limit, it’s more likely to be killed than something below its soft limit, but figuring out which process gets killed requires knowing all the other processes on the system and what they are doing with their memory as well. Again the new memory feature we announced can come to rescue here. While the OOM behavior isn’t changing, now containers can be configured to swap out to disk in a memory pressure scenario. This can potentially alleviate the need for the OOM killer to kick in (if containers are configured to swap).
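The swap behavior the post refers to can be configured per container via linuxParameters (EC2 launch type only, and the container instance itself must have swap enabled). A sketch with illustrative values:

```python
# Container definition fragment that allows swapping under memory pressure.
swap_enabled_container = {
    "name": "batch",
    "image": "myorg/batch:latest",  # hypothetical image
    "essential": True,
    "memoryReservation": 512,       # soft limit in MiB
    "memory": 1024,                 # hard limit in MiB
    "linuxParameters": {
        "maxSwap": 1024,            # MiB of swap the container may use
        "swappiness": 60,           # 0-100; how aggressively the kernel swaps
    },
}
```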