How to avoid the last pod being killed on automatic node scale down in AKS

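We use Azure AKS v1.17.9 with autoscaling for both pods (using HorizontalPodAutoscaler) and nodes. Overall it works well, but we have seen outages in some cases. We have some deployments where minReplicas=1 and maxReplicas=4. Most of the time only one pod is running for such a deployment. In some cases where the auto-scaler has decided to scale down a node, the last remaining pod has been killed. Later a new pod is started on another node, but this means an outage.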
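For reference, a HorizontalPodAutoscaler matching the description above could look roughly like the sketch below. The name `my-app` and the CPU target are assumptions for illustration only; they are not taken from the actual cluster.

```yaml
# Sketch of the setup described above; names and the CPU target are hypothetical.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-app                  # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                # hypothetical Deployment being scaled
  minReplicas: 1                # most of the time only this single pod runs
  maxReplicas: 4
  targetCPUUtilizationPercentage: 70   # assumed threshold, not from the question
```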

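I would have expected the auto-scaler to first create a new pod on another node (bringing the number of replicas up to the allowed value of 2) and then scale down the old pod. That would have worked without downtime. As it is, it works on the principle: kill first, ask later.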

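Is there any way around this other than the obvious alternative of setting minReplicas=2 (which increases the cost, since all of these pods are doubled and need additional VMs)? And is this expected, or is it a bug?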

In some cases where the auto-scaler has decided to scale down a node, the last remaining pod has been killed. Later a new pod is started on another node, but this means an outage.

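For this reason, you should always have at least 2 replicas for a Deployment in a production environment. You should also use Pod Anti-Affinity so that those two pods are not scheduled to the same Availability Zone. That way, if there are network problems in one Availability Zone, your app is still available.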
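A minimal sketch of that advice, assuming a Deployment whose pods carry the hypothetical label `app: my-app` and assuming the nodes expose the zone label `topology.kubernetes.io/zone` (older clusters may only have `failure-domain.beta.kubernetes.io/zone`):

```yaml
# Sketch only: name, label, and image are placeholders, not from the question.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2                   # at least two pods, so one survives a node drain
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        podAntiAffinity:
          # "preferred" asks the scheduler to spread pods across zones
          # but still schedules them if that is not possible.
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: my-app
                topologyKey: topology.kubernetes.io/zone   # assumed zone label
      containers:
        - name: my-app
          image: example.com/my-app:1.0   # placeholder image
```

If a HorizontalPodAutoscaler manages the Deployment, the replica floor is controlled by its `minReplicas` field, so that is the value to raise to 2. The preferred (soft) form of anti-affinity is used here so that scaling up to 4 replicas is not blocked when there are fewer zones than pods; a required rule would be stricter.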

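It is common to have at least 3 replicas, one in each Availability Zone, since cloud providers typically have 3 Availability Zones per Region. That way requests can be served by a replica in the same zone, and intra-zone traffic is typically cheaper than cross-zone traffic.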

You can always use fewer replicas to save cost, but that is a trade-off: you get worse availability.

kubernetes

azure-aks