How can I prevent the Horizontal Pod Autoscaler from removing pods that are actively doing work?

I want to use the Horizontal Pod Autoscaler to run a batch of long-running tasks. Some of these tasks can take minutes or even hours to run, and they consistently use 80 to 100% of the available CPU.

I'd like to understand how the autoscaler behaves when it decides to scale down the fleet.

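Suppose there are 4 instances, all working, each at 95% CPU utilization. The fleet cannot scale up any further because the maximum number of instances is set to 4. The scale-up threshold is set to 75% average CPU utilization.
If 2 instances finish their work early while the other 2 still have work left, the fleet's average CPU utilization drops to about 50%.
The autoscaler then decides it is time to scale down. However, 2 of the 4 instances are still working, so there is a 50% chance that the autoscaler selects a pod that is actively working and terminates it.
If that happens, the progress on that work is lost and the work is marked as incomplete; one of the available pods then picks it up and starts over from scratch.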
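For reference, the Kubernetes HPA documentation describes the replica calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change when that ratio is within a tolerance (0.1 by default) of 1.0. The sketch below applies that formula to the scenario above. It is a simplification: it assumes utilization is already averaged over pods and ignores the scale-down stabilization window (5 minutes by default), unready pods, and missing metrics. The min/max defaults mirror the question's setup.

```python
import math
from typing import Sequence


def desired_replicas(pod_utilizations: Sequence[float], target_utilization: float,
                     min_replicas: int = 1, max_replicas: int = 4,
                     tolerance: float = 0.1) -> int:
    """Approximate the HPA replica calculation for a CPU utilization target.

    Simplified sketch: no stabilization window, no unready/missing-metric handling.
    """
    current_replicas = len(pod_utilizations)
    average = sum(pod_utilizations) / current_replicas
    ratio = average / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: the replica count is left alone.
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured minReplicas/maxReplicas.
    return max(min_replicas, min(max_replicas, desired))


# 4 pods at 95% with a 75% target: ceil(5.07) = 6, capped at maxReplicas = 4.
print(desired_replicas([95, 95, 95, 95], 75))  # 4
# 2 pods idle (about 0%), 2 still at 95%: average 47.5%, ceil(2.53) = 3.
print(desired_replicas([95, 95, 0, 0], 75))  # 3
```

Under these assumptions the HPA would remove one pod rather than two, but the question of which pod the Deployment removes remains the same.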

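Is there a way to prevent this by making the autoscaler prefer the pods with the lowest CPU utilization? That way, the pods that are still working on tasks would be left alone.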

I don't know of a way to customize which replicas of a Deployment get removed when the number of replicas is scaled down.

Perhaps you can solve your problem by setting terminationGracePeriodSeconds and using a preStop hook.

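With terminationGracePeriodSeconds you can specify how long the containers in a pod are given between the first SIGTERM signal and the SIGKILL signal. On its own this is not ideal for you because, as far as I understand, you don't know how long a pod will need to finish its assigned task. However, if you set this value high enough, you can also take advantage of the preStop hook. From the documentation: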

PreStop is called immediately before a container is terminated due to an API request or management event such as liveness/startup probe failure, preemption, resource contention, etc. The handler is not called if the container crashes or exits. The reason for termination is passed to the handler. The Pod's termination grace period countdown begins before the PreStop hook is executed. Regardless of the outcome of the handler, the container will eventually terminate within the Pod's termination grace period. Other management of the container blocks until the hook completes or until the termination grace period is reached.
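A minimal sketch of how the two settings fit together in a Deployment's pod template, written as a Python dict and printed as JSON (kubectl apply -f accepts JSON manifests as well as YAML). Everything specific here is a hypothetical placeholder: the names, the image, the 4-hour grace period, the CPU request, and the /app/wait_until_idle.py command, which stands for whatever blocking command your container provides.

```python
import json

# Assumption: no single task runs longer than 4 hours. The grace period must
# cover the longest task, because the countdown includes the preStop hook.
GRACE_PERIOD_SECONDS = 4 * 60 * 60

pod_spec = {
    "terminationGracePeriodSeconds": GRACE_PERIOD_SECONDS,
    "containers": [
        {
            "name": "worker",                          # hypothetical
            "image": "example.com/batch-worker:1.0",   # hypothetical
            # CPU utilization targets are computed relative to the CPU request.
            "resources": {"requests": {"cpu": "1"}},
            "lifecycle": {
                "preStop": {
                    "exec": {
                        # Hypothetical command that blocks until the worker is idle.
                        # SIGTERM is sent only after the hook completes.
                        "command": ["python", "/app/wait_until_idle.py"],
                    }
                }
            },
        }
    ],
}

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "batch-worker"},
    "spec": {
        "selector": {"matchLabels": {"app": "batch-worker"}},
        "template": {
            "metadata": {"labels": {"app": "batch-worker"}},
            "spec": pod_spec,
        },
    },
}

print(json.dumps(deployment, indent=2))
```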

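If you can run a command from inside the container that blocks until the container has finished its work, you should be able to make the pod terminate only once it is idle.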
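One possible shape for such a blocking command, as a minimal Python sketch. It assumes a hypothetical file-based convention that you would have to implement in your worker (it is not a Kubernetes feature): the worker creates a "busy" file while processing a task and deletes it afterwards, and it does not start a new task while a "drain" file exists. The script creates the drain file, then polls until the busy file is gone.

```python
import os
import time
from pathlib import Path
from typing import Optional

# Hypothetical convention shared with the worker (not part of Kubernetes):
# the worker creates BUSY_FILE while processing a task, deletes it when done,
# and does not start a new task while DRAIN_FILE exists.
BUSY_FILE = Path(os.environ.get("BUSY_FILE", "/tmp/worker-busy"))
DRAIN_FILE = Path(os.environ.get("DRAIN_FILE", "/tmp/worker-drain"))


def wait_until_idle(busy_file: Path, drain_file: Path,
                    poll_seconds: float = 5.0,
                    timeout_seconds: Optional[float] = None) -> bool:
    """Ask the worker to stop taking new tasks, then block until it is idle.

    Returns True once busy_file no longer exists, or False if timeout_seconds
    elapses first. With no timeout, terminationGracePeriodSeconds is the only cap.
    """
    drain_file.touch()  # stop the worker from picking up another task
    deadline = None if timeout_seconds is None else time.monotonic() + timeout_seconds
    while busy_file.exists():
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True


if __name__ == "__main__":
    # Intended as the preStop exec command: the container receives SIGTERM
    # only after this process exits.
    wait_until_idle(BUSY_FILE, DRAIN_FILE)
```

For this to be race-free, the worker should create the busy file before it checks for the drain file (and delete it again if draining has started); otherwise a task could start just after the script has seen the worker as idle.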

Let me also link a good blog post that explains how the whole thing works: https://pracucci.com/graceful-shutdown-of-kubernetes-pods.html
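A related option that needs no preStop hook is to handle SIGTERM in the worker itself: keep the current task running and simply stop pulling new ones. A minimal Python sketch, assuming a hypothetical next_task() callable that fetches one task from your queue and returns None when the queue is empty. It only helps if terminationGracePeriodSeconds is longer than the longest task, and only if the worker process actually receives the SIGTERM (for example because it runs as PID 1 or its entrypoint forwards signals).

```python
import signal
import threading
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class GracefulWorker:
    """Runs tasks one at a time; after SIGTERM it finishes the current task and stops."""

    def __init__(self) -> None:
        self._stop_requested = threading.Event()

    def request_stop(self, signum=None, frame=None) -> None:
        # Usable as a signal handler. It only records the request, so a task
        # that is already running is not interrupted.
        self._stop_requested.set()

    def install_sigterm_handler(self) -> None:
        # signal.signal must be called from the main thread.
        signal.signal(signal.SIGTERM, self.request_stop)

    def run(self, next_task: Callable[[], Optional[T]],
            handle: Callable[[T], None]) -> int:
        """Process tasks until the queue is empty or a stop was requested.

        Returns the number of tasks completed.
        """
        completed = 0
        # The flag is checked before fetching, so no task is dequeued and then dropped.
        while not self._stop_requested.is_set():
            task = next_task()
            if task is None:  # queue is empty
                break
            handle(task)  # runs to completion even if SIGTERM arrives meanwhile
            completed += 1
        return completed


if __name__ == "__main__":
    import collections
    import time

    queue = collections.deque(range(3))  # stand-in for a real task queue
    worker = GracefulWorker()
    worker.install_sigterm_handler()
    n = worker.run(lambda: queue.popleft() if queue else None,
                   lambda task: time.sleep(0.1))
    print(f"completed {n} task(s)")
```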

Tags: kubernetes, google-kubernetes-engine