Horizontal Pod Autoscaler scales custom metric too aggressively on GKE

Question

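I have the following Horizontal Pod Autoscaler configuration on Google Kubernetes Engine to scale a deployment by a custom metric: the RabbitMQ messages-ready count for a specific queue, foo-queue.

It picks up the metric value correctly.

When I insert 2 messages, it scales the deployment to the maximum of 10 replicas. I expected it to scale to 2 replicas, since targetValue is 1 and there are 2 messages ready.

Why does it scale so aggressively?

HPA configuration: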

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: foo-hpa
  namespace: development
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: foo
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: "custom.googleapis.com|rabbitmq_queue_messages_ready"
      metricSelector:
        matchLabels:
          metric.labels.queue: foo-queue
      targetValue: 1

Answer 1

Try following the instructions that describe a horizontal autoscaling setup for RabbitMQ in k8s:

Kubernetes Workers Autoscaling based on RabbitMQ queue size

In particular, it recommends targetValue: 20 for the rabbitmq_queue_messages_ready metric instead of targetValue: 1:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: workers-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: my-workers
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: "custom.googleapis.com|rabbitmq_queue_messages_ready"
      metricSelector:
        matchLabels:
          metric.labels.queue: myqueue
      targetValue: 20

Now our deployment my-workers will grow if RabbitMQ queue myqueue has more than 20 non-processed jobs in total.
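To see what that threshold means in practice, here is a small Python sketch of a single HPA evaluation for an External metric with targetValue. It is a simplification, not the controller's code: it assumes the documented ratio formula desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), includes the default tolerance of 0.1 (the kube-controller-manager flag `--horizontal-pod-autoscaler-tolerance`), and leaves out min/max clamping and scale-down stabilization. The function name `hpa_value_target` is hypothetical.

```python
import math


def hpa_value_target(current_replicas: int, metric_value: float, target_value: float,
                     tolerance: float = 0.1) -> int:
    """One evaluation of a simplified HPA rule for an External metric with targetValue.

    Hypothetical helper: ratio formula plus the default 0.1 tolerance band;
    min/max clamping and stabilization windows are deliberately left out.
    """
    ratio = metric_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: the replica count is left unchanged.
        return current_replicas
    # Note the multiplication by the current replica count.
    return math.ceil(current_replicas * ratio)


# One replica, targetValue: 20, a few hypothetical queue sizes
for ready in (2, 21, 45):
    print(ready, "->", hpa_value_target(1, ready, 20))
```

With 2 ready messages the ratio is 0.1, so one replica stays at one, and 21 messages fall inside the tolerance band. The current replica count is still a factor, though, so a backlog that stays well above 20 keeps multiplying the replica count on later cycles.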

Answer 2

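According to https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/:

From the most basic perspective, the Horizontal Pod Autoscaler controller operates on the ratio between desired metric value and current metric value: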

desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]

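From the above, my understanding is that the k8s HPA will keep scaling up as long as the queue has messages, because currentReplicas is part of the desiredReplicas calculation.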

For example, if:

currentReplicas = 1

currentMetricValue / desiredMetricValue = 2/1

then:

desiredReplicas = 2

If the metric stays the same in the next HPA cycle, currentReplicas becomes 2 and desiredReplicas rises to 4.
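The compounding can be reproduced with a short Python simulation of repeated HPA cycles. This is a sketch under stated assumptions, not the controller's implementation: the ready-message count is held constant at 2 (as if the workers never drained the queue), each new pod is assumed to be ready before the next cycle, tolerance and stabilization windows are ignored, and `simulate_value_target` is a hypothetical helper.

```python
import math


def simulate_value_target(ready_messages: float, target_value: float, min_replicas: int,
                          max_replicas: int, cycles: int, start: int = 1) -> list:
    """Replica count after each HPA cycle when the External metric stays constant.

    Hypothetical helper. Assumes the queue never drains and every new pod is
    ready before the next cycle; tolerance and stabilization are ignored.
    """
    replicas = start
    history = [replicas]
    for _ in range(cycles):
        desired = math.ceil(replicas * ready_messages / target_value)
        # The HPA clamps its result to [minReplicas, maxReplicas].
        replicas = max(min_replicas, min(max_replicas, desired))
        history.append(replicas)
    return history


# The question's setup: 2 ready messages, targetValue: 1, minReplicas: 1, maxReplicas: 10
print(simulate_value_target(2, 1, 1, 10, cycles=5))
# → [1, 2, 4, 8, 10, 10]
```

The replica count doubles every cycle until it hits maxReplicas, which matches the jump to 10 described in the question.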

Answer 3

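I think you've done well with the HorizontalPodAutoscaler setup. However, based on your question, I think you are looking for targetAverageValue rather than targetValue.

The Kubernetes docs on HPAs mention that using targetAverageValue instructs Kubernetes to scale Pods based on the average metric exposed by all Pods under the autoscaler. While the docs aren't explicit about it, an external metric (such as the number of jobs waiting in a message queue) counts as a single data point. By scaling on an external metric with targetAverageValue, you can create an autoscaler that scales the number of Pods to match a ratio of Pods to jobs.

Back to your example: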

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: foo-hpa
  namespace: development
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: foo
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: "custom.googleapis.com|rabbitmq_queue_messages_ready"
      metricSelector:
        matchLabels:
          metric.labels.queue: foo-queue
      # Aim for one Pod per message in the queue
      targetAverageValue: 1

This configuration will cause the HPA to try to keep one Pod for every message in the queue (up to a maximum of 10 Pods).
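For comparison, here is a simplified Python version of the targetAverageValue rule: the external metric total is divided by the per-Pod target, so the current replica count drops out of the calculation. Tolerance and stabilization are ignored and `hpa_average_value_target` is a hypothetical name; this is a sketch of the idea, not the HPA source.

```python
import math


def hpa_average_value_target(metric_value: float, target_average_value: float,
                             min_replicas: int, max_replicas: int) -> int:
    """Simplified desired replica count for an External metric with targetAverageValue.

    Hypothetical helper: total metric / per-Pod target, clamped to [min, max].
    The current replica count is not an input (tolerance and stabilization ignored).
    """
    desired = math.ceil(metric_value / target_average_value)
    return max(min_replicas, min(max_replicas, desired))


# targetAverageValue: 1, minReplicas: 1, maxReplicas: 10, a few hypothetical queue sizes
for ready in (0, 2, 7, 50):
    print(ready, "->", hpa_average_value_target(ready, 1, min_replicas=1, max_replicas=10))
```

Because the result depends only on the queue length, 2 ready messages keep pointing at 2 replicas on every cycle instead of doubling.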

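As an aside, targeting one Pod per message will probably cause you to start and stop Pods constantly. If you end up starting many Pods and they process all of the messages in the queue, Kubernetes will scale your Pods back down to 1. Depending on how long your Pods take to start and how long your messages take to process, you may be able to lower average message latency by specifying a higher targetAverageValue. Ideally, given a constant amount of traffic, you should aim for a constant number of Pods processing messages (which requires you to process messages at roughly the same rate as they are enqueued).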
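One rough way to estimate that constant Pod count is Little's law: in steady state, the number of messages being processed equals the arrival rate times the processing time per message. The sketch below assumes you have measured those two numbers (the values in the example are made up) and that each Pod handles a fixed number of messages concurrently; `steady_state_pods` is a hypothetical helper, and its result is only a starting point for tuning targetAverageValue or minReplicas.

```python
import math


def steady_state_pods(arrival_rate_per_s: float, seconds_per_message: float,
                      messages_per_pod_in_parallel: int = 1) -> int:
    """Approximate Pod count needed to keep pace with a constant arrival rate.

    Hypothetical helper based on Little's law:
    messages in progress = arrival rate * processing time per message.
    Dividing by per-Pod concurrency gives a Pod count (rounded up).
    """
    in_progress = arrival_rate_per_s * seconds_per_message
    return math.ceil(in_progress / messages_per_pod_in_parallel)


# Made-up numbers: 0.5 msg/s arriving, 12 s per message, one message per Pod at a time
print(steady_state_pods(0.5, 12))  # → 6
```

If the queue still grows with that many Pods, the measured processing time or arrival rate is probably off, or Pod startup time is eating into capacity.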

Answer 4

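I'm using the same Prometheus metrics from RabbitMQ (I'm using Celery with RabbitMQ as the broker).

Has anyone here considered using the rabbitmq_queue_messages_unacked metric instead of rabbitmq_queue_messages_ready?

The thing is, rabbitmq_queue_messages_ready decreases as soon as a worker pulls a message, so I'm worried that long-running tasks might be killed by the HPA, whereas rabbitmq_queue_messages_unacked stays up until the task completes.

For example, I have a message that triggers a new pod (celery-worker) to run a task that takes 30 minutes. rabbitmq_queue_messages_ready will drop while the pod is running, and once the HPA cooldown/delay passes, the pod will be terminated.

EDIT: It seems that a third metric, rabbitmq_queue_messages, is the right one, since it is the sum of unacked and ready: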

sum of ready and unacknowledged messages - total queue depth

documentation
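A minimal Python sketch of why the total depth is a safer input for long-running tasks, reusing a simplified targetAverageValue rule (tolerance and stabilization ignored). The snapshot values and `replicas_for` are hypothetical; the point is only that `ready` drops to 0 once workers pick messages up, while `ready + unacked` still counts the in-flight tasks.

```python
import math


def replicas_for(metric_value: float, target_average_value: float = 1,
                 min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Simplified targetAverageValue rule, clamped to [min, max] (hypothetical helper)."""
    desired = math.ceil(metric_value / target_average_value)
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical snapshot: 3 long-running tasks already picked up by workers.
ready, unacked = 0, 3
total_depth = ready + unacked  # what rabbitmq_queue_messages reports: ready + unacknowledged

print("ready only :", replicas_for(ready))        # → 1, so two busy workers become scale-down candidates
print("total depth:", replicas_for(total_depth))  # → 3, one Pod per in-flight task
```

Scaling on the total depth keeps the Pod count tied to work that is still in progress, not just work that is still waiting.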
