HPA 无法读取 GKE 上的指标值（CPU 利用率）

Question

我正在单个集群上开发 Google Kubernetes Engine。集群自动缩放节点数。我已经创建了三个 Deployment 并使用网站设置了自动缩放策略（Workloads -> Deployment -> Actions -> Auto-scaling），所以没有手动编写 YAML 配置。基于官方guide，我没有搞错

If you do not specify requests, you can autoscale based only on the absolute value of the resource's utilization, such as milliCPUs for CPU utilization.

以下是完整的部署 YAML：

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: student
  name: student
  namespace: ulibretto
spec:
  replicas: 1
  selector:
    matchLabels:
      app: student
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: student
    spec:
      containers:
        - env:
            - name: CLUSTER_HOST
              valueFrom:
                configMapKeyRef:
                  key: CLUSTER_HOST
                  name: shared-env-vars
            - name: BIND_HOST
              valueFrom:
                configMapKeyRef:
                  key: BIND_HOST
                  name: shared-env-vars
            - name: TOKEN_TIMEOUT
              valueFrom:
                configMapKeyRef:
                  key: TOKEN_TIMEOUT
                  name: shared-env-vars
          image: gcr.io/ulibretto/github.com/ulibretto/studentservice
          imagePullPolicy: IfNotPresent
          name: studentservice-1
---
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  labels:
    app: student
  name: student-hpa-n3bp
  namespace: ulibretto
spec:
  maxReplicas: 100
  metrics:
    - resource:
        name: cpu
        targetAverageUtilization: 80
      type: Resource
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: student
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    cloud.google.com/neg: '{"ingress":true}'
  labels:
    app: student
  name: student-ingress
  namespace: ulibretto
spec:
  clusterIP: 10.44.5.59
  ports:
    - port: 5000
      protocol: TCP
      targetPort: 5000
  selector:
    app: student
  sessionAffinity: None
  type: ClusterIP

问题是 HPA 看不到指标（平均 CPU 利用率），这真的很奇怪（见图）。 HPA cannot read metric value

我错过了什么？

Answer 1

已编辑

你是对的。您不需要像我之前提到的那样在 scaleTargetRef: 中指定 namespace: ulibretto。

由于您提供了所有 YAML，我能够找到正确的根本原因。

如果你检查 GKE docs 你会在代码中找到注释

    resources:
      # You must specify requests for CPU to autoscale
      # based on CPU utilization
      requests:
        cpu: "250m"

您的部署没有指定 resource requests。我试过这个（我删除了一些部分，因为我无法部署您的容器并更改了 HPA 中的 apiVersion）：

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: student
  name: student
  namespace: ulibretto
spec:
  replicas: 3
  selector:
    matchLabels:
      app: student
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: student
    spec:
      containers:
      - image: nginx
        imagePullPolicy: IfNotPresent
        name: studentservice-1
        resources:
          requests:
            cpu: "250m"
---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  labels:
    app: student
  name: student-hpa
  namespace: ulibretto
spec:
  maxReplicas: 100
  minReplicas: 1
  targetCPUUtilizationPercentage: 80
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: student

$ kubectl get all -n ulibretto
NAME                           READY   STATUS    RESTARTS   AGE
pod/student-6f797d5888-84xfq   1/1     Running   0          7s
pod/student-6f797d5888-b7ctq   1/1     Running   0          7s
pod/student-6f797d5888-fbtmd   1/1     Running   0          7s
NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/student   3/3     3            3           7s
NAME                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/student-6f797d5888   3         3         3       7s
NAME                                              REFERENCE            TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
horizontalpodautoscaler.autoscaling/student-hpa   Deployment/student   <unknown>/80%   1         100       0          7s

大约 1-5 分钟后，您将收到一些指标。

$ kubectl get all -n ulibretto
NAME                           READY   STATUS    RESTARTS   AGE
pod/student-6f797d5888-84xfq   1/1     Running   0          95s
pod/student-6f797d5888-b7ctq   1/1     Running   0          95s
pod/student-6f797d5888-fbtmd   1/1     Running   0          95s

NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/student   3/3     3            3           95s

NAME                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/student-6f797d5888   3         3         3       95s

NAME                                              REFERENCE            TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
horizontalpodautoscaler.autoscaling/student-hpa   Deployment/student   0%/80%    1         100       3          95s

如果您想使用 CLI 创建 HPA，情况相同：

$ kubectl autoscale deployment student -n ulibretto --cpu-percent=50 --min=1 --max=100
horizontalpodautoscaler.autoscaling/student autoscaled

$ kubectl get hpa -n ulibretto
NAME      REFERENCE            TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
student   Deployment/student   <unknown>/50%   1         100       0          3s

一段时间后您将收到 0% 而不是 <unknown>

$ kubectl get all -n ulibretto
NAME                           READY   STATUS    RESTARTS   AGE
pod/student-6f797d5888-84xfq   1/1     Running   0          4m4s
pod/student-6f797d5888-b7ctq   1/1     Running   0          4m4s
pod/student-6f797d5888-fbtmd   1/1     Running   0          4m4s
NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/student   3/3     3            3           4m5s
NAME                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/student-6f797d5888   3         3         3       4m5s
NAME                                          REFERENCE            TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
horizontalpodautoscaler.autoscaling/student   Deployment/student   0%/50%    1         100       3          58s

HPA 无法读取 GKE 上的指标值（CPU 利用率）

HPA cannot read metric value (CPU utilization) on GKE

yaml

gcloud

kubernetes

google-kubernetes-engine