How to add resource requests and limits on Kubernetes Engine on Google Cloud Platform

Question

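I am trying to add resource requests and limits to my deployment on Kubernetes Engine, because one of my deployment's pods keeps getting evicted with the error message The node was low on resource: memory. Container model-run was using 1904944Ki, which exceeds its request of 0. I assume the issue can be resolved by adding resource requests.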

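When I add resource requests and deploy, the deployment succeeds, but when I go back and inspect the pod with kubectl get pod default-pod-name --output=yaml --namespace=default it still shows a request of cpu: 100m and makes no mention of the memory I assigned. I am guessing the 100m CPU request is a default. Please tell me how to assign requests and limits. The command I use to deploy is below: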

kubectl run model-run --image-pull-policy=Always --overrides='
{
    "apiVersion": "apps/v1beta1",
    "kind": "Deployment",
    "metadata": {
        "name": "model-run",
        "labels": {
            "app": "model-run"
        }
    },
    "spec": {
        "selector": {
            "matchLabels": {
                "app": "model-run"
            }
        },
        "template": {
            "metadata": {
                "labels": {
                    "app": "model-run"
                }
            },
            "spec": {
                "containers": [
                    {
                        "name": "model-run",
                        "image": "gcr.io/some-project/news/model-run:development",
                    "imagePullPolicy": "Always",
                      "resouces": {
                        "requests": [
                          {
                            "memory": "2048Mi",
                            "cpu": "500m"
                          }
                        ],
                        "limits": [
                          {
                            "memory": "2500Mi",
                            "cpu": "750m"
                          }
                        ]
                      },
                        "volumeMounts": [
                            {
                                "name": "credentials",
                                "readOnly": true,
                                "mountPath":"/path/collection/keys"
                            }
                        ],
                        "env":[
                            {
                                "name":"GOOGLE_APPLICATION_CREDENTIALS",
                                "value":"/path/collection/keys/key.json"
                            }
                                ]
                    }
                ],
                "volumes": [
                    {
                        "name": "credentials",
                        "secret": {
                            "secretName": "credentials"
                        }
                    }
                ]
            }
        }
    }
}
'  --image=gcr.io/some-project/news/model-run:development

Any solution would be appreciated.

Answer 1

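It seems the limits cannot be set through the --overrides flag. What you can do instead is pass the requests and limits as flags on the kubectl command: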

kubectl run model-run --image-pull-policy=Always --requests='cpu=500m,memory=2048Mi' --limits='cpu=750m,memory=2500Mi' --overrides='
{
    "apiVersion": "apps/v1beta1",
    "kind": "Deployment",
    "metadata": {
        "name": "model-run",
        "labels": {
            "app": "model-run"
        }
    },
    "spec": {
        "selector": {
            "matchLabels": {
                "app": "model-run"
            }
        },
        "template": {
            "metadata": {
                "labels": {
                    "app": "model-run"
                }
            },
            "spec": {
                "containers": [
                    {
                        "name": "model-run",
                        "image": "gcr.io/some-project/news/model-run:development",
                    "imagePullPolicy": "Always",
                      "resouces": {
                        "requests": [
                          {
                            "memory": "2048Mi",
                            "cpu": "500m"
                          }
                        ],
                        "limits": [
                          {
                            "memory": "2500Mi",
                            "cpu": "750m"
                          }
                        ]
                      },
                        "volumeMounts": [
                            {
                                "name": "credentials",
                                "readOnly": true,
                                "mountPath":"/path/collection/keys"
                            }
                        ],
                        "env":[
                            {
                                "name":"GOOGLE_APPLICATION_CREDENTIALS",
                                "value":"/path/collection/keys/key.json"
                            }
                                ]
                    }
                ],
                "volumes": [
                    {
                        "name": "credentials",
                        "secret": {
                            "secretName": "credentials"
                        }
                    }
                ]
            }
        }
    }
}
' --image=gcr.io/some-project/news/model-run:development

Answer 2

The node was low on resource: memory. Container model-run was using 1904944Ki, which exceeds its request of 0.

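At first the message reads as if the node itself is short on resources, but the second part makes me believe you are right to try raising the container's requests and limits.

Keep in mind that if you still hit errors after this change, you may need to add more powerful node pools to your cluster.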
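To make the numbers in the eviction message easier to compare with the values you want to set, here is a minimal Python sketch. It assumes only the binary suffixes Ki, Mi and Gi appear (Kubernetes accepts more), and the helper name memory_to_bytes is mine, not part of any Kubernetes tooling.

```python
# Binary (power-of-two) suffixes used by Kubernetes memory quantities.
BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def memory_to_bytes(quantity: str) -> int:
    """Convert a quantity such as '2048Mi' to bytes (hypothetical helper)."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    # No suffix: treat the value as plain bytes.
    return int(quantity)


observed = memory_to_bytes("1904944Ki")  # usage reported in the eviction message
request = memory_to_bytes("2048Mi")      # request from the question
limit = memory_to_bytes("2500Mi")        # limit from the question

print(f"observed usage: {observed / 1024 ** 2:.0f} MiB")
print(f"usage fits within the request: {observed <= request}")
print(f"headroom below the limit: {(limit - observed) / 1024 ** 2:.0f} MiB")
```

The usage at eviction time is below the 2048Mi request you chose, so once the request is actually applied the scheduler should only place the pod on a node with that much memory available.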

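I went through your command, and there are a few issues I would like to highlight:

kubectl run has been deprecated since 1.12 for all resources except pods, and that functionality was removed in version 1.18.
"apiVersion": "apps/v1beta1" is deprecated and no longer served as of v1.16; I replaced it with apps/v1.
In spec.template.spec.containers you wrote "resouces" instead of "resources".
After fixing "resources", the next problem is that requests and limits are written as arrays, but they need to be maps (JSON objects); otherwise you get this error: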

kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
error: v1beta1.Deployment.Spec: v1beta1.DeploymentSpec.Template: v1.PodTemplateSpec.Spec: v1.PodSpec.Containers: []v1.Container: v1.Container.Resources: v1.ResourceRequirements.Limits: ReadMapCB: expect { or n, but found [, error found in #10 byte of ...|"limits":[{"cpu":"75|..., bigger context ...|Always","name":"model-run","resources":{"limits":[{"cpu":"750m","memory":"2500Mi"}],"requests":[{"cp|...
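The last two issues can be caught before running kubectl at all. The following is a rough Python sketch, not an official validator: the function name find_resource_problems is hypothetical, and it checks only the two mistakes discussed here, the misspelled key and the array-versus-object shape.

```python
import json


def find_resource_problems(container: dict) -> list[str]:
    """Return readable problems found in one container spec (illustrative only)."""
    problems = []
    if "resouces" in container:
        problems.append('misspelled key "resouces" (should be "resources")')
    # Inspect whichever spelling is present so shape errors are reported too.
    resources = container.get("resources") or container.get("resouces") or {}
    if not isinstance(resources, dict):
        problems.append('"resources" must be a JSON object')
        return problems
    for field in ("requests", "limits"):
        value = resources.get(field)
        if value is not None and not isinstance(value, dict):
            problems.append(
                f'"{field}" must be a JSON object, got {type(value).__name__}'
            )
    return problems


# Container fragment as written in the question.
broken = json.loads("""
{
  "name": "model-run",
  "resouces": {
    "requests": [{"memory": "2048Mi", "cpu": "500m"}],
    "limits": [{"memory": "2500Mi", "cpu": "750m"}]
  }
}
""")

# Same fragment with the key spelled correctly and objects instead of arrays.
fixed = json.loads("""
{
  "name": "model-run",
  "resources": {
    "requests": {"memory": "2048Mi", "cpu": "500m"},
    "limits": {"memory": "2500Mi", "cpu": "750m"}
  }
}
""")

for problem in find_resource_problems(broken):
    print(problem)
print("problems in fixed spec:", find_resource_problems(fixed))
```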

Here is the fixed version of your command:

kubectl run model-run --image-pull-policy=Always --overrides='{
  "apiVersion": "apps/v1",
  "kind": "Deployment",
  "metadata": {
    "name": "model-run",
    "labels": {
      "app": "model-run"
    }
  },
  "spec": {
    "selector": {
      "matchLabels": {
        "app": "model-run"
      }
    },
    "template": {
      "metadata": {
        "labels": {
          "app": "model-run"
        }
      },
      "spec": {
        "containers": [
          {
            "name": "model-run",
            "image": "nginx",
            "imagePullPolicy": "Always",
            "resources": {
              "requests": {
                "memory": "2048Mi",
                "cpu": "500m"
              },
              "limits": {
                "memory": "2500Mi",
                "cpu": "750m"
              }
            },
            "volumeMounts": [
              {
                "name": "credentials",
                "readOnly": true,
                "mountPath": "/path/collection/keys"
              }
            ],
            "env": [
              {
                "name": "GOOGLE_APPLICATION_CREDENTIALS",
                "value": "/path/collection/keys/key.json"
              }
            ]
          }
        ],
        "volumes": [
          {
            "name": "credentials",
            "secret": {
              "secretName": "credentials"
            }
          }
        ]
      }
    }
  }
}'  --image=gcr.io/some-project/news/model-run:development
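Because kubectl run no longer creates Deployments from 1.18 onward, on newer clusters the same spec would have to go through kubectl apply or kubectl create instead. As a sketch only, this Python snippet builds an equivalent apps/v1 Deployment as JSON. The function name build_deployment is hypothetical, and the volume, volumeMounts and env entries from your command are left out for brevity, so you would need to add them back.

```python
import json


def build_deployment(name: str, image: str, requests: dict, limits: dict) -> dict:
    """Build a minimal apps/v1 Deployment manifest (illustrative helper)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "imagePullPolicy": "Always",
                            # requests and limits are maps, not arrays
                            "resources": {"requests": requests, "limits": limits},
                        }
                    ]
                },
            },
        },
    }


manifest = build_deployment(
    name="model-run",
    image="gcr.io/some-project/news/model-run:development",
    requests={"memory": "2048Mi", "cpu": "500m"},
    limits={"memory": "2500Mi", "cpu": "750m"},
)
print(json.dumps(manifest, indent=2))
```

If this were saved as build_manifest.py (a file name I made up), the output could be piped straight in with python build_manifest.py | kubectl apply -f -, since kubectl apply accepts JSON on standard input.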

Now, after applying the corrected command on my Kubernetes Engine cluster v1.15.11-gke.13, here is the output of kubectl get pod X -o yaml:

$ kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
model-run-7bd8d79c7d-brmrw   1/1     Running   0          17s

$ kubectl get pod model-run-7bd8d79c7d-brmrw -o yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: model-run
    pod-template-hash: 7bd8d79c7d
    run: model-run
  name: model-run-7bd8d79c7d-brmrw
  namespace: default
spec:
  containers:
  - env:
    - name: GOOGLE_APPLICATION_CREDENTIALS
      value: /path/collection/keys/key.json
    image: nginx
    imagePullPolicy: Always
    name: model-run
    resources:
      limits:
        cpu: 750m
        memory: 2500Mi
      requests:
        cpu: 500m
        memory: 2Gi
    volumeMounts:
    - mountPath: /path/collection/keys
      name: credentials
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-tjn5t
      readOnly: true
  nodeName: gke-cluster-115-default-pool-abca4833-4jtx
  restartPolicy: Always
  volumes:
  - name: credentials
    secret:
      defaultMode: 420
      secretName: credentials

You can see that the resource limits and requests have been set.
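In case the memory: 2Gi in that output looks like a different value from the 2048Mi that was requested: it is the same quantity, which the API server appears to print in a shorter form. A quick Python check of the arithmetic (the helper whole_gibibytes is only for illustration):

```python
MI = 1024 ** 2  # bytes in one mebibyte
GI = 1024 ** 3  # bytes in one gibibyte


def whole_gibibytes(mebibytes: int) -> bool:
    """True when the given number of Mi is an exact multiple of one Gi."""
    return (mebibytes * MI) % GI == 0


print(2048 * MI == 2 * GI)    # the request: 2048Mi and 2Gi are the same size
print(whole_gibibytes(2048))  # the request can be written in whole Gi
print(whole_gibibytes(2500))  # the limit cannot, so it is shown as 2500Mi
```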

If you have any further questions, let me know in the comments!

google-cloud-platform

kubernetes

google-kubernetes-engine

kubernetes-pod