How to add resource requests and limits on Kubernetes Engine on Google Cloud Platform

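I am trying to add resource requests and limits to my deployments on Kubernetes Engine, because the pod of one of my deployments keeps getting evicted with the error message The node was low on resource: memory. Container model-run was using 1904944Ki, which exceeds its request of 0. I assume the issue can be resolved by adding resource requests.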

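When I add resource requests and deploy, the deployment succeeds, but when I go back and view the pod details with the command kubectl get pod default-pod-name --output=yaml --namespace=default, it still says the pod has a request of cpu: 100m and makes no mention of the memory I allotted. I am guessing the 100m CPU request is a default. Please let me know how I can set the requests and limits. The command I use to deploy is below: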

kubectl run model-run --image-pull-policy=Always --overrides='
{
    "apiVersion": "apps/v1beta1",
    "kind": "Deployment",
    "metadata": {
        "name": "model-run",
        "labels": {
            "app": "model-run"
        }
    },
    "spec": {
        "selector": {
            "matchLabels": {
                "app": "model-run"
            }
        },
        "template": {
            "metadata": {
                "labels": {
                    "app": "model-run"
                }
            },
            "spec": {
                "containers": [
                    {
                        "name": "model-run",
                        "image": "gcr.io/some-project/news/model-run:development",
                    "imagePullPolicy": "Always",
                      "resouces": {
                        "requests": [
                          {
                            "memory": "2048Mi",
                            "cpu": "500m"
                          }
                        ],
                        "limits": [
                          {
                            "memory": "2500Mi",
                            "cpu": "750m"
                          }
                        ]
                      },
                        "volumeMounts": [
                            {
                                "name": "credentials",
                                "readOnly": true,
                                "mountPath":"/path/collection/keys"
                            }
                        ],
                        "env":[
                            {
                                "name":"GOOGLE_APPLICATION_CREDENTIALS",
                                "value":"/path/collection/keys/key.json"
                            }
                                ]
                    }
                ],
                "volumes": [
                    {
                        "name": "credentials",
                        "secret": {
                            "secretName": "credentials"
                        }
                    }
                ]
            }
        }
    }
}
'  --image=gcr.io/some-project/news/model-run:development

Any solution would be appreciated.

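It looks like limits cannot be overridden through the --overrides flag. What you can do instead is pass the requests and limits on the kubectl command line with the --requests and --limits flags: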

kubectl run model-run --image-pull-policy=Always --requests='cpu=500m,memory=2048Mi' --limits='cpu=750m,memory=2500Mi' --overrides='
{
    "apiVersion": "apps/v1beta1",
    "kind": "Deployment",
    "metadata": {
        "name": "model-run",
        "labels": {
            "app": "model-run"
        }
    },
    "spec": {
        "selector": {
            "matchLabels": {
                "app": "model-run"
            }
        },
        "template": {
            "metadata": {
                "labels": {
                    "app": "model-run"
                }
            },
            "spec": {
                "containers": [
                    {
                        "name": "model-run",
                        "image": "gcr.io/some-project/news/model-run:development",
                    "imagePullPolicy": "Always",
                      "resouces": {
                        "requests": [
                          {
                            "memory": "2048Mi",
                            "cpu": "500m"
                          }
                        ],
                        "limits": [
                          {
                            "memory": "2500Mi",
                            "cpu": "750m"
                          }
                        ]
                      },
                        "volumeMounts": [
                            {
                                "name": "credentials",
                                "readOnly": true,
                                "mountPath":"/path/collection/keys"
                            }
                        ],
                        "env":[
                            {
                                "name":"GOOGLE_APPLICATION_CREDENTIALS",
                                "value":"/path/collection/keys/key.json"
                            }
                                ]
                    }
                ],
                "volumes": [
                    {
                        "name": "credentials",
                        "secret": {
                            "secretName": "credentials"
                        }
                    }
                ]
            }
        }
    }
}
' --image=gcr.io/some-project/news/model-run:development
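
If the Deployment already exists, a possible alternative is to patch its requests and limits in place with kubectl set resources rather than recreating it. The sketch below assumes kubectl is already pointed at the right cluster and that the Deployment is named model-run in the current namespace; the shell variable names are only for illustration.

```shell
# Sketch only: assumes a Deployment named model-run already exists
# in the current namespace of the cluster kubectl is configured for.
DEPLOYMENT="model-run"
REQUESTS="cpu=500m,memory=2048Mi"
LIMITS="cpu=750m,memory=2500Mi"

if command -v kubectl >/dev/null 2>&1; then
  # Updates requests and limits on all containers in the pod template,
  # which rolls out new pods with the new values.
  kubectl set resources deployment "$DEPLOYMENT" \
    --requests="$REQUESTS" --limits="$LIMITS"
else
  echo "kubectl not found; skipping" >&2
fi
```

Because this changes the pod template, expect the existing pod to be replaced by a new one.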

The node was low on resource: memory. Container model-run was using 1904944Ki, which exceeds its request of 0.

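At first the message reads as if the node itself is short on resources, but the second part makes me believe you are right to try raising the container's requests and limits.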

Keep in mind that if you still face errors after this change, you might need to add more powerful node-pools to your cluster.
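
For illustration, adding a larger node pool could look roughly like the following. The pool name high-mem-pool, the cluster name my-cluster, the zone, and the machine type are all placeholders I made up, not values from the question; pick ones that match your project and budget.

```shell
# Sketch only: every value below is a hypothetical placeholder.
POOL_NAME="high-mem-pool"
CLUSTER_NAME="my-cluster"
ZONE="us-central1-a"
MACHINE_TYPE="n1-highmem-4"

if command -v gcloud >/dev/null 2>&1; then
  # Adds a one-node pool with more memory per node to an existing cluster.
  gcloud container node-pools create "$POOL_NAME" \
    --cluster="$CLUSTER_NAME" \
    --zone="$ZONE" \
    --machine-type="$MACHINE_TYPE" \
    --num-nodes=1
else
  echo "gcloud not found; skipping" >&2
fi
```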

I went through your command, and there are a few issues I would like to highlight:

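  • kubectl run was deprecated in 1.12 for all resources except pods, and that functionality was removed in version 1.18.
  • "apiVersion": "apps/v1beta1" is deprecated and is no longer supported as of v1.16, so I replaced it with apps/v1.
  • In spec.template.spec.containers the key is written "resouces" instead of "resources".
  • After fixing resources, the next issue is that requests and limits are written as arrays, but each needs to be a map (a JSON object); otherwise you get this error: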
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
error: v1beta1.Deployment.Spec: v1beta1.DeploymentSpec.Template: v1.PodTemplateSpec.Spec: v1.PodSpec.Containers: []v1.Container: v1.Container.Resources: v1.ResourceRequirements.Limits: ReadMapCB: expect { or n, but found [, error found in #10 byte of ...|"limits":[{"cpu":"75|..., bigger context ...|Always","name":"model-run","resources":{"limits":[{"cpu":"750m","memory":"2500Mi"}],"requests":[{"cp|...
  • Here is the fixed version of your command:
kubectl run model-run --image-pull-policy=Always --overrides='{
  "apiVersion": "apps/v1",
  "kind": "Deployment",
  "metadata": {
    "name": "model-run",
    "labels": {
      "app": "model-run"
    }
  },
  "spec": {
    "selector": {
      "matchLabels": {
        "app": "model-run"
      }
    },
    "template": {
      "metadata": {
        "labels": {
          "app": "model-run"
        }
      },
      "spec": {
        "containers": [
          {
            "name": "model-run",
            "image": "nginx",
            "imagePullPolicy": "Always",
            "resources": {
              "requests": {
                "memory": "2048Mi",
                "cpu": "500m"
              },
              "limits": {
                "memory": "2500Mi",
                "cpu": "750m"
              }
            },
            "volumeMounts": [
              {
                "name": "credentials",
                "readOnly": true,
                "mountPath": "/path/collection/keys"
              }
            ],
            "env": [
              {
                "name": "GOOGLE_APPLICATION_CREDENTIALS",
                "value": "/path/collection/keys/key.json"
              }
            ]
          }
        ],
        "volumes": [
          {
            "name": "credentials",
            "secret": {
              "secretName": "credentials"
            }
          }
        ]
      }
    }
  }
}'  --image=gcr.io/some-project/news/model-run:development
  • Now, after applying it on my Kubernetes Engine cluster v1.15.11-gke.13, here is the output of kubectl get pod X -o yaml:
$ kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
model-run-7bd8d79c7d-brmrw   1/1     Running   0          17s

$ kubectl get pod model-run-7bd8d79c7d-brmrw -o yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: model-run
    pod-template-hash: 7bd8d79c7d
    run: model-run
  name: model-run-7bd8d79c7d-brmrw
  namespace: default
spec:
  containers:
  - env:
    - name: GOOGLE_APPLICATION_CREDENTIALS
      value: /path/collection/keys/key.json
    image: nginx
    imagePullPolicy: Always
    name: model-run
    resources:
      limits:
        cpu: 750m
        memory: 2500Mi
      requests:
        cpu: 500m
        memory: 2Gi
    volumeMounts:
    - mountPath: /path/collection/keys
      name: credentials
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-tjn5t
      readOnly: true
  nodeName: gke-cluster-115-default-pool-abca4833-4jtx
  restartPolicy: Always
  volumes:
  - name: credentials
    secret:
      defaultMode: 420
      secretName: credentials

  • You can see that the resource limits and requests were set.
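
Since kubectl run no longer creates Deployments on newer versions, the same settings can also be kept in a manifest file and applied with kubectl apply. This is only a sketch of how that might look: the file name model-run-deployment.yaml is my own choice, and the manifest is a YAML rendering of the JSON above with the image from the question.

```shell
# Sketch only: the file name is an arbitrary choice.
cat > model-run-deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-run
  labels:
    app: model-run
spec:
  selector:
    matchLabels:
      app: model-run
  template:
    metadata:
      labels:
        app: model-run
    spec:
      containers:
      - name: model-run
        image: gcr.io/some-project/news/model-run:development
        imagePullPolicy: Always
        resources:
          requests:          # a map, not a list
            memory: 2048Mi
            cpu: 500m
          limits:            # a map, not a list
            memory: 2500Mi
            cpu: 750m
        volumeMounts:
        - name: credentials
          readOnly: true
          mountPath: /path/collection/keys
        env:
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /path/collection/keys/key.json
      volumes:
      - name: credentials
        secret:
          secretName: credentials
EOF

# Apply the manifest if kubectl is installed and configured for the cluster.
if command -v kubectl >/dev/null 2>&1; then
  kubectl apply -f model-run-deployment.yaml
fi
```

Keeping the manifest in a file also makes later changes to the requests or limits a matter of editing the file and running kubectl apply again.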

If you have any further questions, let me know in the comments!