pod hangs in Pending state

I have a kubernetes deployment in which I'm trying to run 5 docker containers in a single pod on a single node. The pod hangs in "Pending" status and never gets scheduled. I don't mind running more than 1 pod, but I'd like to keep the node count down. I assumed that a single node with 1 CPU and 1.7G of RAM would be enough to hold 5 containers, and I have tried to spread the workload.

Initially I concluded that there weren't enough resources, so I enabled autoscaling of nodes, which produced the following (see the kubectl describe pod output below):

pod didn't trigger scale-up (it wouldn't fit if a new node is added)

In any case, each docker container has a simple command and runs a fairly simple app. Ideally I'd prefer not to deal with setting CPU and RAM allocations in resources at all, but even when I set CPU/mem limits within a range so that they add up to no more than 1, I still get this (see kubectl describe po/test-529945953-gh6cl):

No nodes are available that match all of the following predicates:: Insufficient cpu (1), Insufficient memory (1).

Below is the output of various commands showing the current state. Any help with what I'm doing wrong would be much appreciated.

kubectl get all

user_s@testing-11111:~/gce$ kubectl get all
NAME                          READY     STATUS    RESTARTS   AGE
po/test-529945953-gh6cl   0/5       Pending   0          34m

NAME             CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
svc/kubernetes   10.7.240.1   <none>        443/TCP   19d

NAME              DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/test   1         1         1            0           34m

NAME                    DESIRED   CURRENT   READY     AGE
rs/test-529945953   1         1         0         34m
user_s@testing-11111:~/gce$

kubectl describe po/test-529945953-gh6cl

user_s@testing-11111:~/gce$ kubectl describe po/test-529945953-gh6cl
Name:           test-529945953-gh6cl
Namespace:      default
Node:           <none>
Labels:         app=test
                pod-template-hash=529945953
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"test-529945953","uid":"c6e889cb-a2a0-11e7-ac18-42010a9a001a"...
Status:         Pending
IP:
Created By:     ReplicaSet/test-529945953
Controlled By:  ReplicaSet/test-529945953
Containers:
  container-test2-tickers:
    Image:      gcr.io/testing-11111/testology:latest
    Port:       <none>
    Command:
      process_cmd
      arg1
      test2
    Limits:
      cpu:      150m
      memory:   375Mi
    Requests:
      cpu:      100m
      memory:   375Mi
    Environment:
      DB_HOST:          127.0.0.1:5432
      DB_PASSWORD:      <set to the key 'password' in secret 'cloudsql-db-credentials'> Optional: false
      DB_USER:          <set to the key 'username' in secret 'cloudsql-db-credentials'> Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-b2mxc (ro)
  container-kraken-tickers:
    Image:      gcr.io/testing-11111/testology:latest
    Port:       <none>
    Command:
      process_cmd
      arg1
      arg2
    Limits:
      cpu:      150m
      memory:   375Mi
    Requests:
      cpu:      100m
      memory:   375Mi
    Environment:
      DB_HOST:          127.0.0.1:5432
      DB_PASSWORD:      <set to the key 'password' in secret 'cloudsql-db-credentials'> Optional: false
      DB_USER:          <set to the key 'username' in secret 'cloudsql-db-credentials'> Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-b2mxc (ro)
  container-gdax-tickers:
    Image:      gcr.io/testing-11111/testology:latest
    Port:       <none>
    Command:
      process_cmd
      arg1
      arg2
    Limits:
      cpu:      150m
      memory:   375Mi
    Requests:
      cpu:      100m
      memory:   375Mi
    Environment:
      DB_HOST:          127.0.0.1:5432
      DB_PASSWORD:      <set to the key 'password' in secret 'cloudsql-db-credentials'> Optional: false
      DB_USER:          <set to the key 'username' in secret 'cloudsql-db-credentials'> Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-b2mxc (ro)
  container-bittrex-tickers:
    Image:      gcr.io/testing-11111/testology:latest
    Port:       <none>
    Command:
      process_cmd
      arg1
      arg2
    Limits:
      cpu:      150m
      memory:   375Mi
    Requests:
      cpu:      100m
      memory:   375Mi
    Environment:
      DB_HOST:          127.0.0.1:5432
      DB_PASSWORD:      <set to the key 'password' in secret 'cloudsql-db-credentials'> Optional: false
      DB_USER:          <set to the key 'username' in secret 'cloudsql-db-credentials'> Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-b2mxc (ro)
  cloudsql-proxy:
    Image:      gcr.io/cloudsql-docker/gce-proxy:1.09
    Port:       <none>
    Command:
      /cloud_sql_proxy
      --dir=/cloudsql
      -instances=testing-11111:europe-west2:testology=tcp:5432
      -credential_file=/secrets/cloudsql/credentials.json
    Limits:
      cpu:      150m
      memory:   375Mi
    Requests:
      cpu:              100m
      memory:           375Mi
    Environment:        <none>
    Mounts:
      /cloudsql from cloudsql (rw)
      /etc/ssl/certs from ssl-certs (rw)
      /secrets/cloudsql from cloudsql-instance-credentials (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-b2mxc (ro)
Conditions:
  Type          Status
  PodScheduled  False
Volumes:
  cloudsql-instance-credentials:
    Type:       Secret (a volume populated by a Secret)
    SecretName: cloudsql-instance-credentials
    Optional:   false
  ssl-certs:
    Type:       HostPath (bare host directory volume)
    Path:       /etc/ssl/certs
  cloudsql:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  default-token-b2mxc:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-b2mxc
    Optional:   false
QoS Class:      Burstable
Node-Selectors: <none>
Tolerations:    node.alpha.kubernetes.io/notReady:NoExecute for 300s
                node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
  FirstSeen     LastSeen        Count   From                    SubObjectPath   Type            Reason                  Message
  ---------     --------        -----   ----                    -------------   --------        ------                  -------
  27m           17m             44      default-scheduler                       Warning         FailedScheduling        No nodes are available that match all of the following predicates:: Insufficient cpu (1), Insufficient memory (2).
  26m           8s              150     cluster-autoscaler                      Normal          NotTriggerScaleUp       pod didn't trigger scale-up (it wouldn't fit if a new node is added)
  16m           2s              63      default-scheduler                       Warning         FailedScheduling        No nodes are available that match all of the following predicates:: Insufficient cpu (1), Insufficient memory (1).
user_s@testing-11111:~/gce$

kubectl get nodes

user_s@testing-11111:~/gce$ kubectl get nodes
NAME                                      STATUS    AGE       VERSION
gke-test-default-pool-abdf83f7-p4zw   Ready     9h        v1.6.7

kubectl get pods

user_s@testing-11111:~/gce$ kubectl get pods
NAME                       READY     STATUS    RESTARTS   AGE
test-529945953-gh6cl   0/5       Pending   0          38m

kubectl describe nodes

user_s@testing-11111:~/gce$ kubectl describe nodes
Name:                   gke-test-default-pool-abdf83f7-p4zw
Role:
Labels:                 beta.kubernetes.io/arch=amd64
                        beta.kubernetes.io/fluentd-ds-ready=true
                        beta.kubernetes.io/instance-type=g1-small
                        beta.kubernetes.io/os=linux
                        cloud.google.com/gke-nodepool=default-pool
                        failure-domain.beta.kubernetes.io/region=europe-west2
                        failure-domain.beta.kubernetes.io/zone=europe-west2-c
                        kubernetes.io/hostname=gke-test-default-pool-abdf83f7-p4zw
Annotations:            node.alpha.kubernetes.io/ttl=0
                        volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:                 <none>
CreationTimestamp:      Tue, 26 Sep 2017 02:05:45 +0100
Conditions:
  Type                  Status  LastHeartbeatTime                       LastTransitionTime                      Reason                          Message
  ----                  ------  -----------------                       ------------------                      ------                          -------
  NetworkUnavailable    False   Tue, 26 Sep 2017 02:06:05 +0100         Tue, 26 Sep 2017 02:06:05 +0100         RouteCreated                    RouteController created a route
  OutOfDisk             False   Tue, 26 Sep 2017 11:33:57 +0100         Tue, 26 Sep 2017 02:05:45 +0100         KubeletHasSufficientDisk        kubelet has sufficient disk space available
  MemoryPressure        False   Tue, 26 Sep 2017 11:33:57 +0100         Tue, 26 Sep 2017 02:05:45 +0100         KubeletHasSufficientMemory      kubelet has sufficient memory available
  DiskPressure          False   Tue, 26 Sep 2017 11:33:57 +0100         Tue, 26 Sep 2017 02:05:45 +0100         KubeletHasNoDiskPressure        kubelet has no disk pressure
  Ready                 True    Tue, 26 Sep 2017 11:33:57 +0100         Tue, 26 Sep 2017 02:06:05 +0100         KubeletReady                    kubelet is posting ready status. AppArmor enabled
  KernelDeadlock        False   Tue, 26 Sep 2017 11:33:12 +0100         Tue, 26 Sep 2017 02:05:45 +0100         KernelHasNoDeadlock             kernel has no deadlock
Addresses:
  InternalIP:   10.154.0.2
  ExternalIP:   35.197.217.1
  Hostname:     gke-test-default-pool-abdf83f7-p4zw
Capacity:
 cpu:           1
 memory:        1742968Ki
 pods:          110
Allocatable:
 cpu:           1
 memory:        1742968Ki
 pods:          110
System Info:
 Machine ID:                    e6119abf844c564193495c64fd9bd341
 System UUID:                   E6119ABF-844C-5641-9349-5C64FD9BD341
 Boot ID:                       1c2f2ea0-1f5b-4c90-9e14-d1d9d7b75221
 Kernel Version:                4.4.52+
 OS Image:                      Container-Optimized OS from Google
 Operating System:              linux
 Architecture:                  amd64
 Container Runtime Version:     docker://1.11.2
 Kubelet Version:               v1.6.7
 Kube-Proxy Version:            v1.6.7
PodCIDR:                        10.4.1.0/24
ExternalID:                     6073438913956157854
Non-terminated Pods:            (7 in total)
  Namespace                     Name                                                            CPU Requests    CPU Limits      Memory Requests Memory Limits
  ---------                     ----                                                            ------------    ----------      --------------- -------------
  kube-system                   fluentd-gcp-v2.0-k565g                                          100m (10%)      0 (0%)          200Mi (11%)     300Mi (17%)
  kube-system                   heapster-v1.3.0-3440173064-1ztvw                                138m (13%)      138m (13%)      301456Ki (17%)  301456Ki (17%)
  kube-system                   kube-dns-1829567597-gdz52                                       260m (26%)      0 (0%)          110Mi (6%)      170Mi (9%)
  kube-system                   kube-dns-autoscaler-2501648610-7q9dd                            20m (2%)        0 (0%)          10Mi (0%)       0 (0%)
  kube-system                   kube-proxy-gke-test-default-pool-abdf83f7-p4zw              100m (10%)      0 (0%)          0 (0%)          0 (0%)
  kube-system                   kubernetes-dashboard-490794276-25hmn                            100m (10%)      100m (10%)      50Mi (2%)       50Mi (2%)
  kube-system                   l7-default-backend-3574702981-flqck                             10m (1%)        10m (1%)        20Mi (1%)       20Mi (1%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits      Memory Requests Memory Limits
  ------------  ----------      --------------- -------------
  728m (72%)    248m (24%)      700816Ki (40%)  854416Ki (49%)
Events:         <none>

As you can see under Allocated resources: in the output of the kubectl describe nodes command, 728m (72%) of the CPU and 700816Ki (40%) of the memory are already requested by the pods running in the kube-system namespace on the node. And as you can see under Events in the output of the kubectl describe po/[…] command, the sum of your test pod's resource requests exceeds the CPU and memory left available on the node.
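The scheduler's complaint can be verified with some quick arithmetic against the numbers above (a sketch; note the scheduler compares the pod's *requests*, not its limits, against the node's allocatable resources):

```python
# Back-of-the-envelope check of the scheduler's math, using the figures
# from `kubectl describe nodes` and `kubectl describe po` above.

MI = 1024  # Ki per Mi

# Node allocatable (g1-small): 1 CPU, 1742968Ki memory
alloc_cpu_m = 1000
alloc_mem_ki = 1742968

# Already requested by kube-system pods ("Allocated resources")
used_cpu_m = 728
used_mem_ki = 700816

# The test pod: 5 containers, each requesting 100m CPU and 375Mi memory
pod_cpu_m = 5 * 100
pod_mem_ki = 5 * 375 * MI

free_cpu_m = alloc_cpu_m - used_cpu_m
free_mem_ki = alloc_mem_ki - used_mem_ki

print(f"free cpu: {free_cpu_m}m, pod requests {pod_cpu_m}m")
# -> free cpu: 272m, pod requests 500m  => Insufficient cpu
print(f"free mem: {free_mem_ki // MI}Mi, pod requests {pod_mem_ki // MI}Mi")
# -> free mem: 1017Mi, pod requests 1875Mi  => Insufficient memory

# The pod's memory requests even exceed the *total* allocatable memory of
# an empty node of the same type, which is why the autoscaler reports
# "it wouldn't fit if a new node is added":
print(pod_mem_ki > alloc_mem_ki)  # -> True (1875Mi > ~1702Mi)
```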

If you want to keep all the containers in one pod, you need to either reduce the containers' resource requests or run them on a node with more CPU and memory. The better solution, though, is to split your application into multiple pods, which can then be spread across multiple nodes.