Kubernetes: Run Pods only on EC2 Nodes that have GPUs

I am setting up GPU monitoring on a cluster using a DaemonSet and NVIDIA DCGM. Obviously, it only makes sense to monitor nodes that have a GPU.

I'm trying to use nodeSelector for this purpose, but the documentation states that:

For the pod to be eligible to run on a node, the node must have each of the indicated key-value pairs as labels (it can have additional labels as well). The most common usage is one key-value pair.

I intended to check whether the label beta.kubernetes.io/instance-type has any of the following values:

[p3.2xlarge, p3.8xlarge, p3.16xlarge, p2.xlarge, p2.8xlarge, p2.16xlarge, g3.4xlarge, g3.8xlarge, g3.16xlarge]

But I don't see how to express an OR relationship between these values when using nodeSelector.

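Node affinity is the solution. A matchExpressions entry with the In operator matches a node when the label's value is any one of the listed values: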

spec:
  template:
    metadata:
      labels:
        app: dcgm-exporter
      annotations:
        prometheus.io/scrape: 'true'
        description: |
          This `DaemonSet` provides GPU metrics in Prometheus format.
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
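              # Note: on Kubernetes 1.17+ the stable label node.kubernetes.io/instance-type
              # supersedes this deprecated beta key.
              - key: beta.kubernetes.io/instance-type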
                operator: In
                values:
                - p2.xlarge
                - p2.8xlarge
                - p2.16xlarge
                - p3.2xlarge
                - p3.8xlarge
                - p3.16xlarge
                - g3.4xlarge
                - g3.8xlarge
                - g3.16xlarge
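
For completeness, OR can also be expressed across different labels, not only across the values of one label. Multiple entries under nodeSelectorTerms are ORed together, while multiple matchExpressions inside a single term are ANDed. The fragment below is a minimal sketch of that rule, not part of the original answer. The label key example.com/has-gpu is hypothetical, and the instance-type list is shortened for brevity.

```yaml
# Sketch only: example.com/has-gpu is a hypothetical label used for illustration.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      # Term 1: the instance type is any one of the listed values.
      - matchExpressions:
        - key: beta.kubernetes.io/instance-type
          operator: In
          values:
          - p3.2xlarge
          - g3.4xlarge
      # Term 2 (ORed with term 1): the node carries the hypothetical label,
      # regardless of its value. Exists takes no values list.
      - matchExpressions:
        - key: example.com/has-gpu
          operator: Exists
```

A node is eligible if it satisfies either term, so a GPU node of an unlisted instance type could still be picked up by labeling it by hand.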