Pods Pending for Node-Exporter via Helm on EKS
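For troubleshooting purposes, I decided to deploy a very vanilla implementation of the Prometheus NodeExporter through helm install exporter stable/prometheus, but I cannot get the pods to start. I have searched high and low and I am not sure where else to look. I can install many other applications on my cluster, just not this one. I have attached some troubleshooting output for reference. I believe it may have something to do with "tolerations", but I am still digging in.
The EKS cluster is running on 3 t2.large nodes, each of which can support at most 35 pods, and I am running 43 pods in total. Any other troubleshooting ideas would be greatly appreciated.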
Get Pods output
✗ kubectl get pods
NAME                                      READY   STATUS    RESTARTS   AGE
exporter-prometheus-node-exporter-bcwc4   0/1     Pending   0          15m
exporter-prometheus-node-exporter-kr7z7   0/1     Pending   0          15m
exporter-prometheus-node-exporter-lw87g   0/1     Pending   0          15m
Describe Pod
Name:           exporter-prometheus-node-exporter-bcwc4
Namespace:      monitoring
Priority:       0
Node:           <none>
Labels:         app=prometheus
                chart=prometheus-11.1.2
                component=node-exporter
                controller-revision-hash=668b4894bb
                heritage=Helm
                pod-template-generation=1
                release=exporter
Annotations:    kubernetes.io/psp: eks.privileged
Status:         Pending
IP:
IPs:            <none>
Controlled By:  DaemonSet/exporter-prometheus-node-exporter
Containers:
  prometheus-node-exporter:
    Image:      prom/node-exporter:v0.18.1
    Port:       9100/TCP
    Host Port:  9100/TCP
    Args:
      --path.procfs=/host/proc
      --path.sysfs=/host/sys
    Environment:  <none>
    Mounts:
      /host/proc from proc (ro)
      /host/sys from sys (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from exporter-prometheus-node-exporter-token-rl4fm (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  proc:
    Type:          HostPath (bare host directory volume)
    Path:          /proc
    HostPathType:
  sys:
    Type:          HostPath (bare host directory volume)
    Path:          /sys
    HostPathType:
  exporter-prometheus-node-exporter-token-rl4fm:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  exporter-prometheus-node-exporter-token-rl4fm
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/network-unavailable:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/pid-pressure:NoSchedule
                 node.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/unschedulable:NoSchedule
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  2s (x24 over 29m)  default-scheduler  0/3 nodes are available: 2 node(s) didn't match node selector, 3 node(s) didn't have free ports for the requested pod ports.
DaemonSet config
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  creationTimestamp: "2020-05-12T06:15:30Z"
  generation: 1
  labels:
    app: prometheus
    chart: prometheus-11.1.2
    component: node-exporter
    heritage: Helm
    release: exporter
  name: exporter-prometheus-node-exporter
  namespace: monitoring
  resourceVersion: "8131959"
  selfLink: /apis/extensions/v1beta1/namespaces/monitoring/daemonsets/exporter-prometheus-node-exporter
  uid: 5ede0739-cd05-4e3b-ace1-87fafb33314a
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: prometheus
      component: node-exporter
      release: exporter
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: prometheus
        chart: prometheus-11.1.2
        component: node-exporter
        heritage: Helm
        release: exporter
    spec:
      containers:
      - args:
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        image: prom/node-exporter:v0.18.1
        imagePullPolicy: IfNotPresent
        name: prometheus-node-exporter
        ports:
        - containerPort: 9100
          hostPort: 9100
          name: metrics
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /host/proc
          name: proc
          readOnly: true
        - mountPath: /host/sys
          name: sys
          readOnly: true
      dnsPolicy: ClusterFirst
      hostNetwork: true
      hostPID: true
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: exporter-prometheus-node-exporter
      serviceAccountName: exporter-prometheus-node-exporter
      terminationGracePeriodSeconds: 30
      volumes:
      - hostPath:
          path: /proc
          type: ""
        name: proc
      - hostPath:
          path: /sys
          type: ""
        name: sys
  templateGeneration: 1
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
status:
  currentNumberScheduled: 3
  desiredNumberScheduled: 3
  numberMisscheduled: 0
  numberReady: 0
  numberUnavailable: 3
  observedGeneration: 1
  updatedNumberScheduled: 3
3 node(s) didn't have free ports for the requested pod ports.
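As the error shows, the host port the pod requests is already in use on every node. Because you have defined hostPort: 9100, the number of places the pod can be scheduled is limited, since each <hostIP, hostPort, protocol> combination must be unique. Reference: https://kubernetes.io/docs/concepts/configuration/overview/#services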
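To find out what is already holding the port, one option is to list every pod together with its node and declared host ports, then filter for 9100. The following is only a sketch: the helper names filter_host_port and list_host_port_pods are hypothetical, and it assumes kubectl is pointed at the affected cluster. It only sees host ports that pods declare in their spec, which is what the scheduler checks; it will not show a non-Kubernetes process listening on the node.

```shell
#!/bin/sh
# Hypothetical helper: reads tab-separated rows of
#   namespace <TAB> pod <TAB> node <TAB> space-separated hostPorts
# on stdin and prints namespace, pod, and node for rows that declare the given port.
filter_host_port() {
  port="$1"
  awk -F '\t' -v port="$port" '{
    n = split($4, ports, " ")
    for (i = 1; i <= n; i++) {
      if (ports[i] == port) { print $1 "\t" $2 "\t" $3; break }
    }
  }'
}

# Hypothetical helper: asks the API server for all pods in all namespaces
# and keeps only those whose containers declare the given hostPort.
list_host_port_pods() {
  kubectl get pods --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.spec.nodeName}{"\t"}{.spec.containers[*].ports[*].hostPort}{"\n"}{end}' \
    | filter_host_port "$1"
}
```

Running list_host_port_pods 9100 after sourcing the snippet should print one line per pod that already claims the port. If another node-exporter DaemonSet (for example from an earlier release in a different namespace) shows up on all three nodes, removing it or disabling one of the two should let the pending pods schedule.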