k8s cron job runs multiple times
I have the following CronJob, which deletes pods in a specific namespace.
I run the job as-is, but instead of running every 20 minutes it seems to run every few (2-3) minutes. What I need is that every 20 minutes the job starts deleting the pods in the given namespace and then terminates. Any idea what could be wrong here?
apiVersion: batch/v1
kind: CronJob
metadata:
  name: restart
spec:
  schedule: "*/20 * * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 0
  failedJobsHistoryLimit: 0
  jobTemplate:
    spec:
      backoffLimit: 0
      template:
        spec:
          serviceAccountName: sa
          restartPolicy: Never
          containers:
            - name: kubectl
              image: bitnami/kubectl:1.22.3
              command:
                - /bin/sh
                - -c
                - kubectl get pods -o name | while read -r POD; do kubectl delete "$POD"; sleep 30; done
I really have no idea why this happens...
Maybe the pod doing the deleting crashes?
Update
I tried the following, but no pods get deleted. Any idea?
apiVersion: batch/v1
kind: CronJob
metadata:
  name: restart
spec:
  schedule: "*/1 * * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 0
  failedJobsHistoryLimit: 0
  jobTemplate:
    spec:
      backoffLimit: 0
      template:
        metadata:
          labels:
            name: restart
        spec:
          serviceAccountName: pod-exterminator
          restartPolicy: Never
          containers:
            - name: kubectl
              image: bitnami/kubectl:1.22.3
              command:
                - /bin/sh
                - -c
                - kubectl get pods -o name --selector name!=restart | while read -r POD; do kubectl delete "$POD"; sleep 10; done
The CronJob's pod deletes itself at some point during its run. That makes the job fail and, on top of that, resets its back-off count.
The docs say:
The back-off count is reset when a Job's Pod is deleted or successful without any other Pods for the Job failing around that time.
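You can observe this from outside: the job keeps recreating its pod right after the previous one has deleted itself. A quick way to watch it, using the job-name label that the Job controller puts on its pods (the job name here is just the one from the sample output further down; yours will differ):
kubectl get pods --watch --selector job-name=restart-27432801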
You need to apply a proper filter. Also note that you can delete all the pods with a single command.
Add a label to spec.jobTemplate.spec.template.metadata that you can use for filtering.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: restart
spec:
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            name: restart # label the pod
Then use that label to delete all pods that are not the CronJob's own pod:
kubectl delete pod --selector name!=restart
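Before wiring the selector into the job, you can sanity-check it from your own shell; it should list every pod except the CronJob's own:
kubectl get pods --selector 'name!=restart'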
Since you stated in the comments that you need the loop, a full working example could look like this.
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: restart
  namespace: sandbox
spec:
  schedule: "*/20 * * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 0
  failedJobsHistoryLimit: 0
  jobTemplate:
    spec:
      backoffLimit: 0
      template:
        metadata:
          labels:
            name: restart
        spec:
          serviceAccountName: restart
          restartPolicy: Never
          containers:
            - name: kubectl
              image: bitnami/kubectl:1.22.3
              command:
                - /bin/sh
                - -c
                - |
                  kubectl get pods -o name --selector "name!=restart" |
                  while read -r POD; do
                    kubectl delete "$POD"
                    sleep 30
                  done
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: restart
  namespace: sandbox
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-management
  namespace: sandbox
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "watch", "list", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: restart-pod-management
  namespace: sandbox
subjects:
  - kind: ServiceAccount
    name: restart
    namespace: sandbox
roleRef:
  kind: Role
  name: pod-management
  apiGroup: rbac.authorization.k8s.io
kubectl create namespace sandbox
kubectl config set-context --current --namespace sandbox
kubectl run pod1 --image busybox -- sleep infinity
kubectl run pod2 --image busybox -- sleep infinity
kubectl apply -f restart.yaml # the above file
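If no pods get deleted, a quick way to rule out RBAC problems is to ask the API server whether the ServiceAccount is actually allowed to delete pods; with the RoleBinding above in place it should answer yes:
kubectl auth can-i delete pods --as system:serviceaccount:sandbox:restart --namespace sandbox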
Here you can see how the first pod is being terminated:
$ kubectl get all
NAME                         READY   STATUS        RESTARTS   AGE
pod/pod1                     1/1     Terminating   0          43s
pod/pod2                     1/1     Running       0          39s
pod/restart-27432801-rrtvm   1/1     Running       0          16s

NAME                    SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
cronjob.batch/restart   */1 * * * *   False     1        17s             36s

NAME                         COMPLETIONS   DURATION   AGE
job.batch/restart-27432801   0/1           17s        17s
Note that this approach is slightly racy: between the moment you read the list of pods and the moment you delete an individual pod from that list, the pod may no longer exist. You can ignore those cases with the following, since you don't need to delete pods that are already gone.
kubectl delete "$POD" || true
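Applied to the loop from the full example above, the container script could look like this:
kubectl get pods -o name --selector "name!=restart" |
while read -r POD; do
  kubectl delete "$POD" || true   # ignore pods that disappeared in the meantime
  sleep 30
done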
That said, since you named the job restart, I assume the goal is to restart the pods of some deployments. You could do a proper restart instead, taking advantage of the Kubernetes update strategies:
kubectl rollout restart $(kubectl get deploy -o name)
With the default update strategy, this causes new pods to be created first, and makes sure they are ready before the old ones are terminated.
$ kubectl rollout restart $(kubectl get deploy -o name)
NAME                        READY   STATUS              RESTARTS   AGE
pod/app1-56f87fc665-mf9th   0/1     ContainerCreating   0          2s
pod/app1-5cbc776547-fh96w   1/1     Running             0          2m9s
pod/app2-7b9779f767-48kpd   0/1     ContainerCreating   0          2s
pod/app2-8d6454757-xj4zc    1/1     Running             0          2m9s
This also works for daemonsets.
$ kubectl rollout restart -h
Restart a resource.
Resource rollout will be restarted.
Examples:
# Restart a deployment
kubectl rollout restart deployment/nginx
# Restart a daemon set
kubectl rollout restart daemonset/abc
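If you want to keep the 20-minute schedule but switch the job over to rollout restarts, the container command can be swapped accordingly. A minimal sketch, assuming the sandbox namespace and restart ServiceAccount from above, plus an extra Role (the name deployment-restart is just illustrative) that would need its own RoleBinding to the restart ServiceAccount, analogous to the one above:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: restart
  namespace: sandbox
spec:
  schedule: "*/20 * * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      backoffLimit: 0
      template:
        spec:
          serviceAccountName: restart
          restartPolicy: Never
          containers:
            - name: kubectl
              image: bitnami/kubectl:1.22.3
              command:
                - /bin/sh
                - -c
                - kubectl rollout restart $(kubectl get deploy -o name)
---
# Extra permissions this variant needs (illustrative Role name);
# bind it to the restart ServiceAccount with a RoleBinding as above.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployment-restart
  namespace: sandbox
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "patch"]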