RabbitMQ pod is crashing unexpectedly
I have a pod running RabbitMQ. Below is the deployment manifest:
apiVersion: v1
kind: Service
metadata:
  name: service-rabbitmq
spec:
  selector:
    app: service-rabbitmq
  ports:
    - port: 5672
      targetPort: 5672
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-rabbitmq
spec:
  selector:
    matchLabels:
      app: deployment-rabbitmq
  template:
    metadata:
      labels:
        app: deployment-rabbitmq
    spec:
      containers:
        - name: rabbitmq
          image: rabbitmq:latest
          volumeMounts:
            - name: rabbitmq-data-volume
              mountPath: /var/lib/rabbitmq
          resources:
            requests:
              cpu: 250m
              memory: 128Mi
            limits:
              cpu: 750m
              memory: 256Mi
      volumes:
        - name: rabbitmq-data-volume
          persistentVolumeClaim:
            claimName: rabbitmq-pvc
When I deploy it in my local cluster, the pod runs for a while and then crashes, so it is effectively stuck in a crash loop. These are the logs I got from the pod:
$ kubectl logs deployment-rabbitmq-649b8479dc-kt9s4
2021-10-14 06:46:36.182390+00:00 [info] <0.222.0> Feature flags: list of feature flags found:
2021-10-14 06:46:36.221717+00:00 [info] <0.222.0> Feature flags: [ ] implicit_default_bindings
2021-10-14 06:46:36.221768+00:00 [info] <0.222.0> Feature flags: [ ] maintenance_mode_status
2021-10-14 06:46:36.221792+00:00 [info] <0.222.0> Feature flags: [ ] quorum_queue
2021-10-14 06:46:36.221813+00:00 [info] <0.222.0> Feature flags: [ ] stream_queue
2021-10-14 06:46:36.221916+00:00 [info] <0.222.0> Feature flags: [ ] user_limits
2021-10-14 06:46:36.221933+00:00 [info] <0.222.0> Feature flags: [ ] virtual_host_metadata
2021-10-14 06:46:36.221953+00:00 [info] <0.222.0> Feature flags: feature flag states written to disk: yes
2021-10-14 06:46:37.018537+00:00 [noti] <0.44.0> Application syslog exited with reason: stopped
2021-10-14 06:46:37.018646+00:00 [noti] <0.222.0> Logging: switching to configured handler(s); following messages may not be visible in this log output
2021-10-14 06:46:37.045601+00:00 [noti] <0.222.0> Logging: configured log handlers are now ACTIVE
2021-10-14 06:46:37.635024+00:00 [info] <0.222.0> ra: starting system quorum_queues
2021-10-14 06:46:37.635139+00:00 [info] <0.222.0> starting Ra system: quorum_queues in directory: /var/lib/rabbitmq/mnesia/rabbit@deployment-rabbitmq-649b8479dc-kt9s4/quorum/rabbit@deployment-rabbitmq-649b8479dc-kt9s4
2021-10-14 06:46:37.849041+00:00 [info] <0.259.0> ra: meta data store initialised for system quorum_queues. 0 record(s) recovered
2021-10-14 06:46:37.877504+00:00 [noti] <0.264.0> WAL: ra_log_wal init, open tbls: ra_log_open_mem_tables, closed tbls: ra_log_closed_mem_tables
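This log is not very helpful; I cannot find any error message in it. The only useful line might be "Application syslog exited with reason: stopped", but as far as I understand, that is not enough to go on. The event log does not help either: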
$ kubectl describe pods deployment-rabbitmq-649b8479dc-kt9s4
Name: deployment-rabbitmq-649b8479dc-kt9s4
Namespace: default
Priority: 0
Node: docker-desktop/192.168.65.4
Start Time: Thu, 14 Oct 2021 12:45:03 +0600
Labels: app=deployment-rabbitmq
pod-template-hash=649b8479dc
skaffold.dev/run-id=7af5e1bb-e0c8-4021-a8a0-0c8bf43630b6
Annotations: <none>
Status: Running
IP: 10.1.5.138
IPs:
IP: 10.1.5.138
Controlled By: ReplicaSet/deployment-rabbitmq-649b8479dc
Containers:
rabbitmq:
Container ID: docker://de309f94163c071afb38fb8743d106923b6bda27325287e82bc274e362f1f3be
Image: rabbitmq:latest
Image ID: docker-pullable://rabbitmq@sha256:d8efe7b818e66a13fdc6fdb84cf527984fb7d73f52466833a20e9ec298ed4df4
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: OOMKilled
Exit Code: 0
Started: Thu, 14 Oct 2021 13:56:29 +0600
Finished: Thu, 14 Oct 2021 13:56:39 +0600
Ready: False
Restart Count: 18
Limits:
cpu: 750m
memory: 256Mi
Requests:
cpu: 250m
memory: 128Mi
Environment: <none>
Mounts:
/var/lib/rabbitmq from rabbitmq-data-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9shdv (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
rabbitmq-data-volume:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: rabbitmq-pvc
ReadOnly: false
kube-api-access-9shdv:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 23m (x6 over 50m) kubelet (combined from similar events): Successfully pulled image "rabbitmq:latest" in 4.267310231s
Normal Pulling 18m (x16 over 73m) kubelet Pulling image "rabbitmq:latest"
Warning BackOff 3m45s (x307 over 73m) kubelet Back-off restarting failed container
What could be the reason for this crash loop?
NOTE: rabbitmq-pvc is successfully bound. No issue there.
Update:
I read that RabbitMQ should be deployed as a StatefulSet, so I adjusted the manifest like this:
apiVersion: v1
kind: Service
metadata:
  name: service-rabbitmq
spec:
  selector:
    app: service-rabbitmq
  ports:
    - name: rabbitmq-amqp
      port: 5672
    - name: rabbitmq-http
      port: 15672
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: statefulset-rabbitmq
spec:
  selector:
    matchLabels:
      app: statefulset-rabbitmq
  serviceName: service-rabbitmq
  template:
    metadata:
      labels:
        app: statefulset-rabbitmq
    spec:
      containers:
        - name: rabbitmq
          image: rabbitmq:latest
          volumeMounts:
            - name: rabbitmq-data-volume
              mountPath: /var/lib/rabbitmq/mnesia
          resources:
            requests:
              cpu: 250m
              memory: 128Mi
            limits:
              cpu: 750m
              memory: 256Mi
      volumes:
        - name: rabbitmq-data-volume
          persistentVolumeClaim:
            claimName: rabbitmq-pvc
The pod is still in a crash loop, but the logs are slightly different.
$ kubectl logs statefulset-rabbitmq-0
2021-10-14 09:38:26.138224+00:00 [info] <0.222.0> Feature flags: list of feature flags found:
2021-10-14 09:38:26.158953+00:00 [info] <0.222.0> Feature flags: [x] implicit_default_bindings
2021-10-14 09:38:26.159015+00:00 [info] <0.222.0> Feature flags: [x] maintenance_mode_status
2021-10-14 09:38:26.159037+00:00 [info] <0.222.0> Feature flags: [x] quorum_queue
2021-10-14 09:38:26.159078+00:00 [info] <0.222.0> Feature flags: [x] stream_queue
2021-10-14 09:38:26.159183+00:00 [info] <0.222.0> Feature flags: [x] user_limits
2021-10-14 09:38:26.159236+00:00 [info] <0.222.0> Feature flags: [x] virtual_host_metadata
2021-10-14 09:38:26.159270+00:00 [info] <0.222.0> Feature flags: feature flag states written to disk: yes
2021-10-14 09:38:26.830814+00:00 [noti] <0.44.0> Application syslog exited with reason: stopped
2021-10-14 09:38:26.830925+00:00 [noti] <0.222.0> Logging: switching to configured handler(s); following messages may not be visible in this log output
2021-10-14 09:38:26.852048+00:00 [noti] <0.222.0> Logging: configured log handlers are now ACTIVE
2021-10-14 09:38:33.754355+00:00 [info] <0.222.0> ra: starting system quorum_queues
2021-10-14 09:38:33.754526+00:00 [info] <0.222.0> starting Ra system: quorum_queues in directory: /var/lib/rabbitmq/mnesia/rabbit@statefulset-rabbitmq-0/quorum/rabbit@statefulset-rabbitmq-0
2021-10-14 09:38:33.760365+00:00 [info] <0.290.0> ra: meta data store initialised for system quorum_queues. 0 record(s) recovered
2021-10-14 09:38:33.761023+00:00 [noti] <0.302.0> WAL: ra_log_wal init, open tbls: ra_log_open_mem_tables, closed tbls: ra_log_closed_mem_tables
The feature flags are now marked as enabled ([x]), as shown in the log. There are no other notable changes, so I still need help.
The pod is being OOMKilled (see Last State and Reason in the describe output). You need to allocate more resources (memory) to the pod.
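As a rough illustration of that change, the StatefulSet from the question could be given a larger memory request and limit. This is only a sketch: the 512Mi and 1Gi values are assumptions chosen for illustration, not measured requirements, and the right numbers depend on your workload and on how much memory the local cluster can spare. Everything else is copied from the manifest in the question.

```yaml
# Sketch only: the memory values below are assumed, not measured.
# Tune them after watching the pod's actual memory usage.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: statefulset-rabbitmq
spec:
  serviceName: service-rabbitmq
  selector:
    matchLabels:
      app: statefulset-rabbitmq
  template:
    metadata:
      labels:
        app: statefulset-rabbitmq
    spec:
      containers:
        - name: rabbitmq
          image: rabbitmq:latest
          volumeMounts:
            - name: rabbitmq-data-volume
              mountPath: /var/lib/rabbitmq/mnesia
          resources:
            requests:
              cpu: 250m
              memory: 512Mi   # assumed value, raised from 128Mi
            limits:
              cpu: 750m
              memory: 1Gi     # assumed value, raised from 256Mi
      volumes:
        - name: rabbitmq-data-volume
          persistentVolumeClaim:
            claimName: rabbitmq-pvc
```

If memory was the only problem, the Last State reason shown by kubectl describe should no longer be OOMKilled after the pod restarts with the new limits.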