k8s pod readiness probe fails with connection refused, but pod is serving requests just fine
I'm having a hard time understanding why the pod's readiness probe is failing.
Warning Unhealthy 21m (x2 over 21m) kubelet, REDACTED Readiness probe failed: Get http://192.168.209.74:8081/actuator/health: dial tcp 192.168.209.74:8081: connect: connection refused
If I exec into this pod (or into any other pod of that application, for that matter), I can run curl against that very URL without a problem:
kubectl exec -it REDACTED-l2z5w /bin/bash
$ curl -v http://192.168.209.74:8081/actuator/health
* Expire in 0 ms for 6 (transfer 0x5611b949ff50)
* Trying 192.168.209.74...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x5611b949ff50)
* Connected to 192.168.209.74 (192.168.209.74) port 8081 (#0)
> GET /actuator/health HTTP/1.1
> Host: 192.168.209.74:8081
> User-Agent: curl/7.64.0
> Accept: */*
>
< HTTP/1.1 200
< Set-Cookie: CM_SESSIONID=E62390F0FF8C26D51C767835988AC690; Path=/; HttpOnly
< X-Content-Type-Options: nosniff
< X-XSS-Protection: 1; mode=block
< Cache-Control: no-cache, no-store, max-age=0, must-revalidate
< Pragma: no-cache
< Expires: 0
< X-Frame-Options: DENY
< Content-Type: application/vnd.spring-boot.actuator.v3+json
< Transfer-Encoding: chunked
< Date: Tue, 02 Jun 2020 15:07:21 GMT
<
* Connection #0 to host 192.168.209.74 left intact
{"status":"UP",...REDACTED..}
I'm seeing this behavior both with the Docker-for-Desktop k8s cluster on my Mac and on an OpenShift cluster.
The readiness probe shows up like this in kubectl describe:
Readiness: http-get http://:8081/actuator/health delay=20s timeout=3s period=5s #success=1 #failure=10
The helm chart configures it like this:
readinessProbe:
  failureThreshold: 10
  httpGet:
    path: /actuator/health
    port: 8081
    scheme: HTTP
  initialDelaySeconds: 20
  periodSeconds: 5
  successThreshold: 1
  timeoutSeconds: 3
I can't entirely rule out the HTTP proxy settings as the culprit, but the k8s documentation says HTTP_PROXY has been ignored for probe checks since v1.13, so at least locally it shouldn't be the cause.
The OpenShift cluster is on k8s 1.11; locally I'm on 1.16.
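For what it's worth, the proxy-related environment variables inside the container can be checked like this (pod name redacted as above):

$ kubectl exec REDACTED-l2z5w -- env | grep -i proxy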
The events shown by describe are always the most recent events recorded for the resource you are inspecting. The thing is that the last event recorded for your readinessProbe check happened to be a failure.
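You can see this for yourself by filtering the events for the pod instead of scanning the whole describe output; a minimal sketch, where the pod name is your placeholder:

$ kubectl get events --field-selector involvedObject.name=REDACTED-l2z5w,reason=Unhealthy

The AGE column and the (xN over M) counter tell you when the failures happened, not whether the probe is failing right now.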
I tested this in my lab with the following pod manifest:
apiVersion: v1
kind: Pod
metadata:
  name: readiness-exec
spec:
  containers:
  - name: readiness
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - sleep 30; touch /tmp/healthy; sleep 600
    readinessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
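If you want to reproduce this yourself, save the manifest (the filename here is just an example), apply it and watch the pod: it reports 0/1 under READY for roughly the first 30 seconds and then flips to 1/1, even though the Unhealthy events stay visible in describe:

$ kubectl apply -f readiness-exec.yaml
$ kubectl get pod readiness-exec --watch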
As you can see, the file /tmp/healthy is created in the pod only after 30 seconds, while the readinessProbe starts checking for it after 5 seconds and repeats the check every 5 seconds.
Describing this pod gives me:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m56s default-scheduler Successfully assigned default/readiness-exec to yaki-118-2
Normal Pulling 7m55s kubelet, yaki-118-2 Pulling image "k8s.gcr.io/busybox"
Normal Pulled 7m55s kubelet, yaki-118-2 Successfully pulled image "k8s.gcr.io/busybox"
Normal Created 7m55s kubelet, yaki-118-2 Created container readiness
Normal Started 7m55s kubelet, yaki-118-2 Started container readiness
Warning Unhealthy 7m25s (x6 over 7m50s) kubelet, yaki-118-2 Readiness probe failed: cat: can't open '/tmp/healthy': No such file or directory
The readinessProbe looked for the file 6 times without success, which is exactly right: with a 5-second initial delay and a 5-second period, it checks at roughly 5, 10, 15, 20, 25 and 30 seconds after startup, and the file only appears after 30 seconds.
What you think is a problem is actually expected behavior. Your event tells you that a readinessProbe check failed 21 minutes ago and that nothing has failed since. That effectively means your pod has been healthy for the last 21 minutes.
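If you want to confirm the current state rather than trusting old events, query the pod's Ready condition directly (pod name is your placeholder):

$ kubectl get pod REDACTED-l2z5w -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'

Besides the status, this shows lastTransitionTime, which in your case should line up with the moment the probe first succeeded, about 21 minutes ago.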