k8s pod readiness probe fails with connection refused, but pod is serving requests just fine

I am having a hard time understanding why a pod's readiness probe is failing.

  Warning  Unhealthy  21m (x2 over 21m)  kubelet, REDACTED  Readiness probe failed: Get http://192.168.209.74:8081/actuator/health: dial tcp 192.168.209.74:8081: connect: connection refused

If I exec into this pod (or in fact into any other pod I have for that application), I can run a curl against that very URL without issues:

kubectl exec -it REDACTED-l2z5w /bin/bash
$ curl -v http://192.168.209.74:8081/actuator/health
* Expire in 0 ms for 6 (transfer 0x5611b949ff50)
*   Trying 192.168.209.74...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x5611b949ff50)
* Connected to 192.168.209.74 (192.168.209.74) port 8081 (#0)
> GET /actuator/health HTTP/1.1
> Host: 192.168.209.74:8081
> User-Agent: curl/7.64.0
> Accept: */*
> 
< HTTP/1.1 200 
< Set-Cookie: CM_SESSIONID=E62390F0FF8C26D51C767835988AC690; Path=/; HttpOnly
< X-Content-Type-Options: nosniff
< X-XSS-Protection: 1; mode=block
< Cache-Control: no-cache, no-store, max-age=0, must-revalidate
< Pragma: no-cache
< Expires: 0
< X-Frame-Options: DENY
< Content-Type: application/vnd.spring-boot.actuator.v3+json
< Transfer-Encoding: chunked
< Date: Tue, 02 Jun 2020 15:07:21 GMT
< 
* Connection #0 to host 192.168.209.74 left intact
{"status":"UP",...REDACTED..}

This behavior occurs both on the Docker-for-Desktop k8s cluster on my Mac and on an OpenShift cluster.

The readiness probe is shown like this in kubectl describe:

    Readiness:  http-get http://:8081/actuator/health delay=20s timeout=3s period=5s #success=1 #failure=10

The helm chart configures it like this:

    readinessProbe:
      failureThreshold: 10
      httpGet:
        path: /actuator/health
        port: 8081
        scheme: HTTP
      initialDelaySeconds: 20
      periodSeconds: 5
      successThreshold: 1
      timeoutSeconds: 3

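I cannot completely rule out that HTTP proxy settings are to blame, but the k8s docs say that HTTP_PROXY has been ignored for checks since v1.13, so it shouldn't be happening locally.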
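For reference, an HTTP readiness check behaves roughly like the sketch below: a GET with a timeout, where any status from 200 up to (but not including) 400 counts as success, and a refused connection or a timeout counts as failure. This is only a Python approximation, not the kubelet's code. The function name `http_probe` is made up for illustration, and the empty `ProxyHandler` is an assumption used to mimic a check that ignores proxy environment variables.

```python
import urllib.error
import urllib.request


def http_probe(url: str, timeout: float = 3.0) -> bool:
    """Return True if `url` answers with a 2xx/3xx status within `timeout` seconds."""
    # An empty ProxyHandler makes urllib ignore HTTP_PROXY and related variables.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Covers connection refused, timeouts, and 4xx/5xx responses.
        return False
```

Calling `http_probe("http://192.168.209.74:8081/actuator/health")` while nothing is listening on port 8081 yet returns `False`, which corresponds to the "connection refused" message in the event. Once the application has finished starting, the same call returns `True`.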

The OpenShift k8s version is 1.11, my local one is 1.16.

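`kubectl describe` shows the most recent events recorded for the resource you are inspecting. In your case the last recorded event is a readinessProbe failure. The kubelet does not record an event for every probe that succeeds, so nothing newer shows up after it.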

I tested this in my lab using the following pod manifest:

apiVersion: v1
kind: Pod
metadata:
  name: readiness-exec
spec:
  containers:
  - name: readiness
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - sleep 30; touch /tmp/healthy; sleep 600
    readinessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5

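As you can see, the file /tmp/healthy is created in the pod after 30 seconds, and the readinessProbe checks whether the file exists, starting after 5 seconds and repeating every 5 seconds.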

Describing this pod gives me:

Events:
  Type     Reason     Age                    From                 Message
  ----     ------     ----                   ----                 -------
  Normal   Scheduled  7m56s                  default-scheduler    Successfully assigned default/readiness-exec to yaki-118-2
  Normal   Pulling    7m55s                  kubelet, yaki-118-2  Pulling image "k8s.gcr.io/busybox"
  Normal   Pulled     7m55s                  kubelet, yaki-118-2  Successfully pulled image "k8s.gcr.io/busybox"
  Normal   Created    7m55s                  kubelet, yaki-118-2  Created container readiness
  Normal   Started    7m55s                  kubelet, yaki-118-2  Started container readiness
  Warning  Unhealthy  7m25s (x6 over 7m50s)  kubelet, yaki-118-2  Readiness probe failed: cat: can't open '/tmp/healthy': No such file or directory

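The readinessProbe looked for the file 6 times without success, which is exactly right, since I configured it to check every 5 seconds and the file is created after 30 seconds.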
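The count of 6 can be reproduced with a small, idealized simulation of a periodic probe. This is only a sketch: the function name `simulate_probe` is hypothetical, the real kubelet adds its own scheduling jitter, and the extra 1 second before the file appears is an assumption standing in for the time between the container being started and `sleep 30` finishing.

```python
def simulate_probe(initial_delay, period, ready_at, until):
    """Return (time, succeeded) pairs for an idealized periodic probe."""
    results = []
    t = initial_delay
    while t <= until:
        # The probe succeeds only once the container has become ready.
        results.append((t, t >= ready_at))
        t += period
    return results


# initial_delay and period come from the manifest; the +1 s overhead is assumed.
timeline = simulate_probe(initial_delay=5, period=5, ready_at=30 + 1, until=60)
failures = [t for t, ok in timeline if not ok]
print(len(failures), failures)  # 6 [5, 10, 15, 20, 25, 30]
```

Every probe from 35 seconds onward succeeds in this model, so no further failure events would be produced.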

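What you see as a problem is actually the expected behavior. Your events tell you that the readinessProbe last failed 21 minutes ago, which in practice means your pod has been healthy since then.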
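To confirm the pod's current state instead of inferring it from old events, you can look at its `Ready` condition. The sketch below assumes you have the pod as JSON, for example from `kubectl get pod <name> -o json`. The helper name `is_ready` and the trimmed sample document are made up for illustration.

```python
import json


def is_ready(pod: dict) -> bool:
    """Return True if the pod's `Ready` condition currently has status "True"."""
    for cond in pod.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status") == "True"
    # No Ready condition reported: treat the pod as not ready.
    return False


# Trimmed, hand-written sample in the shape of `kubectl get pod <name> -o json`.
sample = json.loads("""
{"status": {"conditions": [
  {"type": "Initialized", "status": "True"},
  {"type": "Ready", "status": "True"},
  {"type": "ContainersReady", "status": "True"}
]}}
""")
print(is_ready(sample))  # True
```

The same information is visible in the READY column of `kubectl get pods`. If the pod shows as ready there, the `Unhealthy` event is just a leftover from startup.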