k8s pod readiness probe fails with connection refused, but pod is serving requests just fine

Question

I am having a hard time understanding why a pod's readiness probe is failing.

  Warning  Unhealthy  21m (x2 over 21m)  kubelet, REDACTED  Readiness probe failed: Get http://192.168.209.74:8081/actuator/health: dial tcp 192.168.209.74:8081: connect: connection refused

If I exec into this pod (or, in fact, into any other pod running the application), I can curl that very URL without any problem:

kubectl exec -it REDACTED-l2z5w /bin/bash
$ curl -v http://192.168.209.74:8081/actuator/health
* Expire in 0 ms for 6 (transfer 0x5611b949ff50)
*   Trying 192.168.209.74...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x5611b949ff50)
* Connected to 192.168.209.74 (192.168.209.74) port 8081 (#0)
> GET /actuator/health HTTP/1.1
> Host: 192.168.209.74:8081
> User-Agent: curl/7.64.0
> Accept: */*
> 
< HTTP/1.1 200 
< Set-Cookie: CM_SESSIONID=E62390F0FF8C26D51C767835988AC690; Path=/; HttpOnly
< X-Content-Type-Options: nosniff
< X-XSS-Protection: 1; mode=block
< Cache-Control: no-cache, no-store, max-age=0, must-revalidate
< Pragma: no-cache
< Expires: 0
< X-Frame-Options: DENY
< Content-Type: application/vnd.spring-boot.actuator.v3+json
< Transfer-Encoding: chunked
< Date: Tue, 02 Jun 2020 15:07:21 GMT
< 
* Connection #0 to host 192.168.209.74 left intact
{"status":"UP",...REDACTED..}

I see this behavior both on the Docker-for-Desktop k8s cluster on my Mac and on an OpenShift cluster.

The readiness probe is shown like this in kubectl describe:

    Readiness:  http-get http://:8081/actuator/health delay=20s timeout=3s period=5s #success=1 #failure=10

The helm chart configures it like this:

    readinessProbe:
      failureThreshold: 10
      httpGet:
        path: /actuator/health
        port: 8081
        scheme: HTTP
      initialDelaySeconds: 20
      periodSeconds: 5
      successThreshold: 1
      timeoutSeconds: 3
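
As a rough illustration of what this configuration asks the kubelet to do on every period, here is a minimal Python sketch. It is not the kubelet's code: the function name http_probe is hypothetical, and treating any status from 200 up to but not including 400 as success is taken from how Kubernetes documents HTTP probes. Proxy environment variables are bypassed on purpose so the sketch talks to the target directly.

```python
import http.client
import urllib.request


def http_probe(url: str, timeout_seconds: float = 3.0) -> bool:
    """Hypothetical helper: True if a GET to url returns a 2xx/3xx status in time."""
    # An empty ProxyHandler disables HTTP_PROXY / http_proxy for this opener.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout_seconds) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # Covers "connection refused", timeouts, and 4xx/5xx answers
        # (urllib raises HTTPError, a subclass of OSError, for those).
        return False
```

Called with the URL from the event above, such a check returns False for as long as nothing is listening on port 8081, which is exactly the "connection refused" case, and True once the application answers with 200.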

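I can't completely rule out that HTTP proxy settings are to blame, but the k8s docs say that HTTP_PROXY has been ignored for checks since v1.13, so that shouldn't be what is happening locally.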

The OpenShift k8s version is 1.11; my local one is 1.16.

Answer 1

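The events in kubectl describe always show the last event recorded on the resource you are inspecting. The catch is that the last event recorded here was an error from a readinessProbe check.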

I tested this in my lab with the following pod manifest:

apiVersion: v1
kind: Pod
metadata:
  name: readiness-exec
spec:
  containers:
  - name: readiness
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - sleep 30; touch /tmp/healthy; sleep 600
    readinessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5

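As you can see, the file /tmp/healthy is created in the pod after 30 seconds, and the readinessProbe first checks whether the file exists after 5 seconds and then repeats the check every 5 seconds.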

Describing this pod gives me:

Events:
  Type     Reason     Age                    From                 Message
  ----     ------     ----                   ----                 -------
  Normal   Scheduled  7m56s                  default-scheduler    Successfully assigned default/readiness-exec to yaki-118-2
  Normal   Pulling    7m55s                  kubelet, yaki-118-2  Pulling image "k8s.gcr.io/busybox"
  Normal   Pulled     7m55s                  kubelet, yaki-118-2  Successfully pulled image "k8s.gcr.io/busybox"
  Normal   Created    7m55s                  kubelet, yaki-118-2  Created container readiness
  Normal   Started    7m55s                  kubelet, yaki-118-2  Started container readiness
  Warning  Unhealthy  7m25s (x6 over 7m50s)  kubelet, yaki-118-2  Readiness probe failed: cat: can't open '/tmp/healthy': No such file or directory

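The readinessProbe looked for the file 6 times without success, which is entirely correct: I configured it to check every 5 seconds, and the file is only created after 30 seconds.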
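
To make the arithmetic concrete, here is a small idealized simulation in Python. It is only a sketch: simulate_probe is a hypothetical helper, and it assumes the first probe fires exactly at initialDelaySeconds and the file appears exactly at the 30-second mark, which a real kubelet does not guarantee.

```python
def simulate_probe(initial_delay: int, period: int, ready_at: int, horizon: int):
    """Return (time, succeeded) pairs for probes fired at fixed intervals.

    Idealized: the first probe fires exactly at initial_delay and the
    container is considered ready from ready_at onwards.
    """
    results = []
    t = initial_delay
    while t <= horizon:
        results.append((t, t >= ready_at))
        t += period
    return results


# Values from the manifest above: initialDelaySeconds=5, periodSeconds=5,
# and /tmp/healthy created after "sleep 30".
timeline = simulate_probe(initial_delay=5, period=5, ready_at=30, horizon=40)
failures = [t for t, ok in timeline if not ok]
print(failures)  # [5, 10, 15, 20, 25]
```

The idealized model yields five failures, while the real run logged six. The difference is most likely the probe around the 30-second mark running just before the file appeared, since actual probe timing is not exact. Either way, the failures stop once the file exists.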

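What you see as a problem is actually the expected behavior. Your events tell you that the readinessProbe last failed a check 21 minutes ago. Since successful probes are not recorded as events, this actually means your pod has been healthy for the 21 minutes since then.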
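
Reading the Age column is the key step here, so the following Python sketch spells out how a value such as "21m (x2 over 21m)" breaks down. The helper parse_event_age and its field names are hypothetical and only cover the two Age shapes that appear in this thread.

```python
import re

# "<last> (x<count> over <first>)" or just "<last>" for a single occurrence.
_AGE = re.compile(r"^(?P<last>\S+)(?: \(x(?P<count>\d+) over (?P<first>\S+)\))?$")


def parse_event_age(age: str) -> dict:
    """Split a kubectl describe Age value into last seen, count, and first seen."""
    match = _AGE.match(age.strip())
    if match is None:
        raise ValueError(f"unrecognized Age value: {age!r}")
    return {
        "last_seen_ago": match["last"],
        "count": int(match["count"] or 1),
        "first_seen_ago": match["first"] or match["last"],
    }


print(parse_event_age("21m (x2 over 21m)"))
# {'last_seen_ago': '21m', 'count': 2, 'first_seen_ago': '21m'}
```

For the event in the question, both failures happened about 21 minutes ago and none has been recorded since, which is consistent with a pod that failed its probe briefly during startup and has been ready from then on.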

