Endpoints are not being updated with the new IP address of a pod

Platform: AWS EKS

Output of helm version:

Client: &version.Version{SemVer:"v2.12.3", GitCommit:"eecf22f77df5f65c823aacd2dbd30ae6c65f186e", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.14.2", GitCommit:"a8b13cc5ab6a7dbef0a58f5061bcc7c0c61598e7", GitTreeState:"clean"}

Output of kubectl version:

Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.1", GitCommit:"4485c6f18cee9a5d3c3b4e523bd27972b1b53892", GitTreeState:"clean", BuildDate:"2019-07-18T09:18:22Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"12+", GitVersion:"v1.12.10-eks-2e569f", GitCommit:"2e569fd887357952e506846ed47fc30cc385409a", GitTreeState:"clean", BuildDate:"2019-07-25T23:13:33Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
Cloud Provider/Platform (AKS, GKE, Minikube etc.): AWS EKS

Problem: After the jenkins pod restarts, it gets a new IP address. The readinessProbe should update the endpoints, but it does not.

kubectl get endpoints
jenkins                                  <none>
jenkins-agent                            <none>

Error:

Readiness probe failed: Get http://192.168.0.109:8080/login: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

I can successfully access the above URL from all pods and worker nodes, and I get the correct headers.
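Reaching the URL by hand does not show whether the response arrives within the probe's 1-second timeout. Below is a minimal sketch of a check that roughly mimics a single httpGet probe and reports how long the response took. The function name `probe_once` is hypothetical, and the 200-399 success range is the rule Kubernetes documents for HTTP probes. Python's `urllib` applies the timeout per socket operation, so this only approximates what the kubelet does.

```python
import time
import urllib.error
import urllib.request


def probe_once(url, timeout_seconds=1.0):
    """Issue one GET, roughly the way an httpGet probe would.

    Returns (ok, elapsed_seconds, detail). ok is True only when a response
    with a status in the 200-399 range arrives before the timeout.
    Note: urllib applies timeout_seconds to each blocking socket operation,
    not to the request as a whole, so this is an approximation.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as resp:
            status = resp.status
            ok = 200 <= status < 400
            detail = f"HTTP {status}"
    except urllib.error.HTTPError as exc:
        # urllib raises HTTPError for 4xx/5xx responses.
        ok, detail = False, f"HTTP {exc.code}"
    except OSError as exc:
        # Covers URLError, connection failures, and socket timeouts.
        ok, detail = False, f"error: {exc}"
    return ok, time.monotonic() - start, detail
```

Running `probe_once("http://192.168.0.109:8080/login", 1.0)` from a worker node (with the pod's current IP substituted) would show whether /login answers in under a second while Jenkins is still starting.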

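This happened after a helm upgrade of jenkins failed. I then rolled back the release, which succeeded (except that the endpoints are no longer updated). Now I have to edit the endpoints manually to point them at the pod's correct IP address.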

The current readinessProbe in the deployment is:

    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /login
        port: http
        scheme: HTTP
      initialDelaySeconds: 60
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1

The logs (events) from the Jenkins pod are:

Events:
  Type     Reason                  Age    From                                                  Message
  ----     ------                  ----   ----                                                  -------
  Normal   Scheduled               8m13s  default-scheduler                                     Successfully assigned default/jenkins-pod-id to <ip>.<region>.compute.internal
  Normal   SuccessfulAttachVolume  8m6s   attachdetach-controller                               AttachVolume.Attach succeeded for volume "jenkins"
  Normal   Pulling                 8m4s   kubelet, <ip>.<region>.compute.internal  pulling image "jenkins/jenkins:2.176.2-alpine"
  Normal   Pulled                  7m57s  kubelet, <ip>.<region>.compute.internal  Successfully pulled image "jenkins/jenkins:2.176.2-alpine"
  Normal   Created                 7m56s  kubelet, <ip>.<region>.compute.internal  Created container
  Normal   Started                 7m56s  kubelet, <ip>.<region>.compute.internal  Started container
  Normal   Pulling                 7m43s  kubelet, <ip>.<region>.compute.internal  pulling image "jenkins/jenkins:2.176.2-alpine"
  Normal   Pulled                  7m42s  kubelet, <ip>.<region>.compute.internal  Successfully pulled image "jenkins/jenkins:2.176.2-alpine"
  Normal   Created                 7m42s  kubelet, <ip>.<region>.compute.internal  Created container
  Normal   Started                 7m42s  kubelet, <ip>.<region>.compute.internal  Started container
  Warning  Unhealthy               6m40s  kubelet, <ip>.<region>.compute.internal  Readiness probe failed: Get http://<IP>:8080/login: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

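The pod gets an IP almost immediately, but the container takes a few minutes to start. How can I get the readinessProbe to update the endpoints, or even get the readinessProbe logs? This is running in AWS, so I have no access to the controller to get more logs.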

If I update the endpoints fast enough, the readinessProbe does not fail, but that does not help the next time the pod restarts.

Update: I just enabled EKS logging and got this:

deployment_controller.go:484] Error syncing deployment default/jenkins: Operation cannot be fulfilled on deployments.apps "jenkins": the object has been modified; please apply your changes to the latest version and try again

The following helped. The readiness probe still fails, but that is because Jenkins takes 90 seconds to start. I will update this.

helm delete jenkins
release "jenkins" deleted


helm rollback jenkins 25
Rollback was a success! Happy Helming!
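To see how the remaining readiness failures line up with the probe settings, here is a small sketch of an idealized probe timeline. It assumes probes fire exactly at initialDelaySeconds and then every periodSeconds after the container starts, and that a probe succeeds only once Jenkins has finished starting. The function name `first_ready_time` is hypothetical, and real kubelet timing includes jitter, so the result is only an estimate.

```python
def first_ready_time(startup_seconds, initial_delay, period, success_threshold=1):
    """Return (ready_at, failed_probes) for an idealized probe schedule.

    Assumes probes run at initial_delay, initial_delay + period, ... seconds
    after container start, and that a probe succeeds only if it runs at or
    after startup_seconds. The pod counts as ready after success_threshold
    consecutive successful probes.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    t = initial_delay
    failed = 0
    consecutive_ok = 0
    while True:
        if t >= startup_seconds:
            consecutive_ok += 1
            if consecutive_ok >= success_threshold:
                return t, failed
        else:
            failed += 1
        t += period


# Values from this post: initialDelaySeconds 60, periodSeconds 10,
# successThreshold 1, and roughly 90 seconds for Jenkins to start.
ready_at, failed = first_ready_time(90, 60, 10, 1)
print(f"ready at ~{ready_at}s after {failed} failed probes")
```

Under these assumptions the probes at 60, 70 and 80 seconds fail before the pod can turn ready, which is consistent with the Unhealthy event appearing about a minute after the container started. Since a Service's endpoints list only ready pods, they would show `<none>` during that window.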