Endpoints are not being updated with the new IP address of a pod
Platform: AWS EKS
Output of helm version:
Client: &version.Version{SemVer:"v2.12.3", GitCommit:"eecf22f77df5f65c823aacd2dbd30ae6c65f186e", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.14.2", GitCommit:"a8b13cc5ab6a7dbef0a58f5061bcc7c0c61598e7", GitTreeState:"clean"}
Output of kubectl version:
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.1", GitCommit:"4485c6f18cee9a5d3c3b4e523bd27972b1b53892", GitTreeState:"clean", BuildDate:"2019-07-18T09:18:22Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"12+", GitVersion:"v1.12.10-eks-2e569f", GitCommit:"2e569fd887357952e506846ed47fc30cc385409a", GitTreeState:"clean", BuildDate:"2019-07-25T23:13:33Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
Cloud Provider/Platform (AKS, GKE, Minikube etc.): AWS EKS
Problem:
After the jenkins pod restarts, it gets a new IP address. The readinessProbe should cause the endpoints to be updated, but it does not.
kubectl get endpoints
jenkins <none>
jenkins-agent <none>
Error:
Readiness probe failed: Get http://192.168.0.109:8080/login: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
I can reach the URL above successfully from all pods and worker nodes, and I get the correct headers.
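Reaching the URL at all is not quite the same test the kubelet runs: with timeoutSeconds: 1, a response that arrives after more than one second still counts as a failed probe. A minimal sketch for timing a single GET under the same deadline follows. The function name probe_once is hypothetical, and the 200 to 399 success range is assumed to mirror how HTTP probes are judged. Python's timeout applies to each socket operation rather than to the whole request, so this only approximates the kubelet's client timeout.

```python
import time
import urllib.error
import urllib.request


def probe_once(url, timeout_s=1.0):
    """Issue one GET and return (ok, elapsed_seconds, detail).

    Hypothetical helper: approximates an HTTP readiness probe with a deadline.
    """
    start = time.monotonic()
    try:
        # The timeout covers each blocking socket call (connect, read headers),
        # which is close to, but not identical to, a total request deadline.
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            status = resp.status
        ok = 200 <= status < 400
        detail = f"HTTP {status}"
    except urllib.error.HTTPError as exc:
        # 4xx and 5xx responses arrive here; treat them as probe failures.
        ok, detail = False, f"HTTP {exc.code}"
    except OSError as exc:
        # Covers timeouts, refused connections, and URLError.
        ok, detail = False, f"{type(exc).__name__}: {exc}"
    return ok, time.monotonic() - start, detail
```

Calling probe_once("http://192.168.0.109:8080/login") several times from a worker node while Jenkins is starting would show whether responses come back inside the one-second window, or only after it.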
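This started after a helm upgrade of jenkins failed. I then rolled back the release, which succeeded (except that the endpoints are no longer updated).
Now I have to edit the endpoints manually to point them at the pod's correct IP address.
The current readinessProbe in the deployment is: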
readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /login
    port: http
    scheme: HTTP
  initialDelaySeconds: 60
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
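To see how these settings interact with a slow-starting container, here is a rough timing sketch. It assumes a simplified model that is not the kubelet's exact behavior: the first probe fires exactly initialDelaySeconds after the container starts, later probes fire every periodSeconds, there is no jitter, and a probe succeeds whenever the application is already up. The function name first_ready_time is hypothetical.

```python
import math


def first_ready_time(startup_s, initial_delay_s=60, period_s=10, success_threshold=1):
    """Return (seconds until the pod is marked Ready, failed probes before that).

    Simplified model: probes run at initial_delay_s, initial_delay_s + period_s, ...
    and succeed only if the application finished starting by that moment.
    """
    if startup_s <= initial_delay_s:
        first_success_index = 0
    else:
        # Index of the first probe attempt that happens at or after startup.
        first_success_index = math.ceil((startup_s - initial_delay_s) / period_s)
    failed_probes = first_success_index
    # Readiness needs success_threshold consecutive successes.
    ready_index = first_success_index + success_threshold - 1
    return initial_delay_s + ready_index * period_s, failed_probes


print(first_ready_time(90))  # (90, 3): probes at 60s, 70s and 80s fail
```

Under this model, a container that needs about 90 seconds to start (the figure reported in the update at the end of this post) fails three probes before it can be marked Ready, which is enough to reach failureThreshold: 3 and produce the Unhealthy events shown further down.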
The logs from the Jenkins pod are:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 8m13s default-scheduler Successfully assigned default/jenkins-pod-id to <ip>.<region>.compute.internal
Normal SuccessfulAttachVolume 8m6s attachdetach-controller AttachVolume.Attach succeeded for volume "jenkins"
Normal Pulling 8m4s kubelet, <ip>.<region>.compute.internal pulling image "jenkins/jenkins:2.176.2-alpine"
Normal Pulled 7m57s kubelet, <ip>.<region>.compute.internal Successfully pulled image "jenkins/jenkins:2.176.2-alpine"
Normal Created 7m56s kubelet, <ip>.<region>.compute.internal Created container
Normal Started 7m56s kubelet, <ip>.<region>.compute.internal Started container
Normal Pulling 7m43s kubelet, <ip>.<region>.compute.internal pulling image "jenkins/jenkins:2.176.2-alpine"
Normal Pulled 7m42s kubelet, <ip>.<region>.compute.internal Successfully pulled image "jenkins/jenkins:2.176.2-alpine"
Normal Created 7m42s kubelet, <ip>.<region>.compute.internal Created container
Normal Started 7m42s kubelet, <ip>.<region>.compute.internal Started container
Warning Unhealthy 6m40s kubelet, <ip>.<region>.compute.internal Readiness probe failed: Get http://<IP>:8080/login: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
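The pod gets an IP almost immediately, but the container takes a few minutes to start. How can I get the readinessProbe to update the Endpoints, or at least get the readinessProbe logs? This is running in AWS, so I have no access to the controller to get more logs.
If I update the endpoints quickly enough, the readinessProbe does not fail, but that does not help the next time the pod restarts.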
Update:
I just enabled EKS logging and got this:
deployment_controller.go:484] Error syncing deployment default/jenkins: Operation cannot be fulfilled on deployments.apps "jenkins": the object has been modified; please apply your changes to the latest version and try again
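The message "the object has been modified; please apply your changes to the latest version and try again" is what the API server returns when an update carries a stale resourceVersion, that is, when someone else wrote the object in between. The usual pattern for handling it is to re-read and retry. The following is a toy, in-memory sketch of that pattern only. It is not the deployment controller's code, the names Store, Conflict and update_with_retry are hypothetical, and real resource versions are opaque strings rather than a counter.

```python
import copy


class Conflict(Exception):
    """Raised when an update is based on a stale resource version."""


class Store:
    """Toy stand-in for an API server that holds a single object."""

    def __init__(self, spec):
        self._obj = {"resourceVersion": 1, "spec": dict(spec)}

    def get(self):
        return copy.deepcopy(self._obj)

    def update(self, obj):
        # Reject writes that were prepared against an older version.
        if obj["resourceVersion"] != self._obj["resourceVersion"]:
            raise Conflict("the object has been modified; please apply your "
                           "changes to the latest version and try again")
        self._obj = {"resourceVersion": obj["resourceVersion"] + 1,
                     "spec": dict(obj["spec"])}
        return self.get()


def update_with_retry(store, mutate, max_attempts=5):
    """Re-read the latest object and reapply mutate until the write is accepted."""
    for attempt in range(1, max_attempts + 1):
        obj = store.get()          # always start from the latest version
        mutate(obj["spec"])
        try:
            return store.update(obj), attempt
        except Conflict:
            continue               # someone else wrote first; try again
    raise Conflict(f"gave up after {max_attempts} attempts")
```

In this sketch a conflict is simply retried against the fresh copy, so a single occurrence of the message need not mean that anything is permanently stuck.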
The following helped. The readiness probe still fails, but that is because Jenkins takes 90 seconds to start. I will update that.
helm delete jenkins
release "jenkins" deleted
helm rollback jenkins 25
Rollback was a success! Happy Helming!