Running Pod takes a long time before its internal service becomes accessible
I implemented a gRPC service, built it into a container image, and deployed it as a DaemonSet on Kubernetes (AWS EKS, specifically).
The Pod starts and reaches the Running state quickly, but it takes a very long time, typically around 300s, before the actual service is reachable.
In fact, when I run kubectl logs to print the Pod's logs, the output stays empty for quite a while.
I log something at the very beginning of the service. In fact, my code looks like this:
package main

import "log"

func init() {
	log.Println("init")
}

func main() {
	// ...
}
So I am fairly sure that as long as there are no logs, the service has not started yet.
I understand that there can be a gap between the Pod being Running and the actual process inside it running. However, 300s is far too long for me.
Moreover, this happens randomly; sometimes the service is ready almost immediately. By the way, my runtime image is based on chromedp headless-shell, not sure whether that is relevant.
Can anyone give me some advice on how to debug and locate the problem? Many thanks!
Update
I have not set up any readiness probes.
Running kubectl get -o yaml for my DaemonSet gives:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    deprecated.daemonset.template.generation: "1"
  creationTimestamp: "2021-10-13T06:30:16Z"
  generation: 1
  labels:
    app: worker
    uuid: worker
  name: worker
  namespace: collection-14f45957-e268-4719-88c3-50b533b0ae66
  resourceVersion: "47265945"
  uid: 88e4671f-9e33-43ef-9c49-b491dcb578e4
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: worker
      uuid: worker
  template:
    metadata:
      annotations:
        prometheus.io/path: /metrics
        prometheus.io/port: "2112"
        prometheus.io/scrape: "true"
      creationTimestamp: null
      labels:
        app: worker
        uuid: worker
    spec:
      containers:
      - env:
        - name: GRPC_PORT
          value: "22345"
        - name: DEBUG
          value: "false"
        - name: TARGET
          value: localhost:12345
        - name: TRACKER
          value: 10.100.255.31:12345
        - name: MONITOR
          value: 10.100.125.35:12345
        - name: COLLECTABLE_METHODS
          value: shopping.ShoppingService.GetShop
        - name: POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        - name: DISTRIBUTABLE_METHODS
          value: collection.CollectionService.EnumerateShops
        - name: PERFORM_TASK_INTERVAL
          value: 0.000000s
        image: xxx
        imagePullPolicy: Always
        name: worker
        ports:
        - containerPort: 22345
          protocol: TCP
        resources:
          requests:
            cpu: 1800m
            memory: 1Gi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      - env:
        - name: CAPTCHA_PARALLEL
          value: "32"
        - name: HTTP_PROXY
          value: http://10.100.215.25:8080
        - name: HTTPS_PROXY
          value: http://10.100.215.25:8080
        - name: API
          value: 10.100.111.11:12345
        - name: NO_PROXY
          value: 10.100.111.11:12345
        - name: POD_IP
        image: xxx
        imagePullPolicy: Always
        name: source
        ports:
        - containerPort: 12345
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/ssl/certs/api.crt
          name: ca
          readOnly: true
          subPath: tls.crt
      dnsPolicy: ClusterFirst
      nodeSelector:
        api/nodegroup-app: worker
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - name: ca
        secret:
          defaultMode: 420
          secretName: ca
  updateStrategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
    type: RollingUpdate
status:
  currentNumberScheduled: 2
  desiredNumberScheduled: 2
  numberAvailable: 2
  numberMisscheduled: 0
  numberReady: 2
  observedGeneration: 1
  updatedNumberScheduled: 2
Also, there are two containers in the Pod. Only one of them is particularly slow to start; the other one has always been fine.
When you use HTTP_PROXY in your solution, be aware of how its routing may differ from that of your underlying cluster networking - this can often result in unexpected timeouts.
I have posted a community wiki answer to summarize the topic:
As gohm'c mentioned in the comments:
Do connections made by container "source" always have to go thru HTTP_PROXY, even if it is connecting services in the cluster - do you think possible long time been taken because of proxy? Can try kubectl exec -it <pod> -c <source> -- sh
and curl/wget external services.
That is a good observation. Note that some connections can be made directly, and sending the extra traffic through the proxy may introduce delays; the proxy can become a bottleneck, for example. You can read more about using an HTTP proxy to access the Kubernetes API in the documentation.
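As a rough sketch only: cluster-internal destinations can be excluded from the proxy via NO_PROXY so in-cluster calls do not take the detour. The CIDR and suffixes below are assumptions about this cluster (and CIDR matching in NO_PROXY depends on the HTTP client in use), so adjust them to your environment:

- name: HTTP_PROXY
  value: http://10.100.215.25:8080
- name: HTTPS_PROXY
  value: http://10.100.215.25:8080
- name: NO_PROXY
  # Assumption: service/Pod CIDR and cluster-internal DNS suffixes; replace with your own.
  value: localhost,127.0.0.1,10.100.0.0/16,.svc,.cluster.local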
In addition, you can create readiness probes to know when a container is ready to start accepting traffic, and startup probes for slow-starting containers; a minimal sketch follows the quotes below.
A Pod is considered ready when all of its containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.
The kubelet uses startup probes to know when a container application has started. If such a probe is configured, it disables liveness and readiness checks until it succeeds, making sure those probes don't interfere with the application startup. This can be used to adopt liveness checks on slow starting containers, avoiding them getting killed by the kubelet before they are up and running.
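A minimal sketch for the worker container, assuming that a successful TCP connection on the gRPC port 22345 is an acceptable readiness signal (a gRPC health check or an exec probe may reflect the real service state more accurately), and giving the slow-starting container up to 30 * 10s = 300s before the regular checks take over:

# Added under spec.template.spec.containers[] for the worker container.
readinessProbe:
  tcpSocket:
    port: 22345
  periodSeconds: 5
startupProbe:
  tcpSocket:
    port: 22345
  failureThreshold: 30
  periodSeconds: 10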