Failed request between pods

I have a cluster on EKS with several APIs running on it. This is the YAML file used to deploy them:

apiVersion: v1
kind: Service
metadata:
  name: <api-name>
spec:
  type: ClusterIP
  selector:
    app: <api-name>
  ports:
    - protocol: TCP
      port: 80
      targetPort: <container-port>
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: <api-name>
spec:
  replicas: 1
  selector:
    matchLabels:
      app: <api-name>
  template:
    metadata:
      labels:
        app: <api-name>
    spec:
      containers:
      - name: <api-name>
        image: <ecr-image-url>
        ports:
        - containerPort: <container-port>
          name: <api-name>
        env:
          - name: ENVIRONMENT
            value: <environment>
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: <api-name>
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  rules:
  - host: <app-name>.<dns>
    http:
      paths:
      - backend:
          serviceName: <api-name>
          servicePort: 80

Routing works fine (through the network load balancer created by nginx-ingress), but when I try to make a request from one pod to another, I get:

[2020-08-14 11:49:42,214] ERROR in app: Exception on /services [GET]
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.8/site-packages/flask_cors/extension.py", line 161, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1948, in full_dispatch_request
    rv = self.preprocess_request()
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 2242, in preprocess_request
    rv = func()
  File "/app/app/views/__init__.py", line 12, in before_rest_callback
    validate_request(token)
  File "/app/app/utils.py", line 29, in validate_request
    response = httpx.post(url, headers=headers, timeout=60)
  File "/usr/local/lib/python3.8/site-packages/httpx/_api.py", line 269, in post
    return request(
  File "/usr/local/lib/python3.8/site-packages/httpx/_api.py", line 86, in request
    return client.request(
  File "/usr/local/lib/python3.8/site-packages/httpx/_client.py", line 640, in request
    return self.send(
  File "/usr/local/lib/python3.8/site-packages/httpx/_client.py", line 670, in send
    response = self._send_handling_redirects(
  File "/usr/local/lib/python3.8/site-packages/httpx/_client.py", line 699, in _send_handling_redirects
    response = self._send_handling_auth(
  File "/usr/local/lib/python3.8/site-packages/httpx/_client.py", line 736, in _send_handling_auth
    response = self._send_single_request(request, timeout)
  File "/usr/local/lib/python3.8/site-packages/httpx/_client.py", line 759, in _send_single_request
    (
  File "/usr/local/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.8/site-packages/httpx/_exceptions.py", line 359, in map_exceptions
    raise mapped_exc(message, **kwargs) from None  # type: ignore
httpx._exceptions.ReadError: Server disconnected while attempting read

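I cannot establish communication between pods. The request never reaches the other application, which runs in the same cluster on the same node. Nginx Ingress was installed following the official documentation.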

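Any clues about what might be causing this? I have ruled out anything related to the deployment itself (in this case the API or gunicorn). It seems to be related to the cluster and/or nginx-ingress. I searched and found material about an "idle timeout", but that does not apply to a network load balancer.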

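Using http://<app-name> solved the problem (app-name is the Service name), so the request goes straight to the Service through cluster DNS instead of through the external hostname and the load balancer.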
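
For illustration, here is a minimal sketch of how the URL passed to httpx.post could be built from the Service name. The helper `service_url`, the Service name `auth-api`, the namespace `default`, and the path `/validate` are all hypothetical; the `.svc.cluster.local` suffix assumes the default cluster domain. Within the same namespace the bare Service name is enough, and port 80 matches the `port` of the Service above.

```python
from typing import Optional


def service_url(service: str, namespace: Optional[str] = None,
                port: int = 80, path: str = "/") -> str:
    """Build an in-cluster URL for a Kubernetes Service (hypothetical helper).

    Same namespace: the bare Service name resolves through cluster DNS.
    Other namespace: use <service>.<namespace>.svc.cluster.local
    (assumes the default cluster domain, cluster.local).
    """
    if namespace is None:
        host = service
    else:
        host = f"{service}.{namespace}.svc.cluster.local"
    # Port 80 is the HTTP default, so it can be left out of the URL.
    netloc = host if port == 80 else f"{host}:{port}"
    if not path.startswith("/"):
        path = "/" + path
    return f"http://{netloc}{path}"


if __name__ == "__main__":
    # "auth-api" and "/validate" are placeholder names, not from the original post.
    print(service_url("auth-api", path="/validate"))
```

The resulting string would replace the external `<app-name>.<dns>` URL in the existing call, for example `httpx.post(service_url("auth-api", path="/validate"), headers=headers, timeout=60)`.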