带有 NEG 的 GKE Ingress:后端健康检查未通过
GKE Ingress with NEGs: backend healthcheck doesn't pass
我创建的 GKE Ingress 如下:
apiVersion: cloud.google.com/v1beta1 #tried cloud.google.com/v1 as well
kind: BackendConfig
metadata:
name: backend-config
namespace: prod
spec:
healthCheck:
checkIntervalSec: 30
port: 8080
type: HTTP #case-sensitive
requestPath: /healthcheck
connectionDraining:
drainingTimeoutSec: 60
---
apiVersion: v1
kind: Service
metadata:
name: web-engine-service
namespace: prod
annotations:
cloud.google.com/neg: '{"ingress": true}' # Creates a NEG after an Ingress is created.
cloud.google.com/backend-config: '{"ports": {"web-engine-port":"backend-config"}}' #https://cloud.google.com/kubernetes-engine/docs/how-to/ingress-features#associating_backendconfig_with_your_ingress
spec:
selector:
app: web-engine-pod
ports:
- name: web-engine-port
protocol: TCP
port: 8080
targetPort: 5000
---
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "1"
labels:
app: web-engine-deployment
environment: prod
name: web-engine-deployment
namespace: prod
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: web-engine-pod
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
name: web-engine-pod
labels:
app: web-engine-pod
environment: prod
spec:
containers:
- image: my-image:my-tag
imagePullPolicy: Always
name: web-engine-1
resources: {}
ports:
- name: flask-port
containerPort: 5000
protocol: TCP
readinessProbe:
httpGet:
path: /healthcheck
port: 5000
initialDelaySeconds: 30
periodSeconds: 100
restartPolicy: Always
terminationGracePeriodSeconds: 30
---
apiVersion: networking.gke.io/v1beta2
kind: ManagedCertificate
metadata:
name: my-certificate
namespace: prod
spec:
domains:
- api.mydomain.com #https://cloud.google.com/load-balancing/docs/ssl-certificates/google-managed-certs#renewal
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
name: prod-ingress
namespace: prod
annotations:
kubernetes.io/ingress.allow-http: "false"
kubernetes.io/ingress.global-static-ip-name: load-balancer-ip
networking.gke.io/managed-certificates: my-certificate
spec:
rules:
- http:
paths:
- path: /model
backend:
serviceName: web-engine-service
servicePort: 8080
我不知道自己做错了什么,因为我的健康检查不正常。根据我添加到应用程序的外围日志记录,甚至没有任何东西试图攻击那个 pod。
我已经尝试 BackendConfig
8080
和 5000
。
顺便说一下,如果负载均衡器应该配置为相应 Pods 或服务的 targetPorts
,根据文档还不是 100% 清楚。
健康检查已注册到 HTTP 负载平衡器和计算引擎:
后台服务IP好像有问题
对应的后端服务配置:
$ gcloud compute backend-services describe k8s1-85ef2f9a-prod-web-engine-service-8080-b938a707
...
affinityCookieTtlSec: 0
backends:
- balancingMode: RATE
capacityScaler: 1.0
group: https://www.googleapis.com/compute/v1/projects/wnd/zones/europe-west3-a/networkEndpointGroups/k8s1-85ef2f9a-prod-web-engine-service-8080-b938a707
maxRatePerEndpoint: 1.0
connectionDraining:
drainingTimeoutSec: 60
creationTimestamp: '2020-08-01T11:14:06.096-07:00'
description: '{"kubernetes.io/service-name":"prod/web-engine-service","kubernetes.io/service-port":"8080","x-features":["NEG"]}'
enableCDN: false
fingerprint: 5Vkqvg9lcRg=
healthChecks:
- https://www.googleapis.com/compute/v1/projects/wnd/global/healthChecks/k8s1-85ef2f9a-prod-web-engine-service-8080-b938a707
id: '2233674285070159361'
kind: compute#backendService
loadBalancingScheme: EXTERNAL
logConfig:
enable: true
sampleRate: 1.0
name: k8s1-85ef2f9a-prod-web-engine-service-8080-b938a707
port: 80
portName: port0
protocol: HTTP
selfLink: https://www.googleapis.com/compute/v1/projects/wnd/global/backendServices/k8s1-85ef2f9a-prod-web-engine-service-8080-b938a707
sessionAffinity: NONE
timeoutSec: 30
(端口 80 看起来很可疑,但我认为它可能只是作为默认端口保留在那里,并且在配置 NEG 时未被使用)。
想通了。默认情况下,即使是最新的 GKE 集群也是在不支持 IP 别名的情况下创建的。它也被称为VPC-native。一开始我什至懒得去检查,因为:
- NEG 开箱即用,而且它们似乎是默认的,在我拥有的 GKE 版本 (
1.17.8-gke.17
) 上使用时不需要显式注释。默认情况下不启用 IP 别名是没有意义的,因为这基本上意味着集群默认处于 non-functional 状态。
- 我最初没有检查
VPC-Native
支持,因为该功能的这个名称只是误导。我之前有丰富的 AWS 经验,我的错误假设是 VPC-Native
就像 EC2-VPC,而不是遗留的 EC2-Classic。
我创建的 GKE Ingress 如下:
apiVersion: cloud.google.com/v1beta1 #tried cloud.google.com/v1 as well
kind: BackendConfig
metadata:
name: backend-config
namespace: prod
spec:
healthCheck:
checkIntervalSec: 30
port: 8080
type: HTTP #case-sensitive
requestPath: /healthcheck
connectionDraining:
drainingTimeoutSec: 60
---
apiVersion: v1
kind: Service
metadata:
name: web-engine-service
namespace: prod
annotations:
cloud.google.com/neg: '{"ingress": true}' # Creates a NEG after an Ingress is created.
cloud.google.com/backend-config: '{"ports": {"web-engine-port":"backend-config"}}' #https://cloud.google.com/kubernetes-engine/docs/how-to/ingress-features#associating_backendconfig_with_your_ingress
spec:
selector:
app: web-engine-pod
ports:
- name: web-engine-port
protocol: TCP
port: 8080
targetPort: 5000
---
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "1"
labels:
app: web-engine-deployment
environment: prod
name: web-engine-deployment
namespace: prod
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: web-engine-pod
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
name: web-engine-pod
labels:
app: web-engine-pod
environment: prod
spec:
containers:
- image: my-image:my-tag
imagePullPolicy: Always
name: web-engine-1
resources: {}
ports:
- name: flask-port
containerPort: 5000
protocol: TCP
readinessProbe:
httpGet:
path: /healthcheck
port: 5000
initialDelaySeconds: 30
periodSeconds: 100
restartPolicy: Always
terminationGracePeriodSeconds: 30
---
apiVersion: networking.gke.io/v1beta2
kind: ManagedCertificate
metadata:
name: my-certificate
namespace: prod
spec:
domains:
- api.mydomain.com #https://cloud.google.com/load-balancing/docs/ssl-certificates/google-managed-certs#renewal
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
name: prod-ingress
namespace: prod
annotations:
kubernetes.io/ingress.allow-http: "false"
kubernetes.io/ingress.global-static-ip-name: load-balancer-ip
networking.gke.io/managed-certificates: my-certificate
spec:
rules:
- http:
paths:
- path: /model
backend:
serviceName: web-engine-service
servicePort: 8080
我不知道自己做错了什么,因为我的健康检查不正常。根据我添加到应用程序的外围日志记录,甚至没有任何东西试图攻击那个 pod。
我已经尝试 BackendConfig
8080
和 5000
。
顺便说一下,如果负载均衡器应该配置为相应 Pods 或服务的 targetPorts
,根据文档还不是 100% 清楚。
健康检查已注册到 HTTP 负载平衡器和计算引擎:
后台服务IP好像有问题
对应的后端服务配置:
$ gcloud compute backend-services describe k8s1-85ef2f9a-prod-web-engine-service-8080-b938a707
...
affinityCookieTtlSec: 0
backends:
- balancingMode: RATE
capacityScaler: 1.0
group: https://www.googleapis.com/compute/v1/projects/wnd/zones/europe-west3-a/networkEndpointGroups/k8s1-85ef2f9a-prod-web-engine-service-8080-b938a707
maxRatePerEndpoint: 1.0
connectionDraining:
drainingTimeoutSec: 60
creationTimestamp: '2020-08-01T11:14:06.096-07:00'
description: '{"kubernetes.io/service-name":"prod/web-engine-service","kubernetes.io/service-port":"8080","x-features":["NEG"]}'
enableCDN: false
fingerprint: 5Vkqvg9lcRg=
healthChecks:
- https://www.googleapis.com/compute/v1/projects/wnd/global/healthChecks/k8s1-85ef2f9a-prod-web-engine-service-8080-b938a707
id: '2233674285070159361'
kind: compute#backendService
loadBalancingScheme: EXTERNAL
logConfig:
enable: true
sampleRate: 1.0
name: k8s1-85ef2f9a-prod-web-engine-service-8080-b938a707
port: 80
portName: port0
protocol: HTTP
selfLink: https://www.googleapis.com/compute/v1/projects/wnd/global/backendServices/k8s1-85ef2f9a-prod-web-engine-service-8080-b938a707
sessionAffinity: NONE
timeoutSec: 30
(端口 80 看起来很可疑,但我认为它可能只是作为默认端口保留在那里,并且在配置 NEG 时未被使用)。
想通了。默认情况下,即使是最新的 GKE 集群也是在不支持 IP 别名的情况下创建的。它也被称为VPC-native。一开始我什至懒得去检查,因为:
- NEG 开箱即用,而且它们似乎是默认的,在我拥有的 GKE 版本 (
1.17.8-gke.17
) 上使用时不需要显式注释。默认情况下不启用 IP 别名是没有意义的,因为这基本上意味着集群默认处于 non-functional 状态。 - 我最初没有检查
VPC-Native
支持,因为该功能的这个名称只是误导。我之前有丰富的 AWS 经验,我的错误假设是VPC-Native
就像 EC2-VPC,而不是遗留的 EC2-Classic。