Kubernetes: AKS Ingress only communicates with Pods on same node and subnet
I deployed a 3-node AKS Kubernetes cluster (kubenet is the network overlay) with an NGINX Ingress configured to do name-based routing to the pods.
I have a number of identical applications deployed on the cluster under different names.
I can reach some of the applications over HTTP, but not others. On closer inspection, the applications I can reach are all on the same node as the ingress controller and in the same internal 172.* subnet.
All of the applications live in the same namespace as the ingress controller.
The unreachable applications are all on the other two nodes and in different subnets, so this looks like a networking configuration problem.
However, I could not find which configuration is relevant to making the ingress reach all applications regardless of which node and internal subnet they are on; that, I believe, should be the default behavior of Kubernetes.
How would I configure this desired behavior?
Some test results:
kubectl logs https-ingress-controller-6bc79d6c69-7ljkb --namespace ingress-nginx --follow
-------------------------------------------------------------------------------
NGINX Ingress controller
Release: 0.23.0
Build: git-be1329b22
Repository: https://github.com/kubernetes/ingress-nginx
-------------------------------------------------------------------------------
W0611 14:37:06.679648 6 flags.go:213] SSL certificate chain completion is disabled (--enable-ssl-chain-completion=false)
nginx version: nginx/1.15.9
W0611 14:37:06.685012 6 client_config.go:549] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0611 14:37:06.685884 6 main.go:200] Creating API client for https://172.17.0.1:443
I0611 14:37:06.712278 6 main.go:244] Running in Kubernetes cluster version v1.14 (v1.14.0) - git (clean) commit 641856db18352033a0d96dbc99153fa3b27298e5 - platform linux/amd64
I0611 14:37:07.055688 6 nginx.go:261] Starting NGINX Ingress controller
I0611 14:37:07.066491 6 event.go:221] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"ingress-nginx", Name:"tcp-services", UID:"56d2e0c2-8c47-11e9-8911-8272a7251f4e", APIVersion:"v1", ResourceVersion:"5775", FieldPath:""}): type: 'Normal' reason: 'CREATE' ConfigMap ingress-nginx/tcp-services
I0611 14:37:07.067855 6 event.go:221] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"ingress-nginx", Name:"nginx-configuration", UID:"56cdccf4-8c47-11e9-8911-8272a7251f4e", APIVersion:"v1", ResourceVersion:"5774", FieldPath:""}): type: 'Normal' reason: 'CREATE' ConfigMap ingress-nginx/nginx-configuration
I0611 14:37:07.075165 6 event.go:221] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"ingress-nginx", Name:"udp-services", UID:"56d6c9e3-8c47-11e9-8911-8272a7251f4e", APIVersion:"v1", ResourceVersion:"5776", FieldPath:""}): type: 'Normal' reason: 'CREATE' ConfigMap ingress-nginx/udp-services
I0611 14:37:08.159406 6 event.go:221] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"ingress-nginx", Name:"https-ingress", UID:"103260ed-8c4a-11e9-8911-8272a7251f4e", APIVersion:"extensions/v1beta1", ResourceVersion:"17054", FieldPath:""}): type: 'Normal' reason: 'CREATE' Ingress ingress-nginx/https-ingress
I0611 14:37:08.160481 6 backend_ssl.go:68] Adding Secret "ingress-nginx/chachingtls" to the local store
I0611 14:37:08.256541 6 nginx.go:282] Starting NGINX process
I0611 14:37:08.256572 6 leaderelection.go:205] attempting to acquire leader lease ingress-nginx/ingress-controller-leader-nginx...
I0611 14:37:08.257345 6 controller.go:172] Configuration changes detected, backend reload required.
I0611 14:37:08.261914 6 status.go:148] new leader elected: nginx-ingress-controller-6674b5b5dc-nhjcc
I0611 14:37:08.328794 6 event.go:221] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"ingress-nginx", Name:"https-ingress", UID:"103260ed-8c4a-11e9-8911-8272a7251f4e", APIVersion:"extensions/v1beta1", ResourceVersion:"17059", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress ingress-nginx/https-ingress
I0611 14:37:08.391940 6 controller.go:190] Backend successfully reloaded.
I0611 14:37:08.392044 6 controller.go:200] Initial sync, sleeping for 1 second.
[11/Jun/2019:14:37:09 +0000] TCP 200 0 0 0.000
- List of application pods in the same namespace:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
durian 1/1 Running 0 12m 172.18.0.14 aks-agentpool-82039614-0 <none> <none>
https-ingress-controller-6bc79d6c69-mg7lm 1/1 Running 0 15m 172.18.2.11 aks-agentpool-82039614-2 <none> <none>
kiwi 1/1 Running 0 12m 172.18.2.14 aks-agentpool-82039614-2 <none> <none>
mango 1/1 Running 0 13m 172.18.2.12 aks-agentpool-82039614-2 <none> <none>
mangosteen 1/1 Running 0 12m 172.18.2.13 aks-agentpool-82039614-2 <none> <none>
orange 1/1 Running 0 12m 172.18.2.15 aks-agentpool-82039614-2 <none> <none>
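The 172.18.0.x versus 172.18.2.x addresses above correspond to the per-node pod CIDRs that kubenet assigns; a quick way to confirm that mapping (a sketch):

# Show which pod CIDR each node was assigned by kubenet.
kubectl get nodes -o custom-columns=NAME:.metadata.name,PODCIDR:.spec.podCIDR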
- Different internal network and node - times out:
kubectl exec -ti https-ingress-controller-6bc79d6c69-mg7lm /bin/bash -n ingress-nginx
www-data@https-ingress-controller-6bc79d6c69-7ljkb:/etc/nginx$ curl http://172.18.1.10:5678
^C
- Same internal network and node - OK:
www-data@https-ingress-controller-6bc79d6c69-7ljkb:/etc/nginx$ curl http://172.18.2.9:5679
mango
- Same internal network and node - OK:
www-data@https-ingress-controller-6bc79d6c69-7ljkb:/etc/nginx$ curl http://172.18.2.5:8080
<!-- HTML for static distribution bundle build -->
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Swagger UI</title>
<link rel="stylesheet" type="text/css" href="./swagger-ui.css" >
<link rel="icon" type="image/png" href="./favicon-32x32.png" sizes="32x32" />
<link rel="icon" type="image/png" href="./favicon-16x16.png" sizes="16x16" />
<style>
html
- Different internal network/node - times out:
www-data@https-ingress-controller-6bc79d6c69-7ljkb:/etc/nginx$ curl http://172.18.1.9:5678
^C
I have destroyed and redeployed the cluster and the applications several times with exactly the same configuration, and the behavior is identical.
With kubenet networking in AKS, pods can communicate with each other. See the following description:
With kubenet, nodes get an IP address from the Azure virtual network
subnet. Pods receive an IP address from a logically different address
space to the Azure virtual network subnet of the nodes. Network
address translation (NAT) is then configured so that the pods can
reach resources on the Azure virtual network. The source IP address of
the traffic is NAT'd to the node's primary IP address.
Pods communicate with other pods through the nodes using NAT, and only the nodes receive routable IP addresses. You can see routes like this in the portal:
[Screenshot: the route table for the AKS nodes in the Azure portal]
Azure does all of this for you, and it works fine on my side. So if it does not work for you, you can check whether the routes are correct.
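For reference, the same routes can also be listed from the CLI; a minimal sketch, where the node resource group and route table names are assumptions (AKS typically uses MC_* and aks-agentpool-*-routetable naming):

# List the route tables in the AKS-managed node resource group (name is an example).
az network route-table list -g MC_myResourceGroup_myAKSCluster_eastus -o table
# Show the UDRs; with kubenet there should be one route per node,
# e.g. 172.18.0.0/24 -> that node's primary IP.
az network route-table route list \
  -g MC_myResourceGroup_myAKSCluster_eastus \
  --route-table-name aks-agentpool-82039614-routetable -o table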
Here is a screenshot of testing pod communication across the different address spaces:
[Screenshot: pod-to-pod communication test across address spaces]
It seems that with the kubenet network model, when using a pre-existing VNET and subnet (not dedicated to AKS), the route table with the UDRs for the AKS nodes does not get attached to the subnet the nodes are deployed into by default, which means the pods have no way of reaching each other across nodes.
The fact that UDRs need to be configured for kubenet is mentioned in the Microsoft Azure documentation; however, no instructions are provided for the actual setup of the route tables and UDRs for AKS.
These routes must be created after attaching the route table to the AKS subnet, or the routes must be added to the subnet's existing route table (if one exists).
The solution is documented here, and it essentially involves attaching the default route table generated by the AKS installation to the AKS subnet:
https://github.com/Azure/aks-engine/blob/master/docs/tutorials/custom-vnet.md
That is, customizing and running this script:
#!/bin/bash
# Look up the ID of the route table that the AKS deployment generated in
# its resource group (this assumes exactly one route table exists there).
rt=$(az network route-table list -g RESOURCE_GROUP_NAME_KUBE -o json | jq -r '.[].id')
# Attach that route table to the pre-existing subnet the AKS nodes are deployed into.
az network vnet subnet update \
  -g RESOURCE_GROUP_NAME_VNET \
  --route-table "$rt" \
  --ids "/subscriptions/SUBSCRIPTION_ID/resourceGroups/RESOURCE_GROUP_NAME_VNET/providers/Microsoft.Network/VirtualNetworks/KUBERNETES_CUSTOM_VNET/subnets/KUBERNETES_SUBNET"
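To confirm the route table is now attached to the subnet, a quick check using the same placeholder names (a sketch, not part of the documented solution):

# Verify that the subnet now references the AKS-generated route table.
az network vnet subnet show \
  -g RESOURCE_GROUP_NAME_VNET \
  --vnet-name KUBERNETES_CUSTOM_VNET \
  -n KUBERNETES_SUBNET \
  --query routeTable.id -o tsv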
I can now access pods on all of the cluster's nodes through the Ingress.
Note: alternatively, the UDRs can be added manually to any pre-existing route table that you attach to the pre-created AKS subnet before deploying AKS.
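For that manual alternative, each node needs one UDR that sends its pod CIDR to the node's primary IP; a minimal sketch, where the route table name and the node IP are illustrative placeholders:

# One route per node: pod CIDR -> node primary IP (values are examples).
az network route-table route create \
  -g RESOURCE_GROUP_NAME_VNET \
  --route-table-name PRE_EXISTING_ROUTE_TABLE \
  -n aks-agentpool-82039614-0 \
  --address-prefix 172.18.0.0/24 \
  --next-hop-type VirtualAppliance \
  --next-hop-ip-address 10.240.0.4
# Repeat for each remaining node (172.18.1.0/24, 172.18.2.0/24, ...).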