How to use EKS with suitable volumes and resolve the insufficient subnet IP issue on AWS?

I deployed an application to EKS. The deployment always hangs, and when I checked the events I found these problems.

$ kubectl get events
LAST SEEN   TYPE      REASON              OBJECT                         MESSAGE
89s         Warning   FailedScheduling    pod/awx-demo-111111111-122222   running PreBind plugin "VolumeBinding": binding volumes: provisioning failed for PVC "awx-demo-projects-claim"
49m         Warning   FailedDeployModel   ingress/awx-demo-ingress        Failed deploy model due to InvalidSubnet: Not enough IP space available in subnet-031f9c702bc474e8f. ELB requires at least 8 free IP addresses in each subnet.
            status code: 400, request id: 11111111-2222-3333-4444-555555555555
32m         Warning   FailedDeployModel   ingress/awx-demo-ingress        Failed deploy model due to InvalidSubnet: Not enough IP space available in subnet-01322i912fas0123na. ELB requires at least 8 free IP addresses in each subnet.
            status code: 400, request id: 11111111-2222-3333-4444-555555555515
15m         Warning   FailedDeployModel   ingress/awx-demo-ingress        Failed deploy model due to InvalidSubnet: Not enough IP space available in subnet-031f9c702bc474e8f. ELB requires at least 8 free IP addresses in each subnet.
            status code: 400, request id: 11111111-2222-3333-4444-555555555525
89s         Normal    WaitForPodScheduled   persistentvolumeclaim/awx-demo-projects-claim   waiting for pod awx-demo-111111111-122222 to be scheduled
21m         Warning   ProvisioningFailed    persistentvolumeclaim/awx-demo-projects-claim   Failed to provision volume with StorageClass "gp2": invalid AccessModes [ReadWriteMany]: only AccessModes [ReadWriteOnce] are supported

It looks like there is a volume problem and a subnet problem. I created the EKS cluster and node group with this configuration:

resource "aws_eks_cluster" "this" {
  encryption_config {
    resources = ["secrets"]
    provider {
      key_arn = aws_kms_key.this.arn
    }
  }

  enabled_cluster_log_types = ["api", "authenticator", "audit", "scheduler", "controllerManager"]
  name                      = local.cluster_name
  version                   = "1.20"
  role_arn                  = aws_iam_role.eks_cluster.arn

  vpc_config {
    subnet_ids = [
      data.aws_ssm_parameter.private_subnet_0_id.value,
      data.aws_ssm_parameter.private_subnet_1_id.value,
    ]

    security_group_ids     = [aws_security_group.this.id]
    endpoint_public_access = true
  }

  depends_on = [
    aws_iam_role_policy_attachment.eks_cluster_policy,
    aws_iam_role_policy_attachment.eks_vpc_resource_controller,
    aws_iam_role_policy_attachment.eks_service_policy,
  ]

  tags = merge(
    local.tags,
  )
}

resource "aws_eks_node_group" "this" {
  cluster_name    = local.cluster_name
  node_group_name = local.node_group_name
  node_role_arn   = aws_iam_role.eks_nodes.arn
  instance_types  = ["m5.2xlarge"]

  subnet_ids = [
    data.aws_ssm_parameter.private_subnet_0_id.value,
    data.aws_ssm_parameter.private_subnet_1_id.value,
  ]

  scaling_config {
    desired_size = 2
    max_size     = 2
    min_size     = 2
  }

  lifecycle {
    ignore_changes = [scaling_config[0].desired_size]
  }

  depends_on = [
    aws_iam_role_policy_attachment.eks_worker_node_policy,
    aws_iam_role_policy_attachment.eks_cni_policy,
    aws_iam_role_policy_attachment.ec2_container_register_readonly,
  ]

  tags = merge(
    local.tags,
  )
}

I did not define a volume type for EBS, so it is probably using the default settings. How can I solve this?

As for the shortage of IP addresses in the VPC: if I create new subnets for EKS to use, do I need to delete the EKS cluster or the node group?

By the way, the deployment I used is https://raw.githubusercontent.com/ansible/awx-operator/0.13.0/deploy/awx-operator.yaml, installed following https://github.com/ansible/awx-operator#basic-install.

@miantian, following up on our discussion in the comments:

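You cannot simply increase the subnet size. A subnet's CIDR block cannot be changed in place, so if you change its size, the subnet will be recreated. But because EKS still exists in that subnet, the recreation will fail. So I would say: start over. Delete everything and create it again.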
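When starting over, larger subnets could be defined along these lines. This is only a sketch: `var.vpc_id`, the availability zones, and the CIDR blocks are hypothetical placeholders that do not come from the question, so substitute ranges that are actually free in your VPC.

```hcl
# Hypothetical input: the ID of the existing VPC (not part of the original config).
variable "vpc_id" {
  type = string
}

locals {
  # Hypothetical AZ => CIDR mapping. A /22 holds 1024 addresses;
  # AWS reserves 5 addresses in every subnet, leaving 1019 usable.
  eks_subnets = {
    "us-east-1a" = "10.0.16.0/22"
    "us-east-1b" = "10.0.20.0/22"
  }
}

# One private subnet per availability zone for the cluster and node group.
resource "aws_subnet" "eks_private" {
  for_each = local.eks_subnets

  vpc_id            = var.vpc_id
  availability_zone = each.key
  cidr_block        = each.value

  tags = {
    Name = "eks-private-${each.key}"
  }
}
```

The `subnet_ids` of `aws_eks_cluster.this` and `aws_eks_node_group.this` would then be set to `[for s in aws_subnet.eks_private : s.id]`. Changing `subnet_ids` on an existing node group makes Terraform plan a replacement of that node group, so review the output of `terraform plan` before applying.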

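Regarding the volume issue: by default, EKS only supports the ReadWriteOnce access mode. This is because of an AWS technical limitation: the default "gp2" StorageClass is backed by EBS, and an EBS volume can only be attached to one EC2 instance. If you want to use the ReadWriteMany access mode, you need to use EFS.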

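If you want to use EFS, look for an NFS/EFS client provisioner for EKS. Setting up an EFS provisioner in EKS takes a few steps. Once that is done, you can start using the ReadWriteMany access mode.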
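One possible shape for those steps, sketched in Terraform, is shown below. It uses the AWS EFS CSI driver (provisioner name `efs.csi.aws.com`) rather than the older client provisioner mentioned above, and it assumes that the driver is already installed in the cluster and that the `kubernetes` Terraform provider is configured. The variables `subnet_ids` and `efs_security_group_id` and the names `awx-projects` and `efs-sc` are hypothetical.

```hcl
# Hypothetical inputs (not part of the original configuration).
variable "subnet_ids" {
  type = list(string) # the subnets the worker nodes run in
}

variable "efs_security_group_id" {
  type = string # must allow NFS (TCP 2049) from the worker nodes
}

# The shared file system that will back ReadWriteMany volumes.
resource "aws_efs_file_system" "awx" {
  encrypted = true

  tags = {
    Name = "awx-projects"
  }
}

# One mount target per subnet so that nodes in every AZ can reach EFS.
# Each mount target takes one IP address in its subnet.
resource "aws_efs_mount_target" "awx" {
  count = length(var.subnet_ids)

  file_system_id  = aws_efs_file_system.awx.id
  subnet_id       = var.subnet_ids[count.index]
  security_groups = [var.efs_security_group_id]
}

# StorageClass that asks the EFS CSI driver to create an access point
# per PersistentVolumeClaim (dynamic provisioning).
resource "kubernetes_storage_class" "efs" {
  metadata {
    name = "efs-sc"
  }

  storage_provisioner = "efs.csi.aws.com"

  parameters = {
    provisioningMode = "efs-ap"
    fileSystemId     = aws_efs_file_system.awx.id
    directoryPerms   = "700"
  }
}
```

The claim that currently fails ("awx-demo-projects-claim") would then need to request the `efs-sc` StorageClass instead of "gp2" for its ReadWriteMany request to be satisfiable.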