Terraform recreates API permissions for Lambda on each apply, causing downtime (lambda module, serverless framework, VPC)

I have a Lambda created with the terraform aws lambda module. It is published as a versioned Lambda because I use provisioned concurrency. It also resides in a VPC.

The configuration looks like this:

module "my-lambda" {
  source  = "terraform-aws-modules/lambda/aws"
  version = "~> v1.45.0"

  function_name         = local.lambda_name
  description           = local.lambda_name
  handler               = "handler.handler"
  runtime               = "python3.8"
  hash_extra            = local.lambda_name
  attach_tracing_policy = true
  tracing_mode          = "Active"
  publish               = true
  vpc_security_group_ids = [
    // required VPC security groups
  ]
  vpc_subnet_ids = var.private_subnet_ids
  source_path = [
    // ... abridged
  ]

  build_in_docker                           = true
  provisioned_concurrent_executions         = var.provisioned_concurrency_lambdas
  create_current_version_allowed_triggers   = true
  create_unqualified_alias_allowed_triggers = false

  allowed_triggers = {
    APIGateway = {
      service    = "apigateway"
      source_arn = "${module.my_api_gateway.this_apigatewayv2_api_execution_arn}/*"
    }
  }

  attach_policies = true
  policies = [
    // policies needed for a VPC lambda
  ]
}

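In terraform plan I see the following replacements even when I change nothing and run terraform plan repeatedly. They cause the API Gateway permission to be recreated, which in practice means a short downtime: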

  # module.my_entire_api.module.my-lambda.aws_lambda_permission.current_version_triggers["APIGateway"] must be replaced
-/+ resource "aws_lambda_permission" "current_version_triggers" {
      ~ id            = "APIGateway" -> (known after apply)
      ~ qualifier     = "1" -> (known after apply) # forces replacement
        # (5 unchanged attributes hidden)
    }

  #  module.my_entire_api.module.my-lambda.aws_lambda_provisioned_concurrency_config.current_version[0] must be replaced
-/+ resource "aws_lambda_provisioned_concurrency_config" "current_version" {
      ~ id                                = "env-my-lambda:1" -> (known after apply)
      ~ qualifier                         = "1" -> (known after apply) # forces replacement
        # (2 unchanged attributes hidden)
    }

There are some other Lambdas that do not run in the VPC. For those I currently do not see this effect, although I am not entirely sure it never happens.

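To be clear, I do not care about the concurrency config, because recreating it does not cause downtime. But I would like to configure the module so that aws_lambda_permission is not recreated. How can I do that?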

Related issue in terraform-provider-aws: terraform-provider-aws 3.13.0 and later including 3.25.0 cause lambdas in a VPC to be updated on every apply #17385


From the module documentation, "How to deploy and manage Lambda Functions?":

publish               = true

Typically, Lambda Function resource updates when source code changes. If publish = true is specified a new Lambda Function version will also be created.

publish flag

variable "publish" {
  description = "Whether to publish creation/change as new Lambda Function Version."
  type        = bool
  default     = false
}

aws_lambda_permission

resource "aws_lambda_permission" "current_version_triggers" {
  for_each = var.create && var.create_function && !var.create_layer && var.create_current_version_allowed_triggers ? var.allowed_triggers : {}

  function_name = aws_lambda_function.this[0].function_name
  qualifier     = aws_lambda_function.this[0].version
  # ... remaining arguments omitted
}

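So every deployment publishes a new version, and the permission resource references that version as its qualifier. Because the qualifier changes, the permission is replaced each time.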

Depending on where you are deriving your context for deploy and publish, normally deploy means redeploying your lambda with new code whereas publish is increasing your lambda version (not redeploying code).

The problem I faced came down to a few things.

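  • When you use provisioned concurrency, you have to "publish" your lambdas so that they have a proper version qualifier (something like "1" rather than $LATEST). The Lambda permission that allows the gateway to invoke the Lambda is therefore bound to a specific Lambda version. When you create another version, these permissions are destroyed and recreated for the new Lambda version. The create_before_destroy lifecycle flag might help. I have not seen these recreated for non-VPC lambdas when nothing changed; when the Lambda does change, there are a few minutes between the deletion and the recreation of the provisioned concurrency and of the API Gateway permissions on the Lambda.
  • Additionally, VPC Lambdas get their concurrency and permissions recreated even if the Lambda has not changed, due to a Terraform bug: https://github.com/hashicorp/terraform-provider-aws/issues/17385.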
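The create_before_destroy idea from the first bullet could be sketched roughly as follows. This is untested and rests on assumptions: the permission is managed outside the lambda module (with allowed_triggers removed there), and the three variables are hypothetical names for values you would wire in from your own module outputs. statement_id_prefix is used so that the replacement permission does not collide with the old statement ID while both briefly exist.

```hcl
# Hypothetical inputs; wire these to your own module outputs.
variable "lambda_function_name" {
  type = string
}

variable "lambda_version" {
  type = string
}

variable "api_execution_arn" {
  type = string
}

# Permission bound to one published version of the Lambda.
resource "aws_lambda_permission" "apigw_current_version" {
  statement_id_prefix = "AllowAPIGatewayInvoke-"
  action              = "lambda:InvokeFunction"
  function_name       = var.lambda_function_name
  qualifier           = var.lambda_version
  principal           = "apigateway.amazonaws.com"
  source_arn          = "${var.api_execution_arn}/*"

  lifecycle {
    # Create the permission for the new version before removing the old one.
    create_before_destroy = true
  }
}
```

Whether this fully removes the gap depends on when API Gateway starts calling the new version, so treat it as something to verify under load rather than a guaranteed fix.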

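The solution seems to be to not deal with Lambda permissions at all, and instead give API Gateway "credentials" (that is, a role with the Lambda InvokeFunction permission) that allow it to invoke the Lambda. This way, when the API Gateway "integration" (= Lambda) is invoked, the gateway assumes the role, and no permissions are needed on the Lambda side. My tests show that with this setup the Lambda is updated in the right order: no unnecessary resources are recreated for VPC lambdas, and when a Lambda is updated, a new version is deployed first and API Gateway then switches to it, so there is no downtime. Production tests under a certain load also confirm that we see no interruptions in practice.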

Here is a snippet of the configuration that allows API Gateway to invoke the Lambda. It follows the recipe found at https://medium.com/@jun711.g/aws-api-gateway-invoke-lambda-function-permission-6c6834f14b61.

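# Account ID used in the Resource ARN of the invoke policy below
# (declare this data source only once per module).
data "aws_caller_identity" "current" {}

resource "aws_iam_role" "api_gateway_credentials_call_lambda" {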
  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Effect = "Allow",
        Principal = {
          Service = "lambda.amazonaws.com"
        },
        Action = "sts:AssumeRole"
      },
      {
        Effect = "Allow",
        Principal = {
          Service = "apigateway.amazonaws.com"
        },
        Action = "sts:AssumeRole"
      }
    ]
  })
  inline_policy {
    name = "permission-apigw-lambda-invokefunction"
    policy = jsonencode({
      Version = "2012-10-17",
      Statement = [
        {
          Effect   = "Allow",
          Action   = "lambda:InvokeFunction",
          Resource = "arn:aws:lambda:*:${data.aws_caller_identity.current.account_id}:function:*"
        }
      ]
    })
  }
}

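Note that the last Resource = directive allows this role to invoke every Lambda in the account. You may want to restrict these permissions to a subset of lambdas, both for better security and to reduce the room for human error.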
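One way to narrow the permission to a single function might look like the sketch below. It assumes you drop the inline_policy block from the role above and attach this policy instead; var.invokable_lambda_arn is a hypothetical variable holding the unqualified ARN of the function you want to expose. The second resource entry covers the function's published versions and aliases.

```hcl
# Hypothetical input: unqualified ARN of the one Lambda the gateway may call.
variable "invokable_lambda_arn" {
  type = string
}

data "aws_iam_policy_document" "invoke_one_lambda" {
  statement {
    effect  = "Allow"
    actions = ["lambda:InvokeFunction"]
    resources = [
      var.invokable_lambda_arn,        # the function itself
      "${var.invokable_lambda_arn}:*", # its versions and aliases
    ]
  }
}

# Attach the narrowed policy to the gateway credentials role.
resource "aws_iam_role_policy" "invoke_one_lambda" {
  name   = "permission-apigw-lambda-invokefunction"
  role   = aws_iam_role.api_gateway_credentials_call_lambda.id
  policy = data.aws_iam_policy_document.invoke_one_lambda.json
}
```

Do not keep both this resource and the inline_policy block on the same role, since the inline_policy block manages the role's inline policies exclusively and the two would fight over them.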

With this role in place, I configure the API Gateway with the popular apigateway-v2 module from the serverless.tf framework:

module "api_gateway" {
  source  = "terraform-aws-modules/apigateway-v2/aws"
  version = "~> 0.14.0"

  # various parameters ...

  # Routes and integrations
  integrations = {

    "GET /myLambda" = {
      integration_type        = "AWS_PROXY"
      integration_http_method = "POST"
      payload_format_version  = "2.0"
      # Placeholder: substitute the qualified (versioned) ARN of your Lambda
      lambda_arn              = my_lambda_qualified_arn
      # This line enables the permissions:
      credentials_arn         = aws_iam_role.api_gateway_credentials_call_lambda.arn
    }