What is the difference between appAutoScaling property vs autoscaling property in terraform?

Question

I am trying to use terraform to scale an RDS cluster for Aurora.

I am setting up an RDS instance with 3 servers: 1 writer and 2 read replicas. These are my requirements:

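When any server fails, add a new server so that there are always at least 3 servers.
When the CPU usage of any host exceeds 50%, add a new server to the cluster. The maximum number of servers is 4.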

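Is it possible to create a policy so that a new server is created for this RDS instance whenever any of the 3 servers fails? If so, how do I monitor for server failures?
Do I need to use appAutoScaling, autoScaling, or both? This is the link that matches my use case: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/appautoscaling_policy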

Answer 1

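I developed an example terraform configuration file for your question. It is ready to use, but it should be treated only as an example for learning and testing purposes. It was tested in the us-east-1 region using the default VPC, with terraform 0.13 and AWS provider 3.6.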
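To reproduce that environment, the versions can be pinned with a `terraform` block. This is a sketch: the `~> 3.6` constraint is my assumption of how "AWS provider 3.6" might be expressed, and it accepts any 3.x release from 3.6 onwards.

```hcl
# Pin Terraform and the AWS provider to the versions the example was tested
# with (the exact constraints are assumptions).
terraform {
  # The required_providers "source" syntax needs Terraform 0.13 or later.
  required_version = ">= 0.13"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.6" # >= 3.6 and < 4.0
    }
  }
}
```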

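The key resources created by the example terraform configuration file are:

A public Aurora MySQL cluster with 1 writer and 2 replicas.

An application auto-scaling policy for the Aurora replicas based on CPU utilization (50%), with a minimum and maximum capacity of 2 and 4 respectively.

An SNS topic and an SQS queue subscribed to that topic. The queue makes it easy to view the SNS messages without having to set up email or lambda.

Two RDS event subscriptions: one for cluster-level events (e.g. failover) and a second for instance-level events. In both cases the events are published to the SNS topic and can then be viewed in SQS.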

Below I go into more detail on the questions you asked and on the example configuration file.

Aurora MySQL cluster with 1 writer and 2 replicas

The cluster will be provisioned with 1 writer and 2 replicas.

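Auto-scaling policy for the replicas

Application auto-scaling based on TargetTrackingScaling for RDSReaderAverageCPUUtilization. The scaling policy is based on the overall (average) CPU utilization of the replicas (50%), not on the utilization of individual replicas.

This is good practice because Aurora replicas are automatically load balanced at the connection level. This means new connections will be distributed roughly evenly across the available replicas, provided that you use the reader endpoint.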
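With the scaling target in the example configuration file, the minimum is 1 writer + 2 replicas = 3 instances, which matches the first requirement. Note that `min_capacity` and `max_capacity` count replicas only, so `max_capacity = 4` allows up to 1 + 4 = 5 instances. If the limit of 4 servers in the question is meant to include the writer, the target could be adjusted as in this sketch, which would replace (not sit alongside) the `aws_appautoscaling_target.replicas` resource in the example configuration file:

```hcl
# Variant of the replica scaling target, assuming the "maximum of 4 servers"
# includes the writer. Capacities count Aurora replicas only.
resource "aws_appautoscaling_target" "replicas" {
  service_namespace  = "rds"
  scalable_dimension = "rds:cluster:ReadReplicaCount"
  resource_id        = "cluster:${aws_rds_cluster.default.id}"
  min_capacity       = 2 # 1 writer + 2 replicas = 3 instances minimum
  max_capacity       = 3 # 1 writer + 3 replicas = 4 instances maximum
}
```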

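In addition, any alarm or scaling policy that you might apply to an individual replica becomes invalid once that replica is replaced by a scale-in/out activity or by a failure. This is because such a policy is bound to a specific DB instance: once the instance is gone, the alarm no longer works.

The alarms associated with the policy that AWS creates on your behalf can be viewed in the CloudWatch alarms console.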
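If you also want an alarm of your own on reader CPU (for notification, not scaling), one way to keep it independent of individual replicas is to use the cluster-level `CPUUtilization` metric with the `DBClusterIdentifier` and `Role = READER` dimensions, which covers whichever readers the cluster currently has. A sketch, assuming it is added to the example configuration file; the alarm name, period, and evaluation periods are illustrative choices:

```hcl
# Alarm on the average CPU of all readers in the cluster rather than on a
# single DB instance, so it keeps working when replicas are added or replaced.
resource "aws_cloudwatch_metric_alarm" "reader_cpu" {
  alarm_name          = "aurora-reader-cpu-high" # hypothetical name
  namespace           = "AWS/RDS"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  period              = 300
  evaluation_periods  = 2
  comparison_operator = "GreaterThanThreshold"
  threshold           = 50

  dimensions = {
    DBClusterIdentifier = aws_rds_cluster.default.id
    Role                = "READER"
  }

  # Notify the same SNS topic that receives the RDS events.
  alarm_actions = [aws_sns_topic.default.arn]
}
```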

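Aurora DB instance failure

If any DB instance fails, Aurora will automatically take action to remedy the problem, which includes rebooting the DB instance, promoting a read replica to become the new primary, restarting MySQL, or completely replacing the failed instance.

You can simulate some of these events yourself, as described in Testing Amazon Aurora Using Fault Injection Queries.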

Testing failover to a read replica

aws rds failover-db-cluster --db-cluster-identifier aurora-cluster-demo

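Testing a crash of the primary instance

This will cause the instance to restart automatically.

mysql -h <endpoint> -u root -p -e "ALTER SYSTEM CRASH INSTANCE;"

Testing a crash of a reader instance

This will cause MySQL to restart.

mysql -h <endpoint> -u root -p -e "ALTER SYSTEM SIMULATE 100 PERCENT READ REPLICA FAILURE TO ALL FOR INTERVAL 10 MINUTE;"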

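Testing replacement of a reader

You can simulate a complete failure by manually deleting a reader instance in the console. Once it has been deleted, Aurora will automatically provision a replacement.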

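Monitoring cluster failures

You can use Amazon RDS Event Notification to automatically detect and respond to a variety of events related to your Aurora cluster and its instances. Failures are among the events captured by the RDS event notification mechanism.

You can subscribe to the event categories you are interested in and receive SNS notifications. Once an event has been detected and published to SNS, you can do whatever you want with it: for example, invoke a lambda function that analyzes the event and the current state of the Aurora cluster, takes corrective action, or sends an email notification.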
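Wiring the SNS topic to a lambda function needs a subscription plus a permission that lets SNS invoke the function. A minimal sketch, assuming it is added to the example configuration file and that a function named `rds-event-handler` (a hypothetical name) already exists in the same account and region:

```hcl
# Look up an existing lambda function (hypothetical name).
data "aws_lambda_function" "rds_event_handler" {
  function_name = "rds-event-handler"
}

# Deliver every message published to the RDS events topic to the function.
resource "aws_sns_topic_subscription" "rds_events_lambda" {
  topic_arn = aws_sns_topic.default.arn
  protocol  = "lambda"
  endpoint  = data.aws_lambda_function.rds_event_handler.arn
}

# Allow SNS to invoke the function, but only for messages from this topic.
resource "aws_lambda_permission" "allow_sns" {
  statement_id  = "AllowExecutionFromRdsEventsTopic"
  action        = "lambda:InvokeFunction"
  function_name = data.aws_lambda_function.rds_event_handler.function_name
  principal     = "sns.amazonaws.com"
  source_arn    = aws_sns_topic.default.arn
}
```

The existing SQS subscription can stay in place alongside this one, since an SNS topic can have several subscribers.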

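For example, when you manually force a failover as shown earlier, you receive a message with the following information (only a fragment is shown):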

\"Event Message\":\"Started cross AZ failover to DB instance: aurora-cluster-demo-1\"

and later:

\"Event Message\":\"Completed failover to DB instance: aurora-cluster-demo-1\"}"

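The example terraform configuration file subscribes to several categories, so you will have to fine-tune them to match your requirements exactly. Alternatively, you can subscribe to all of them and have a lambda function analyze each event as it occurs and decide whether it should simply be archived or whether the function should run some automated procedure.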
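For example, to receive only failover and failure events for this one cluster, the cluster subscription could be narrowed with `source_ids`. This sketch would replace the `cluster` subscription in the example configuration file; which categories to keep is an assumption about your requirements:

```hcl
# Narrowed variant of the cluster event subscription:
# only this cluster, only failover and failure events.
resource "aws_db_event_subscription" "cluster" {
  name      = "cluster-events"
  sns_topic = aws_sns_topic.default.arn

  source_type = "db-cluster"
  source_ids  = [aws_rds_cluster.default.id]

  event_categories = [
    "failover",
    "failure",
  ]
}
```

Leaving `source_ids` unset on the instance-level subscription keeps it covering replicas that auto-scaling adds later, which a fixed list of instance identifiers would miss.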

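AppAutoScaling or AutoScaling

Aurora read replicas are scaled using application-auto-scaling, not AutoScaling (I assume here that you mean EC2 AutoScaling). EC2 AutoScaling is used only for regular EC2 instances and does not apply to RDS. In terraform, the former corresponds to the aws_appautoscaling_target and aws_appautoscaling_policy resources, and the latter to resources such as aws_autoscaling_group and aws_autoscaling_policy.

Example terraform configuration file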

provider "aws" {
  # YOUR DATA
  region  = "us-east-1"
}

data "aws_vpc" "default" {
  default = true
}

resource "aws_rds_cluster" "default" {
  cluster_identifier      = "aurora-cluster-demo"
  engine                  = "aurora-mysql"
  engine_version          = "5.7.mysql_aurora.2.03.2"
  database_name           = "myauroradb"
  master_username         = "root"
  master_password         = "bar4343sfdf233"
  vpc_security_group_ids  = [aws_security_group.allow_mysql.id]
  backup_retention_period = 1
  skip_final_snapshot     = true
}

resource "aws_rds_cluster_instance" "cluster_instances" {
  count               = 3
  identifier          = "aurora-cluster-demo-${count.index}"
  cluster_identifier  = aws_rds_cluster.default.id
  instance_class      = "db.t2.small"
  publicly_accessible = true
  engine              = aws_rds_cluster.default.engine
  engine_version      = aws_rds_cluster.default.engine_version
}

resource "aws_security_group" "allow_mysql" {
  name        = "allow_mysql"
  description = "Allow Mysql inbound Internet traffic"
  vpc_id      = data.aws_vpc.default.id

  ingress {
    description = "MySQL port"
    from_port   = 3306
    to_port     = 3306
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

}

resource "aws_appautoscaling_target" "replicas" {
  service_namespace  = "rds"
  scalable_dimension = "rds:cluster:ReadReplicaCount"
  resource_id        = "cluster:${aws_rds_cluster.default.id}"
  min_capacity       = 2
  max_capacity       = 4
}

resource "aws_appautoscaling_policy" "replicas" {
  name               = "cpu-auto-scaling"  
  service_namespace  = aws_appautoscaling_target.replicas.service_namespace
  scalable_dimension = aws_appautoscaling_target.replicas.scalable_dimension
  resource_id        = aws_appautoscaling_target.replicas.resource_id
  policy_type        = "TargetTrackingScaling"

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "RDSReaderAverageCPUUtilization"
    }

    target_value       = 50
    scale_in_cooldown  = 300
    scale_out_cooldown = 300
  }
}

resource "aws_sns_topic" "default" {
  name = "rds-events"
}

resource "aws_sqs_queue" "default" {
  name = "aurora-notifications"
}

resource "aws_sns_topic_subscription" "user_updates_sqs_target" {
  topic_arn = aws_sns_topic.default.arn
  protocol  = "sqs"
  endpoint  = aws_sqs_queue.default.arn
}

resource "aws_sqs_queue_policy" "test" {
  queue_url = aws_sqs_queue.default.id
  policy = <<POLICY
{
  "Version": "2012-10-17",
  "Id": "sqspolicy",
  "Statement": [
    {
      "Sid": "First",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "sqs:SendMessage",
      "Resource": "${aws_sqs_queue.default.arn}",
      "Condition": {
        "ArnEquals": {
          "aws:SourceArn": "${aws_sns_topic.default.arn}"
        }
      }
    }
  ]
}
POLICY
}

resource "aws_db_event_subscription" "cluster" {

  name      = "cluster-events"
  sns_topic = aws_sns_topic.default.arn

  source_type = "db-cluster"

  event_categories = [
    "failover",  "failure", "deletion", "notification"
  ]
}


resource "aws_db_event_subscription" "instances" {
  name      = "instances-events"
  sns_topic = aws_sns_topic.default.arn

  source_type = "db-instance"

  event_categories = [
    "availability",
    "deletion",
    "failover",
    "failure",
    "low storage",
    "maintenance",
    "notification",
    "read replica",
    "recovery",
    "restoration",
  ]
}

output "endpoint" {
  value = aws_rds_cluster.default.endpoint
}

output "reader-endpoint" {
  value = aws_rds_cluster.default.reader_endpoint
}
