Terraform: Error while creating CloudWatch alarm for multiple instances
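I am creating multiple EC2 instances in two regions, and I want to attach CloudWatch alarms to them for status checks and CPU utilization.
The directory structure and the CloudWatch code are shown below; main.tf contains the module calls.
I have 2 problems, including the logic for creating the CloudWatch alarms.
Directory structure: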
├── main.tf
├── modules
│   ├── alb
│   │   ├── aws_alb.tf
│   │   ├── aws_instance.tf
│   │   ├── bootstrap.sh
│   │   ├── cloudwatch.tf
│   │   ├── main.tf
│   │   ├── output.tf
│   │   ├── security-group.tf
│   │   ├── sns.tf
│   │   └── variables.tf
│   └── route53
│       ├── main.tf
│       └── variables.tf
└── variables.tf
main.tf
module "north-virginia" {
  source = "./modules/alb"
  region = "us-east-1"
  az     = ["us-east-1a", "us-east-1b", "us-east-1c"]
}

module "oregon" {
  source = "./modules/alb"
  region = "us-west-2"
  az     = ["us-west-2a", "us-west-2b", "us-west-2c"]
}
modules/alb/aws_instance.tf
resource "aws_instance" "web" {
  ami               = "${data.aws_ami.amzn2.id}"
  instance_type     = "${var.instance_type}"
  count             = 3
  availability_zone = "${element(var.az, count.index)}"

  tags {
    Name = "${count.index}"
  }
}
modules/alb/cloudwatch.tf
resource "aws_cloudwatch_metric_alarm" "cpu_utilization" {
  count               = "${length(local.instance_id_var)}"
  alarm_name          = "${element(split(",", join(",", aws_instance.web.*.id)), count.index)}"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = "2"
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = "120"
  statistic           = "Average"
  threshold           = "60"
  alarm_description   = "This metric monitors ec2 cpu utilization"

  dimensions {
    InstanceId = "${element(aws_instance.web.*.id, count.index)}"
  }
}

resource "aws_cloudwatch_metric_alarm" "status_check" {
  count               = 3
  alarm_name          = "${element(split(",", join(",", aws_instance.web.*.id)), count.index)}"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = "2"
  metric_name         = "StatusCheckFailed"
  namespace           = "AWS/EC2"
  period              = "120"
  statistic           = "Average"
  threshold           = "1"
  alarm_description   = "This metric monitors ec2 status check."

  dimensions {
    InstanceId = "${element(aws_instance.web.*.id, count.index)}"
  }
}
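Expected behavior:
I want each instance in each region to have the 2 alarms above attached.
Actual behavior:
It creates 3 alarms in each region and attaches them to the instances:
- North Virginia: one for CPU and two for StatusCheck.
- Oregon: two for StatusCheck and one for CPU utilization.
Every time I apply, it creates the alarms again and the split flips the other way around.
I also get the error below. It goes away if I wait 2 minutes while the alarms are updating, or if I use terraform apply -parallelism=1.
Error: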
4 error(s) occurred:
* module.north-virginia.aws_cloudwatch_metric_alarm.status_check[0]: 1 error(s) occurred:
* aws_cloudwatch_metric_alarm.status_check.0: Creating metric alarm failed: ValidationError: A separate request to update this alarm is in progress.
status code: 400, request id: ea6c4502-dede-11e8-9262-c55251d6673a
* module.north-virginia.aws_cloudwatch_metric_alarm.cpu_utilization[1]: 1 error(s) occurred:
* aws_cloudwatch_metric_alarm.cpu_utilization.1: Creating metric alarm failed: ValidationError: A separate request to update this alarm is in progress.
status code: 400, request id: ea6c6c09-dede-11e8-a13f-bbb86ff53045
* module.oregon.aws_cloudwatch_metric_alarm.status_check[1]: 1 error(s) occurred:
* aws_cloudwatch_metric_alarm.status_check.1: Creating metric alarm failed: ValidationError: A separate request to update this alarm is in progress.
status code: 400, request id: ed198a56-dede-11e8-b95a-9d366b9f2e85
* module.oregon.aws_cloudwatch_metric_alarm.cpu_utilization[3]: 1 error(s) occurred:
* aws_cloudwatch_metric_alarm.cpu_utilization.3: Creating metric alarm failed: ValidationError: A separate request to update this alarm is in progress.
status code: 400, request id: ed193c4d-dede-11e8-9c63-21cde1551122
Any ideas, or any convention I am missing here, would be appreciated.
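First, I would simplify your test by removing or commenting out module "oregon". Once you get the Virginia one working correctly, add it back.
Second, I would switch the code in your module to compute count as the length of var.az. Do this for all 3 resources you have: the aws_instance and the 2 CloudWatch alarms. For example: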
count = "${length(var.az)}"
That way you can change the number of AZs in the code that calls the module, and the number of instances created changes with it.
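A minimal sketch of that change for the instance resource, kept in the same pre-0.12 interpolation syntax the question uses. Only the count line differs from the original aws_instance.tf; the same expression would go on both alarm resources.

```hcl
# modules/alb/aws_instance.tf
resource "aws_instance" "web" {
  # One instance per availability zone passed in by the calling module.
  count             = "${length(var.az)}"
  ami               = "${data.aws_ami.amzn2.id}"
  instance_type     = "${var.instance_type}"
  availability_zone = "${element(var.az, count.index)}"

  tags {
    Name = "${count.index}"
  }
}
```

With the az lists from main.tf this still yields 3 instances per region, but the number now follows whatever list the caller passes in.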
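Third, the names you are setting for the two CloudWatch alarms look to be the same. CloudWatch alarm names must be unique within a region for an account, so both resources are writing to the same alarms, which would explain both the alarms flipping between applies and the "separate request to update this alarm is in progress" error. Try to differentiate them. For example: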
alarm_name = "${element(split(",", join(",", aws_instance.web.*.id)), count.index)}-cpu-util"
alarm_name = "${element(split(",", join(",", aws_instance.web.*.id)), count.index)}-status-check"
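Putting the second and third points together, the two alarm resources might look like the sketch below. It assumes the same Terraform version and variables as the question, and the -cpu-util and -status-check suffixes are just the illustrative names from above. The split(",", join(",", ...)) round trip is dropped on the assumption that element() on the splat list is enough, as the dimensions block in the question already does.

```hcl
# modules/alb/cloudwatch.tf
resource "aws_cloudwatch_metric_alarm" "cpu_utilization" {
  # Same count as aws_instance.web, so index N always maps to instance N.
  count               = "${length(var.az)}"
  alarm_name          = "${element(aws_instance.web.*.id, count.index)}-cpu-util"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = "2"
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = "120"
  statistic           = "Average"
  threshold           = "60"
  alarm_description   = "This metric monitors ec2 cpu utilization"

  dimensions {
    InstanceId = "${element(aws_instance.web.*.id, count.index)}"
  }
}

resource "aws_cloudwatch_metric_alarm" "status_check" {
  count               = "${length(var.az)}"
  # Distinct suffix so this alarm no longer shares a name with the CPU alarm.
  alarm_name          = "${element(aws_instance.web.*.id, count.index)}-status-check"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = "2"
  metric_name         = "StatusCheckFailed"
  namespace           = "AWS/EC2"
  period              = "120"
  statistic           = "Average"
  threshold           = "1"
  alarm_description   = "This metric monitors ec2 status check."

  dimensions {
    InstanceId = "${element(aws_instance.web.*.id, count.index)}"
  }
}
```

Each instance should then end up with exactly two alarms, one per metric, and no two resources in the same region compete for the same alarm name.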
PS> Between tests, make sure you have cleaned up any resources that may already have been created, so that you are running a clean test.