Running a custom shell script in Azure VM using terraform

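I have been struggling to run a custom shell script in an Azure VM. The shell commands work fine on their own, but when I bundle them into a shell script it fails. I defined the shell script in the settings section.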

Terraform code:

resource "azurerm_resource_group" "test" {
  name     = "acctestrg"
  location = "West US"
}

resource "azurerm_virtual_network" "test" {
  name                = "acctvn"
  address_space       = ["10.0.0.0/16"]
  location            = "West US"
  resource_group_name = "${azurerm_resource_group.test.name}"
}

resource "azurerm_subnet" "test" {
  name                 = "acctsub"
  resource_group_name  = "${azurerm_resource_group.test.name}"
  virtual_network_name = "${azurerm_virtual_network.test.name}"
  address_prefix       = "10.0.2.0/24"
}

resource "azurerm_public_ip" "pubip" {
  name                         = "tom-pip"
  location                     = "${azurerm_resource_group.test.location}"
  resource_group_name          = "${azurerm_resource_group.test.name}"
  public_ip_address_allocation = "Dynamic"
  idle_timeout_in_minutes      = 30

  tags {
    environment = "test"
  }
}

resource "azurerm_network_interface" "test" {
  name                = "acctni"
  location            = "West US"
  resource_group_name = "${azurerm_resource_group.test.name}"

  ip_configuration {
    name                          = "testconfiguration1"
    subnet_id                     = "${azurerm_subnet.test.id}"
    private_ip_address_allocation = "dynamic"
    public_ip_address_id          = "${azurerm_public_ip.pubip.id}"
  }
}

resource "azurerm_storage_account" "test" {
  name                     = "mostor"
  resource_group_name      = "${azurerm_resource_group.test.name}"
  location                 = "westus"
  account_tier             = "Standard"
  account_replication_type = "LRS"

  tags {
    environment = "staging"
  }
}

resource "azurerm_storage_container" "test" {
  name                  = "vhds"
  resource_group_name   = "${azurerm_resource_group.test.name}"
  storage_account_name  = "${azurerm_storage_account.test.name}"
  container_access_type = "private"
}

resource "azurerm_virtual_machine" "test" {
  name                  = "acctvm"
  location              = "West US"
  resource_group_name   = "${azurerm_resource_group.test.name}"
  network_interface_ids = ["${azurerm_network_interface.test.id}"]
  vm_size               = "Standard_A0"

  storage_image_reference {
    publisher = "Canonical"
    offer     = "UbuntuServer"
    sku       = "16.04-LTS"
    version   = "latest"
  }

  storage_os_disk {
    name          = "myosdisk1"
    vhd_uri       = "${azurerm_storage_account.test.primary_blob_endpoint}${azurerm_storage_container.test.name}/myosdisk1.vhd"
    caching       = "ReadWrite"
    create_option = "FromImage"
  }

  os_profile {
    computer_name  = "hostname"
    admin_username = "testadmin"
    admin_password = "Password1234!"
  }

  os_profile_linux_config {
    disable_password_authentication = false
  }

  tags {
    environment = "staging"
  }
}

resource "azurerm_virtual_machine_extension" "test" {
  name                 = "hostname"
  location             = "West US"
  resource_group_name  = "${azurerm_resource_group.test.name}"
  virtual_machine_name = "${azurerm_virtual_machine.test.name}"
  publisher            = "Microsoft.OSTCExtensions"
  type                 = "CustomScriptForLinux"
  type_handler_version = "1.2"

  settings = <<SETTINGS
  {
  "fileUris": ["https://sag.blob.core.windows.net/sagcont/install_nginx_ubuntu.sh"],
    "commandToExecute": "sh install_nginx_ubuntu.sh"
  }
SETTINGS

  tags {
    environment = "Production"
  }
}

I have removed every sudo from the commands in the script, since Azure runs all commands as root. For your reference, here is the shell script:

Shell code:

#!/bin/bash

echo "Running apt update"
apt-get update
echo "Installing nginx"
apt-get install nginx

The only error I get is essentially a timeout message, shown below:

Error:

azurerm_virtual_machine.test: Creation complete after 3m21s (ID: /subscriptions/b017dff9-5685-4a83-80d3-...crosoft.Compute/virtualMachines/acctvm)
azurerm_virtual_machine_extension.test: Creating...
  location:             "" => "westus"
  name:                 "" => "hostname"
  publisher:            "" => "Microsoft.OSTCExtensions"
  resource_group_name:  "" => "acctestrg"
  settings:             "" => "  {\n  \"fileUris\": [\"https://sag.blob.core.windows.net/sagcont/install_nginx_ubuntu.sh\"],\n\t\"commandToExecute\": \"sh install_nginx_ubuntu.sh\"\n  }\n"
  tags.%:               "" => "1"
  tags.environment:     "" => "Production"
  type:                 "" => "CustomScriptForLinux"
  type_handler_version: "" => "1.2"
  virtual_machine_name: "" => "acctvm"
azurerm_virtual_machine_extension.test: Still creating... (10s elapsed)
azurerm_virtual_machine_extension.test: Still creating... (20s elapsed)
azurerm_virtual_machine_extension.test: Still creating... (30s elapsed)
azurerm_virtual_machine_extension.test: Still creating... (40s elapsed)
azurerm_virtual_machine_extension.test: Still creating... (50s elapsed)
azurerm_virtual_machine_extension.test: Still creating... (1m0s elapsed)

Error: Error applying plan:

1 error(s) occurred:

* azurerm_virtual_machine_extension.test: 1 error(s) occurred:

* azurerm_virtual_machine_extension.test: compute.VirtualMachineExtensionsClient#CreateOrUpdate: Failure sending request: StatusCode=200 -- Original Error: Long running operation terminated with status 'Failed': Code="VMExtensionProvisioningError" Message="VM has reported a failure when processing extension 'hostname'. Error message: \"Malformed status file [ExtensionError] Invalid status/status: failed\"."

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

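I can confirm the script is accessible to everyone, because I can download it with wget. I don't know what is going wrong. I have done a lot of digging online, but everywhere I look I end up at an unresolved bug or issue. There is also not much material available on using Terraform with Azure. Thanks for your help!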

Yes, you need -y in your script:

apt-get install nginx -y

When the Azure Custom Script Extension runs, the script must be fully automated; it cannot wait for anyone to enter parameters by hand.

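In your script, if you don't add -y, the script hangs waiting for you to type yes. The Azure Custom Script Extension waits for a few minutes and then fails with a timeout error.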
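A minimal sketch of install_nginx_ubuntu.sh with the prompt removed, assuming the Ubuntu 16.04 image from the question. Besides `-y`, it sets `DEBIAN_FRONTEND=noninteractive` so that any debconf configuration questions fall back to their defaults instead of waiting, and `set -eu` so a failed step ends the script with a non-zero exit code. It sticks to POSIX sh because `commandToExecute` runs it as `sh install_nginx_ubuntu.sh`, which ignores the `#!/bin/bash` line. The `install_nginx` function name is a hypothetical choice, not something from the question.

```shell
#!/bin/sh
# install_nginx_ubuntu.sh (sketch): same steps as the question, but non-interactive.
# POSIX sh on purpose: "sh install_nginx_ubuntu.sh" ignores the shebang anyway.
set -eu  # stop at the first failing command or unset variable

# Hypothetical helper name; the original script ran these commands at top level.
install_nginx() {
  # Keep debconf from stopping to ask questions; package defaults are used.
  export DEBIAN_FRONTEND=noninteractive

  echo "Running apt update"
  apt-get update

  echo "Installing nginx"
  # -y answers "Do you want to continue? [Y/n]" in advance, so nothing
  # waits for keyboard input that the extension can never provide.
  apt-get install -y nginx
}
```

In the real script, end the file with a line that calls `install_nginx` (or keep the commands at top level as before); the function only groups the steps.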

Update from the comments:

I was unable to find the location where the tar/script will be downloaded. Please can you throw some light here.

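All execution output and errors of the script are logged to the script's download directory, /var/lib/waagent//download//, and the tail of the output is logged to the specified log directory and reported back to Azure.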

The extension's operation log is the /var/log/azure///extension.log file.

For more information, see this link.
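The directory names between /var/lib/waagent/ and download/, and under /var/log/azure/, depend on the extension name and version, so searching for the files is easier than guessing the full paths. This sketch assumes you have a root shell on the VM; the function names are hypothetical, and it only lists or tails whatever the extension left in the two locations described above.

```shell
#!/bin/sh
# Sketch: locate Custom Script extension output on the VM (run as root).
# The path segments vary by extension/version, so search instead of hard-coding.

# Print every file under a download/ directory below the agent root:
# the downloaded script plus whatever output/error files were captured.
list_extension_downloads() {
  root="${1:-/var/lib/waagent}"
  find "$root" -type f -path '*/download/*' 2>/dev/null | sort
}

# Print the last N lines (default 50) of each extension.log,
# i.e. the extension's own operation log.
tail_extension_logs() {
  root="${1:-/var/log/azure}"
  lines="${2:-50}"
  find "$root" -type f -name extension.log 2>/dev/null | sort |
    while IFS= read -r log; do
      echo "==> $log <=="
      tail -n "$lines" "$log"
    done
}
```

For example, `list_extension_downloads` followed by `tail_extension_logs` shows both where the script landed and why the extension reported a failure.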

It looks like the problem is in your script, not in the Terraform file itself.

Problem

When you run the install_nginx_ubuntu.sh script in an Ubuntu VM, this is the output on the box (only the last part is shown):

0 upgraded, 14 newly installed, 0 to remove and 162 not upgraded.
Need to get 3,000 kB of archives.
After this operation, 9,783 kB of additional disk space will be used.
Do you want to continue? [Y/n]

So the script, and with it Terraform, just sits waiting for user input, which causes the process to time out.

Solution

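The solution is to approve the package installation automatically, which Linux users should be familiar with. So in install_nginx_ubuntu.sh, change the install line to:
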
apt-get install nginx -y
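To catch the same mistake in other provisioning scripts before they reach a VM, a rough check like the hypothetical `find_interactive_installs` below can flag `apt-get install` lines that carry none of apt-get's assume-yes options (`-y`, `--yes`, `--assume-yes`, including combined short flags such as `-qy`). It is a plain text match, so it assumes one command per line and will miss installs hidden behind variables or line continuations.

```shell
#!/bin/sh
# Sketch: list "apt-get ... install" lines that would stop at the [Y/n] prompt.
# Prints offending lines as "LINE:TEXT"; returns 1 if any were found, else 0.
find_interactive_installs() {
  matches=$(grep -nE 'apt-get[[:space:]].*install([[:space:]]|$)' "$1" |
    grep -vE '[[:space:]](-[A-Za-z]*y[A-Za-z]*|--yes|--assume-yes)([[:space:]]|$)' ||
    true)
  if [ -n "$matches" ]; then
    printf '%s\n' "$matches"
    return 1
  fi
  return 0
}
```

Run against the original script from the question, it would report line 6, `apt-get install nginx`.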

A lesson beyond this question

You may want to look into how to debug Terraform. I think that if you had seen at least some more detailed feedback, you could have solved the problem yourself.
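One common starting point is Terraform's `TF_LOG` environment variable, which turns on detailed logging (levels such as `DEBUG` or `TRACE`), together with `TF_LOG_PATH`, which writes that log to a file. With `DEBUG`, the requests the azurerm provider sends while creating the extension, and Azure's responses, typically show up in the log rather than only the final error. The wrapper below is a hypothetical convenience that is equivalent to prefixing `terraform apply` with those two variables; its name and the default file name `terraform-debug.log` are assumptions.

```shell
#!/bin/sh
# Sketch: run "terraform apply" with debug logging written to a file.
# TF_LOG / TF_LOG_PATH are Terraform's own variables; the function name and
# default log file name are hypothetical.
# Usage: tf_apply_debug [log_file] [extra terraform apply arguments...]
tf_apply_debug() {
  log_file="${1:-terraform-debug.log}"
  if [ "$#" -gt 0 ]; then shift; fi
  echo "Writing Terraform debug log to $log_file" >&2
  # Prefix assignments apply only to this single terraform invocation.
  TF_LOG=DEBUG TF_LOG_PATH="$log_file" terraform apply "$@"
}
```

For example, `tf_apply_debug debug.log` followed by `grep -i extension debug.log` narrows the log down to the extension resource.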