Terraform: SSH authentication failed (user@:22): ssh: handshake failed

I wrote some Terraform code to create a new VM and want to run a command on it via remote-exec, but it throws an SSH connection error:

Error: timeout - last error: SSH authentication failed (admin@:22): ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain.

My Terraform code:

# Create a resource group if it doesn’t exist
resource "azurerm_resource_group" "rg" {
  name     = "${var.deployment}-mp-rg"
  location = "${var.azure_environment}"

  tags = {
    environment = "${var.deployment}"
  }
}

# Create virtual network
resource "azurerm_virtual_network" "vnet" {
  name                = "${var.deployment}-mp-vnet"
  address_space       = ["10.0.0.0/16"]
  location            = "${var.azure_environment}"
  resource_group_name = "${azurerm_resource_group.rg.name}"

  tags = {
    environment = "${var.deployment}"
  }
}

# Create subnet
resource "azurerm_subnet" "subnet" {
  name                 = "${var.deployment}-mp-subnet"
  resource_group_name  = "${azurerm_resource_group.rg.name}"
  virtual_network_name = "${azurerm_virtual_network.vnet.name}"
  address_prefix       = "10.0.1.0/24"
}

# Create public IPs
resource "azurerm_public_ip" "publicip" {
  name                = "${var.deployment}-mp-publicip"
  location            = "${var.azure_environment}"
  resource_group_name = "${azurerm_resource_group.rg.name}"
  allocation_method   = "Dynamic"

  tags = {
    environment = "${var.deployment}"
  }
}

# Create Network Security Group and rule
resource "azurerm_network_security_group" "nsg" {
  name                = "${var.deployment}-mp-nsg"
  location            = "${var.azure_environment}"
  resource_group_name = "${azurerm_resource_group.rg.name}"

  security_rule {
    name                       = "SSH"
    priority                   = 1001
    direction                  = "Inbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"
    destination_port_range     = "22"
    source_address_prefix      = "*"
    destination_address_prefix = "*"
  }

  tags = {
    environment = "${var.deployment}"
  }
}

# Create network interface
resource "azurerm_network_interface" "nic" {
  name                      = "${var.deployment}-mp-nic"
  location                  = "${var.azure_environment}"
  resource_group_name       = "${azurerm_resource_group.rg.name}"
  network_security_group_id = "${azurerm_network_security_group.nsg.id}"

  ip_configuration {
    name                          = "${var.deployment}-mp-nicconfiguration"
    subnet_id                     = "${azurerm_subnet.subnet.id}"
    private_ip_address_allocation = "Dynamic"
    public_ip_address_id          = "${azurerm_public_ip.publicip.id}"
  }

  tags = {
    environment = "${var.deployment}"
  }
}

# Generate random text for a unique storage account name
resource "random_id" "randomId" {
  keepers = {
    # Generate a new ID only when a new resource group is defined
    resource_group = "${azurerm_resource_group.rg.name}"
  }

  byte_length = 8
}

# Create storage account for boot diagnostics
resource "azurerm_storage_account" "storageaccount" {
  name                     = "diag${random_id.randomId.hex}"
  resource_group_name      = "${azurerm_resource_group.rg.name}"
  location                 = "${var.azure_environment}"
  account_tier             = "Standard"
  account_replication_type = "LRS"

  tags = {
    environment = "${var.deployment}"
  }
}

# Create virtual machine
resource "azurerm_virtual_machine" "vm" {
  name                  = "${var.deployment}-mp-vm"
  location              = "${var.azure_environment}"
  resource_group_name   = "${azurerm_resource_group.rg.name}"
  network_interface_ids = ["${azurerm_network_interface.nic.id}"]
  vm_size               = "Standard_DS1_v2"

  storage_os_disk {
    name              = "${var.deployment}-mp-disk"
    caching           = "ReadWrite"
    create_option     = "FromImage"
    managed_disk_type = "Premium_LRS"
  }

  storage_image_reference {
    publisher = "Canonical"
    offer     = "UbuntuServer"
    sku       = "16.04-LTS"
    version   = "latest"
  }

  os_profile {
    computer_name  = "${var.deployment}-mp-ansible"
    admin_username = "${var.ansible_user}"
  }

  os_profile_linux_config {
    disable_password_authentication = true
    ssh_keys {
      path     = "/home/${var.ansible_user}/.ssh/authorized_keys"
      key_data = "${var.public_key}"
    }
  }

  boot_diagnostics {
    enabled     = "true"
    storage_uri = "${azurerm_storage_account.storageaccount.primary_blob_endpoint}"
  }

  tags = {
    environment = "${var.deployment}"
  }
}

resource "null_resource" "ssh_connection" {

  connection {
    host        = "${azurerm_public_ip.publicip.ip_address}"
    type        = "ssh"
    private_key = "${file(var.private_key)}"
    port        = 22
    user        = "${var.ansible_user}"
    agent       = false
    timeout     = "1m"
  }

  provisioner "remote-exec" {
    inline = ["sudo apt-get -qq install python"]
  }
}

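I tried connecting to the new VM manually over SSH with admin@xx.xx.xx.xx:22 and it worked. Looking at the error message, I then output ${azurerm_public_ip.publicip.ip_address}, but it is null. I think that is why SSH authentication fails, but I don't know why the value is null. How should I change the code if I want to SSH into the server from the Terraform script?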

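Your issue is that Terraform has built a dependency graph in which the only dependency of null_resource.ssh_connection is the azurerm_public_ip.publicip resource, so it starts trying to connect before the instance has been created.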

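This isn't a big problem in itself, because provisioners normally retry if SSH isn't available yet, but the connection details are determined as soon as the null resource starts. With the azurerm_public_ip's allocation_method set to Dynamic, it won't get its IP address until it has been attached to a resource: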

Note: Dynamic Public IP Addresses aren't allocated until they're assigned to a resource (such as a Virtual Machine or a Load Balancer) by design within Azure - more information is available below.
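As an aside, if a fixed address is acceptable, switching the allocation method to Static should make ip_address available as soon as the public IP resource itself is created. The sketch below reuses the public IP resource from the question and changes only that one argument. It addresses the empty address only; the provisioner still has to wait for the VM, which the dependency options discussed here take care of.

```hcl
# Sketch: same public IP as in the question, but statically allocated.
# With "Static", Azure assigns the address when this resource is created,
# so ip_address is not empty while the VM is still being built.
resource "azurerm_public_ip" "publicip" {
  name                = "${var.deployment}-mp-publicip"
  location            = "${var.azure_environment}"
  resource_group_name = "${azurerm_resource_group.rg.name}"
  allocation_method   = "Static"

  tags = {
    environment = "${var.deployment}"
  }
}
```

Note that a static address stays reserved for as long as the resource exists, which may affect cost.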

There are a few different ways to solve this. You can make the null_resource depend on the azurerm_virtual_machine.vm resource, either via interpolation or via depends_on:

resource "null_resource" "ssh_connection" {

  connection {
    host        = "${azurerm_public_ip.publicip.ip_address}"
    type        = "ssh"
    private_key = "${file(var.private_key)}"
    port        = 22
    user        = "${var.ansible_user}"
    agent       = false
    timeout     = "1m"
  }

  provisioner "remote-exec" {
    inline = [
      "echo ${azurerm_virtual_machine.vm.id}",
      "sudo apt-get -qq install python",
    ]
  }
}

resource "null_resource" "ssh_connection" {
  depends_on = ["azurerm_virtual_machine.vm"]

  connection {
    host        = "${azurerm_public_ip.publicip.ip_address}"
    type        = "ssh"
    private_key = "${file(var.private_key)}"
    port        = 22
    user        = "${var.ansible_user}"
    agent       = false
    timeout     = "1m"
  }

  provisioner "remote-exec" {
    inline = ["sudo apt-get -qq install python"]
  }
}

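A better approach here is to run the provisioner as part of the azurerm_virtual_machine.vm resource instead of a null_resource. The usual reasons to launch a provisioner from a null_resource are that you need to wait until something else has happened to the resource after creation (such as attaching a disk), or that there is no appropriate resource to attach it to. Neither applies here. So instead of your existing null_resource, move the provisioner into the azurerm_virtual_machine.vm resource: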

resource "azurerm_virtual_machine" "vm" {
  # ...

  provisioner "remote-exec" {
    connection {
      host        = "${azurerm_public_ip.publicip.ip_address}"
      type        = "ssh"
      private_key = "${file(var.private_key)}"
      port        = 22
      user        = "${var.ansible_user}"
      agent       = false
      timeout     = "1m"
    }

    inline = ["sudo apt-get -qq install python"]
  }
}

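For many resources this also lets you use the self keyword to refer to attributes of the resource you are provisioning. Unfortunately, the azurerm_virtual_machine resource doesn't appear to expose the VM's IP address easily, because that is determined by network_interface_ids.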
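One possible way around this, if you keep the null_resource and a Dynamic address, is to read the public IP back through the azurerm_public_ip data source once the VM exists. This is a minimal sketch, not a tested configuration. The data source name vm_ip is made up for illustration, and everything else comes from the question. Because the data source references an attribute of azurerm_virtual_machine.vm, Terraform should order it after the VM, and the null_resource in turn depends on the data source.

```hcl
# Look up the public IP after the VM has been created. Referencing
# azurerm_virtual_machine.vm here is what creates the dependency on the VM.
# "vm_ip" is a hypothetical name chosen for this example.
data "azurerm_public_ip" "vm_ip" {
  name                = "${azurerm_public_ip.publicip.name}"
  resource_group_name = "${azurerm_virtual_machine.vm.resource_group_name}"
}

resource "null_resource" "ssh_connection" {
  connection {
    # Use the address read by the data source instead of the
    # resource attribute, which is empty before the IP is attached.
    host        = "${data.azurerm_public_ip.vm_ip.ip_address}"
    type        = "ssh"
    private_key = "${file(var.private_key)}"
    port        = 22
    user        = "${var.ansible_user}"
    agent       = false
    timeout     = "1m"
  }

  provisioner "remote-exec" {
    inline = ["sudo apt-get -qq install python"]
  }
}
```

This does not work for a provisioner placed inside azurerm_virtual_machine.vm itself, since the data source depends on that same resource and the reference would form a cycle.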