Startup script solution gets stuck in infinite loop #215

Open
tpdownes opened this issue Apr 9, 2024 · 1 comment

tpdownes commented Apr 9, 2024

Overview of the Issue

If the Packer VM is:

  • configured to use startup-script metadata to perform customization
  • given a service account that does not have the power to modify its own metadata

then the Packer process gets stuck in an infinite loop, and the guidance it prints gives the user no hint about the cause. My thoughts:

  1. Modify retry.Config to put a limit on Tries or StartTimeout.
  2. Improve the guidance at "Error getting startup script status" so that users understand the service account probably needs permission to modify its own instance metadata.
  3. Give whatever process attempts to update the instance metadata a retry mechanism.

These could be done separately. Items 1 and 2 are probably obvious; the reasoning behind item 3 may not be. If you create a service account on Google Cloud and assign it IAM roles, those roles are not applied immediately but take effect after a known propagation delay. An automation pipeline might therefore create the service account, assign it adequate permissions, and still see Packer fail because the roles have not yet taken effect when the metadata update is attempted.

Each timeout might reasonably be 10 minutes to account for the worst-case propagation delay.

Reproduction Steps

Begin by creating a service account without any IAM roles:

gcloud iam service-accounts create failure \
    --description="SA" \
    --display-name="failure"

Then supply your project_id and that service account's email address to the template below.

Plugin and Packer version

  • Packer v1.10.2
  • Plugin latest as shown below

Simplified Packer Buildfile

source "googlecompute" "toolkit_image" {
  project_id            = var.project_id
  communicator          = "none"
  image_name            = "repro-fail"
  machine_type          = "n2-standard-8"
  disk_size             = 32
  disk_type             = "pd-balanced"
  omit_external_ip      = true
  use_internal_ip       = true
  subnetwork            = "default"
  zone                  = "us-central1-c"
  service_account_email = var.service_account_email
  scopes                = ["https://www.googleapis.com/auth/cloud-platform"]
  source_image_family   = "debian-12"
  metadata = {
    startup-script = <<-EOD
      #!/bin/bash
      /bin/true
      EOD
  }
}

build {
  name    = "test"
  sources = ["sources.googlecompute.toolkit_image"]
}

variable "project_id" {
  description = "Project in which to create VM and image"
  type        = string
}

variable "service_account_email" {
  description = "Service account email address"
  type        = string
}


packer {
  required_version = ">= 1.7.9, < 2.0.0"

  # packer plugin 1.0.16 and above includes HPC VM Image
  required_plugins {
    googlecompute = {
      version = "~> 1.1.0"
      source  = "github.com/hashicorp/googlecompute"
    }
  }
}

Log Fragments and crash.log files

tpdownes@poreef ~/repro> packer build -var project_id=my-project -var service_account_email=failure@my-project.iam.gserviceaccount.com .
test.googlecompute.toolkit_image: output will be in this color.

==> test.googlecompute.toolkit_image: Checking image does not exist...
==> test.googlecompute.toolkit_image: Creating temporary RSA SSH key for instance...
==> test.googlecompute.toolkit_image: no persistent disk to create
==> test.googlecompute.toolkit_image: Using image: debian-12-bookworm-v20240312
==> test.googlecompute.toolkit_image: Creating instance...
    test.googlecompute.toolkit_image: Loading zone: us-central1-c
    test.googlecompute.toolkit_image: Loading machine type: n2-standard-8
    test.googlecompute.toolkit_image: Requesting instance creation...
    test.googlecompute.toolkit_image: Waiting for creation operation to complete...
    test.googlecompute.toolkit_image: Instance has been created!
==> test.googlecompute.toolkit_image: Waiting for the instance to become running...
    test.googlecompute.toolkit_image: IP: 10.128.0.10
==> test.googlecompute.toolkit_image: Waiting for any running startup script to finish...
    test.googlecompute.toolkit_image: Metadata startup-script-status on instance packer-6615be2c-4509-e09b-a563-a2a3fcc15cf6 not available. Waiting...
    test.googlecompute.toolkit_image: Metadata startup-script-status on instance packer-6615be2c-4509-e09b-a563-a2a3fcc15cf6 not available. Waiting...
    test.googlecompute.toolkit_image: Metadata startup-script-status on instance packer-6615be2c-4509-e09b-a563-a2a3fcc15cf6 not available. Waiting...
    test.googlecompute.toolkit_image: Metadata startup-script-status on instance packer-6615be2c-4509-e09b-a563-a2a3fcc15cf6 not available. Waiting...
    test.googlecompute.toolkit_image: Metadata startup-script-status on instance packer-6615be2c-4509-e09b-a563-a2a3fcc15cf6 not available. Waiting...
    test.googlecompute.toolkit_image: Metadata startup-script-status on instance packer-6615be2c-4509-e09b-a563-a2a3fcc15cf6 not available. Waiting...
    test.googlecompute.toolkit_image: Metadata startup-script-status on instance packer-6615be2c-4509-e09b-a563-a2a3fcc15cf6 not available. Waiting...
    test.googlecompute.toolkit_image: Metadata startup-script-status on instance packer-6615be2c-4509-e09b-a563-a2a3fcc15cf6 not available. Waiting...
    test.googlecompute.toolkit_image: Metadata startup-script-status on instance packer-6615be2c-4509-e09b-a563-a2a3fcc15cf6 not available. Waiting...
    test.googlecompute.toolkit_image: Metadata startup-script-status on instance packer-6615be2c-4509-e09b-a563-a2a3fcc15cf6 not available. Waiting...
    test.googlecompute.toolkit_image: Metadata startup-script-status on instance packer-6615be2c-4509-e09b-a563-a2a3fcc15cf6 not available. Waiting...
    test.googlecompute.toolkit_image: Metadata startup-script-status on instance packer-6615be2c-4509-e09b-a563-a2a3fcc15cf6 not available. Waiting...
    test.googlecompute.toolkit_image: Metadata startup-script-status on instance packer-6615be2c-4509-e09b-a563-a2a3fcc15cf6 not available. Waiting...
    test.googlecompute.toolkit_image: Metadata startup-script-status on instance packer-6615be2c-4509-e09b-a563-a2a3fcc15cf6 not available. Waiting...
    test.googlecompute.toolkit_image: Metadata startup-script-status on instance packer-6615be2c-4509-e09b-a563-a2a3fcc15cf6 not available. Waiting...
    test.googlecompute.toolkit_image: Metadata startup-script-status on instance packer-6615be2c-4509-e09b-a563-a2a3fcc15cf6 not available. Waiting...
Cancelling build after receiving interrupt
    test.googlecompute.toolkit_image: Metadata startup-script-status on instance packer-6615be2c-4509-e09b-a563-a2a3fcc15cf6 not available. Waiting...
==> test.googlecompute.toolkit_image: Error waiting for startup script to finish: Error getting startup script status: Instance metadata key, startup-script-status, not found.
@tpdownes tpdownes added the bug label Apr 9, 2024
@tpdownes tpdownes changed the title from "Startup script solution gets stuck in loop with infinite timeout" to "Startup script solution gets stuck in infinite loop" Apr 9, 2024
@tpdownes
Author

Another thought: I believe you can eliminate the need for IAM permissions entirely by modifying and polling VM guest attributes rather than instance metadata.

https://cloud.google.com/compute/docs/metadata/manage-guest-attributes
