Context deadline exceeded on VMs with large disks #1134

morganhowarth-fd · 2020-07-13T10:10:42Z

Terraform Version

0.12.28

vSphere Provider Version

v1.18.1

Affected Resource(s)

vsphere_virtual_machine

Terraform Configuration Files

https://gist.github.com/morganhowarth-fd/aed994cf9c2ff1c155deb86e02ce2104

Expected Behavior

Terraform should be able to create the VM with a large disk without failing.

Actual Behavior

We have multiple database servers with 200GB thick-provisioned eager-zeroed secondary disks which deploy fine, and a few with 400GB thick-provisioned eager-zeroed secondary disks which have an issue as described below.

When creating a server from a template with a 400GB+ thick-provisioned eager-zeroed disk, the clone takes some time. After exactly 5 minutes, while the template is still cloning, another VMware job appears that deletes the virtual machine, and Terraform fails with the message:

There was an error performing post-clone changes to virtual machine "foo": error reconfiguring virtual machine: Post https://VCENTER_SERVER/sdk: context deadline exceeded

It looks like the exact same issue as #641, which was apparently fixed by #792, but we still have the issue.

I've tried setting wait_for_guest_net_timeout and vim_keep_alive to higher values, but to no avail.

A workaround for us is to deploy the server with a smaller thick-provisioned disk and then increase it to the desired size.

Steps to Reproduce

YMMV, you may need a larger disk if your clone job completes faster than 5 minutes.

1. terraform apply with the TF config for a server with a large (400GB) thick-provisioned eager-zeroed secondary disk.
2. Wait 5 minutes.
3. Observe vCenter with a task in the queue to delete the virtual machine you're deploying.
4. Watch TF fail after the clone job completes with:

There was an error performing post-clone changes to virtual machine "foo": error reconfiguring virtual machine: Post https://VCENTER_SERVER/sdk: context deadline exceeded

Important Factoids

We're running ESXi/vCenter 6.7 on hyper-converged infrastructure. Large storage writes, especially with thick-provisioned eager-zeroed disks, take a while due to storage replication across the nodes.

References

Similar issue reported - Thick provisioning with multiple disks fails : error reconfiguring virtual machine: Post https://x/sdk: context deadline exceeded #641
Apparent fix for the issue - r/virtual_machine: Add VIM session keep alive #792

gosarami · 2021-01-27T07:43:10Z

I am experiencing the same problem.

From my research, I suspect this is because the plugin always uses provider.DefaultAPITimeout when building the context in the following code that customizes the VM (sorry if I'm wrong).

terraform-provider-vsphere/vsphere/resource_vsphere_virtual_machine.go

Line 1673 in 66543c5

if err := virtualmachine.Customize(vm, custSpec); err != nil {

terraform-provider-vsphere/vsphere/internal/helper/virtualmachine/virtual_machine_helper.go

Line 658 in 66543c5

ctx, cancel := context.WithTimeout(context.Background(), provider.DefaultAPITimeout)

To solve this, I think it needs to be modified to allow passing a timeout parameter such as api_timeout as an argument.

Please let me know your opinion.

greeneg · 2021-04-14T18:12:03Z

This is also occurring on 0.13.5.

We have VMs that by policy need to be built with thick, eager-zeroed VMDKs. When a Windows SQL instance is being built, we have a number of disks added, along with the main OS drive, which takes a while for vCenter to build out. On average, most of the SQL instances have around 3.5TB of disk overall, which can take a fair amount of time to complete.

For now, we're building the instances as thin-provisioned to work around this; however, this violates our internal policy, and we would like it resolved at the provisioning stage.

Panplumousse · 2021-04-15T16:25:38Z

Hello,
We use version 1.25.0 of the vSphere provider and are experiencing exactly the same problem.
After 5 minutes of reconfiguring the virtual machine, another vSphere job appears and deletes the VM.

Terraform version:
0.11.11
We set the api_timeout option in the vSphere provider, but it made no difference.

The error is:
There was an error performing post-clone changes to virtual machine "foo": error reconfiguring virtual machine: Post https://VCENTER_SERVER/sdk: context deadline exceeded

greeneg · 2021-05-11T15:45:12Z

This bug is impacting the following issue entries:

#1238
#1335
#641
#790
#1401
#1387

KenzoB73 · 2021-08-09T16:57:43Z

This issue is over a year old, is there any update on this?

CollinLeishman · 2021-12-01T18:03:33Z

This is also affecting me. Any help or update on this would be very much appreciated!

CollinLeishman · 2021-12-01T18:03:47Z

@KenzoB73 Is this still affecting you?

KenzoB73 · 2021-12-01T22:23:35Z

Yes, happened last week again actually.

tenthirtyam · 2021-12-02T01:48:19Z

v2.0.0 added the api_timeout option to the provider configuration via #1405.

api_timeout - (Optional) Sets the number of minutes to wait for operations to complete. The default timeout is 5 minutes. Currently it will override the timeout for all VM creation operations.

Example:

terraform {
  required_providers {
    vsphere = {
      source  = "hashicorp/vsphere"
      version = ">= 2.0.0"
    }
  }
  required_version = ">= 1.0.0"
}

provider "vsphere" {
  vsphere_server       = "sfo-m01-vc01.rainpole.io"
  user                 = "svc-terraform-vsphere@rainpole.io"
  password             = "***********"
  allow_unverified_ssl = false
  api_timeout          = 30 // Example. Default 5.
}

@morganhowarth-fd - have you tried this version or higher with this provider configuration?

Ryan

tenthirtyam · 2022-02-05T02:26:58Z

Resolved in #1405 with the introduction of the defaultAPITimeout configuration for the provider.

Marking this issue as closed. If this issue reappears with the latest version of the provider, please create a new issue linking back to this one for added context.

Ryan Johnson
Staff II Solutions Architect
Cloud Infrastructure Business Group, VMware

github-actions · 2022-03-08T02:14:24Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

morganhowarth-fd added the bug label Jul 13, 2020

bill-rich added the size/s and acknowledged labels Jul 14, 2020

JmLallem mentioned this issue May 4, 2021: Help to configure api_timeout on vsphere provider #1387 (closed)

This was referenced Feb 5, 2022:
Context Deadline reached with building VM with large disks #1415 (closed)
Timeout while adding big extra disk (5T) to VM #1335 (closed)

tenthirtyam closed this as completed Feb 5, 2022

github-actions bot locked as resolved and limited conversation to collaborators Mar 8, 2022
