TimeoutException on Acquire Packages step of triggered deploy

trevor.elliott

11 Jan, 2018 10:17 PM

I'm using Octopus v4.1.5 and I'm hitting errors when triggering deployments to transient targets. I defined my trigger as follows: https://i.imgur.com/5nsqozq.png

My deployment targets are autoscaled Azure virtual machines with the Octopus Tentacle installed by a PowerShell DSC extension. Regular deployments (to VMs that have presumably been running for a while) never have this issue; only the triggered deployments (which run against VMs that just came online) do.

Based on this article (https://help.octopusdeploy.com/discussions/problems/46013-large-pac...) I edited the configuration for PollingRequestMaximumMessageProcessingTimeout on the server, changing the value to 1 hour, and restarted the Octopus Server. That may have helped, because the next triggered deploy succeeded; however, the subsequent one failed with the same error.
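For anyone following along, the change was an appSettings entry in Octopus.Server.exe.config along these lines (a sketch only: I'm assuming the key carries a `Halibut.` prefix like the other Halibut settings, and the value is a .NET TimeSpan; exact placement may vary by server version):

```xml
<!-- Octopus.Server.exe.config (sketch; restart the Octopus Server service after editing) -->
<configuration>
  <appSettings>
    <!-- Allow polling Tentacles up to 1 hour to process a message -->
    <add key="Halibut.PollingRequestMaximumMessageProcessingTimeout" value="01:00:00" />
  </appSettings>
</configuration>
```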

The exact error happens during Acquire Packages, where it succeeds for all deployment targets except for one, which fails while uploading a package with this error:

Server exception:
System.TimeoutException: A request was sent to a polling endpoint, but the polling endpoint did not collect the request within the allowed time (00:02:00), so the request timed out.

I do not want to increase the PollingRequestQueueTimeout because that would make health checks and deployment times much longer. Do I have to run health checks as part of the deployment, or can I just change the machine policy to run a health check more often (like every 5 minutes)?

I would like to use the Tentacles in listening mode, but since they are in a VM scale set and do not have their own public IPs, it would be difficult to give Octopus connectivity to them. Even if I connected Octopus to the Azure VNet for each of my scale sets, I have one VNet per environment, so it would be a lot of setup per environment and the VMs would have to work around IP conflicts.

This really just seems like a bug in polling mode, where the Tentacles sometimes aren't checking in with the Octopus Server during the Acquire Packages step. I checked the logs on the VM for one instance where this happened: the Octopus Tentacle Windows service never stopped or restarted, and it was online the entire time the triggered deployment was running. There were also no errors in the Tentacle logs, and I even checked the packages folder and saw that the packages had downloaded successfully.

  1. Support Staff: Posted by Shane Gill on 12 Jan, 2018 12:04 AM

    Hi Trevor,

    Thanks for getting in touch.

    We have an open issue to resolve the problems with polling Tentacles and package transfers that take a long time: https://github.com/OctopusDeploy/Issues/issues/3034

    Are you automatically triggering multiple deployments to your machines?

    Can you send a task log for a deployment that times out and a Tentacle log for the corresponding time period?

    Can you elaborate on what you mean by:

    Do I have to run health checks as part of the deployment, or can I just change the machine policy to run a health check more often (like every 5 minutes)?

    Why are you running health checks as a part of the deployment? Are you checking to ensure that the machines that just came online are still healthy?

    Look forward to learning more.

    Cheers,
    Shane

  2. Posted by trevor.elliott on 12 Jan, 2018 12:53 AM

    Hi Shane,

    I'm not sure what you mean by triggering multiple deployments; I only have one trigger and one project. A release does get deployed multiple times, though. After the initial deployment, a triggered deployment happens each time a VM that's out of date becomes available after being scaled up.

    The file transfer is happening from an Azure VM to another Azure VM in the same Azure datacenter, and the package is only 21MB, so I would be really surprised if it took very long to transfer the package.

    Regarding the health check deployment step, I was going based on this article:

    https://octopus.com/docs/reference/elastic-and-transient-environmen...

    I know the default machine policy specifies health checks every 1 hour. Our VMs scale up and down pretty frequently.

    I want to avoid a situation where the last health check said a VM was available, but the deployment fails because the VM is no longer available by the time it runs. Can you confirm whether this could happen?

    I also want to avoid a situation where a VM is running but hasn't been recognized by a health check yet, and therefore doesn't get deployed to at all until it restarts or the next deploy happens (which might not be for a while).

    The behavior I've noticed so far is that the "Machine becomes available for deployment" event only fires when a VM starts up and the Tentacle service starts. Can you confirm whether that's the case, or do the Tentacles periodically check whether their release is out of date and, if so, trigger a deployment?

    Is there an email I can use to send those files to you?

  3. Support Staff: Posted by Shane Gill on 14 Jan, 2018 10:59 PM

    Hi Trevor,

    The trigger "Machine becomes available for deployment" gets fired when a machine becomes healthy or is enabled. When a VM starts up, the Tentacle service starts, the Tentacle is registered with the Octopus Server and a successful health check is run on that machine. The events are all server-side, the Tentacle doesn't do any checking to see if a release is out of date.

    A health check is run immediately after a Tentacle is registered with Octopus, so as long as the Tentacle is online when the registration occurs there shouldn't be much lag between the machine coming online and the deployment kicking off.

    If your VMs are scaling up and down frequently I would recommend increasing the frequency of health checks and using auto-deploy (rather than the health check step) to keep your machines up to date. You can ensure that a deployment continues if one of your machines scales down during a deployment by using the "Skip deployment targets" option in Project > Settings. When set to "Skip deployment targets that are or become unavailable", any communication layer errors will be ignored during the deployment.

    Regarding the package transfer issue, it does seem like there is something else at play if you are only transferring 21MB within Azure. You can send the logs to [email blocked]; please mark them "Attention: Shane" and provide a link to this support ticket (https://help.octopusdeploy.com/discussions/problems/67808) so they get to me.

    Thanks,
    Shane

  4. Posted by trevor.elliott on 14 Jan, 2018 11:52 PM

    Hi Shane,

    I think I may have found the cause of the timeout issue, and it may now be fixed: VMs were sometimes being scaled down in the middle of a deployment.

    Can you clarify what you mean by using auto-deploy rather than a health check step? Do you mean basically keep the deployment trigger as-is but just remove the health check step from the deploy?

  5. Support Staff: Posted by Shane Gill on 15 Jan, 2018 01:40 AM

    Hi Trevor,

    Happy to hear you may have uncovered the timeout issue.

    I did mean keep the deployment trigger as-is but remove the health check step. Usually a combination of triggers, the "ignore unavailable" setting, and an increased health check frequency is enough to handle scaling up and down. The health check step is usually used when the start of the deployment provisions infrastructure that is used later in the same deployment, or when the deployment has a long-running piece at the start (for example, deploying to databases) and the health check step afterwards ensures the machines are still current.

    I'm guessing your deployment is small and the health check step is adding unnecessary overhead, especially if you are scaling frequently. The trigger should be enough to keep your machines up to date.

    Cheers,
    Shane

  6. Posted by trevor.elliott on 15 Jan, 2018 04:49 PM

    Thanks Shane. I'll try that.

  7. Posted by trevor.elliott on 15 Jan, 2018 05:41 PM

    Hi Shane,

    I tried disabling the health check step and made sure "Skip unavailable deployment targets" was enabled in the project settings. But now Acquire Packages takes 4 minutes to run, because that's how long it waits for the polling Tentacles. So I guess it's going to take that long no matter what?

  8. Support Staff: Posted by Shane Gill on 15 Jan, 2018 11:18 PM

    Hi Trevor,

    Yes, it looks like the limiting factor is the polling Tentacle timeout. Unfortunately, polling Tentacle connectivity isn't very responsive. If you are really keen on tuning deployment times, you might be able to get away with lowering Halibut.PollingRequestQueueTimeout in Octopus.Server.exe.config to something like 30 seconds. It will affect all polling Tentacles, and the trade-off is speed versus reliability.
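    As a sketch, that would be an appSettings entry along these lines (exact placement in Octopus.Server.exe.config may vary by version; the value is a .NET TimeSpan, and the 00:02:00 default is the timeout quoted in the original error message above):

    ```xml
    <!-- Octopus.Server.exe.config (sketch only; restart the Octopus Server service after editing) -->
    <configuration>
      <appSettings>
        <!-- How long a request waits for a polling Tentacle to collect it; default 00:02:00 -->
        <add key="Halibut.PollingRequestQueueTimeout" value="00:00:30" />
      </appSettings>
    </configuration>
    ```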

    Cheers
    Shane
