Calamari's WindowsService_BeforePostDeploy.ps1 script freezing on Get-WMIObject() call

adam_schueller

19 Apr, 2017 10:58 PM

Hey guys,

We're seeing the issue in the attached screenshot happen a lot on our deployments lately. Basically, the deployment freezes up on any step that's using the built-in Windows service install feature, right between configuring and starting the service. It can take anywhere from 30 mins to an hour or more to complete -- normally the entire deployment takes <5 mins!

Looking at the source for the script (https://github.com/OctopusDeploy/Calamari/blob/master/source/Calamari/Scripts/Octopus.Features.WindowsService_BeforePostDeploy.ps1#L145-L146), there's only 2 lines that occur between those two outputs:

$wmiServiceName = $serviceName -replace "'", "\'"
$status = Get-WMIObject win32_service -filter ("name='" + $wmiServiceName + "'") -computer "." | select -expand startMode

Googled around a bit, and it seems that these kinds of WMI queries can freeze up like this quite often (for whatever reason). The recommended approach is usually to wrap the call in a task with a timeout (since that query doesn't support a timeout directly).
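
For reference, the timeout wrapper described above could look roughly like the sketch below. This is only an illustration, not what Calamari does: Invoke-WithTimeout is a hypothetical helper name, and the 30-second timeout in the usage comment is an arbitrary assumption. It runs the query in a background job via Start-Job and gives up if Wait-Job does not return within the timeout.

```powershell
# Hypothetical helper (not part of Calamari): runs a script block in a
# background job and throws if it has not finished after $TimeoutSeconds.
function Invoke-WithTimeout {
    param(
        [Parameter(Mandatory)] [scriptblock] $ScriptBlock,
        [object[]] $ArgumentList,
        [int] $TimeoutSeconds = 60
    )

    # Only pass -ArgumentList when the caller supplied arguments.
    $jobArgs = @{ ScriptBlock = $ScriptBlock }
    if ($ArgumentList) { $jobArgs.ArgumentList = $ArgumentList }

    $job = Start-Job @jobArgs
    try {
        # Wait-Job returns nothing if the timeout elapses first.
        $finished = Wait-Job -Job $job -Timeout $TimeoutSeconds
        if (-not $finished) {
            throw "Timed out after $TimeoutSeconds seconds."
        }
        # Return whatever the job wrote to its output stream.
        Receive-Job -Job $job
    }
    finally {
        # -Force also removes a job that is still running.
        Remove-Job -Job $job -Force
    }
}

# Possible usage with the query from the built-in script (Windows PowerShell only):
# $status = Invoke-WithTimeout -TimeoutSeconds 30 -ArgumentList $wmiServiceName -ScriptBlock {
#     param($name)
#     Get-WMIObject win32_service -Filter ("name='" + $name + "'") |
#         Select-Object -ExpandProperty StartMode
# }
```

The job runs in a separate process, so a hung WMI call should not block the deployment script itself, though I haven't verified how cleanly a stuck WMI query shuts down when the job is removed.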

Has anyone experienced similar issues? Or have any recommendations for fixing this? Right now it's causing major heartburn getting bugfixes from DEV --> QA, so our devops team is looking to create a custom script to replace the built-in script... but we'd prefer to keep using the built-in features as much as possible (they're very easy to configure!).

Thanks,
-Adam

  1. Support Staff 1 Posted by Robert Wagner on 20 Apr, 2017 05:12 AM

    Hi Adam,

    Thanks for getting in touch. When this happens again, would you be able to log onto the machine and run Get-WMIObject win32_service to see whether that hangs too? That would tell us whether the problem is random or tied to a point in time. If it does hang, could you also run Get-Service? The output of those may show where it is hanging (i.e. it should print everything up to the service that is causing the hang).

    I'm happy to look into adding a timeout to that call. I'm thinking 5 minutes?

    In the meantime, you can update the script yourself, name it Octopus.Features.WindowsService_BeforePostDeploy.ps1, and add it to the root of the package. Octopus will then use that script instead of the built-in one.

    Robert W

  2. 2 Posted by adam_schueller on 21 Apr, 2017 04:07 PM

    Thanks Robert! Following your advice, it seems to be a systemic problem on that box -- every time I run "Get-WMIObject win32_service", it hangs right after the FDResPub service. The command finishes in ~1s on other machines.

    Looking at the list of services using Get-Service shows that the next service in line is "FontCache" (though I did notice the WMI query doesn't match the output of Get-Service exactly). Not sure if that means anything -- I checked the WMI repository with "winmgmt /verifyrepository" and it comes back "WMI repository is consistent".

    It hangs pretty much forever (I've been waiting for 4+ hours, still stuck). Seems to be same problem as this post: http://www.muscetta.com/2009/05/27/get-wmicustom/

    That's from 2009, so clearly not a new issue with WMI in general.

    I ran WMIDiag and the output was all good: "(0) ** SUCCESS: WMIDiag determined that WMI works CORRECTLY.", so not sure what our issue is.

    At this point, we clearly need to address the WMI issues on our DEV box. A timeout on your end would just mask the issue we're experiencing... and even a 5 min timeout would essentially double the deploy time.

    Because we always auto-start, we're considering the option of a custom script, but I want to see if we can solve the WMI issue before adding anything to the source repo.

    Thanks again for the quick response!

    -Adam

  3. Support Staff 3 Posted by Robert Wagner on 23 Apr, 2017 09:51 PM

    Hi Adam,

    Thanks for the update, hope you get it sorted.

    Robert W

  4. 4 Posted by adam_schueller on 27 Apr, 2017 11:39 PM

    Hey Robert,

    Final outcome: we managed to fix the WMI issues on the box. Not entirely sure what solved it -- we rebuilt the WMI repository, tweaked some other settings, and finally rebooted the machine (of course) and the query started working again.

    With the 'Get-WMIObject win32_service' call working as expected, the deployments are back to their snappy selves. :)

    In the future, we'll know to check the health of WMI on the target boxes if we see this behavior again.

    I appreciate the help; we've got some new debugging tools under our belts now.

    Thanks,
    -Adam

  5. Paul Stovell closed this discussion on 02 Aug, 2017 09:37 AM.
