Listening Tentacles dying without explanation

mhenry

12 Jan, 2018 03:29 PM

We have a number of polling Tentacles that never have any problems. We also have a number of QA servers running listening Tentacles. Once a week or so, the Tentacles on one of the servers will die. When they do, they show as "unhealthy" under Infrastructure->Environments. However, there's no event showing that they ever went down. There's nothing in the Octopus log file on the server running the Tentacle that indicates it went down or why, and nothing in the Windows Event Viewer either.

Attached are some screenshots. You can see that qauaw02 is "unavailable". There's no event showing it ever went down.
There's nothing in the audit log that indicates qauaw02 went down either.

I should say that these QA servers are virtual machines, so it's possible that our IT team is pausing these VMs and bringing them out of hibernation as they shuffle things around, which may have something to do with this. Still, Octopus obviously realizes that the Tentacles are unavailable, so why isn't it creating an event when it happens?

A. Can you fix this so that Octopus realizes when the servers go down, so I can get an email notification?
B. Even better if they stop going down.

  1. Posted by Daniel Fischer (Support Staff) on 14 Jan, 2018 11:54 PM

    Hi,

    Thanks for getting in touch! To understand why your Tentacles are changing state on your server, I will need you to attach a copy of your recent Octopus server logs. These can be found under C:\Octopus\Logs on a default installation of your Octopus server. This log contains information on the Octopus server's communication with the Tentacle, and any errors that occurred.

    As for a notification, you should be able to use our Subscriptions feature. This feature lets you subscribe to events in Octopus, such as a machine becoming unavailable, and react accordingly. We currently support both email and webhooks as reactions.

    Let me know if this helps. :)

    Looking forward to hearing from you.

    Best regards,
    Daniel

  2. Posted by mhenry on 16 Jan, 2018 10:00 PM

    Attached is an example of the logs for a tentacle that went down inexplicably. Subscriptions don't work because Octopus is not generating an event when the tentacle goes down.
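While the missing event is unexplained, one possible stopgap is to poll the machine list over the Octopus REST API and alert on anything that is not reported as healthy. The following is a minimal, unofficial sketch. It assumes the server exposes `GET /api/machines/all` authenticated with an `X-Octopus-ApiKey` header, and that each machine document carries `Name`, `IsDisabled` and `HealthStatus` fields; check these against the API of the installed Octopus version. `fetch_machines` and `not_healthy` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Dict, List


def fetch_machines(server_url: str, api_key: str) -> List[Dict]:
    """Download the machine list (endpoint and header are assumptions to verify)."""
    request = urllib.request.Request(
        server_url.rstrip("/") + "/api/machines/all",
        headers={"X-Octopus-ApiKey": api_key},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def not_healthy(machines: List[Dict]) -> List[str]:
    """Return the names of enabled machines whose HealthStatus is not 'Healthy'."""
    return [
        machine["Name"]
        for machine in machines
        if not machine.get("IsDisabled", False)
        and machine.get("HealthStatus") != "Healthy"
    ]
```

Run on a schedule, `not_healthy(fetch_machines(server_url, api_key))` would give the list of names to put in an alert email; the server URL and API key are whatever applies to the installation.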

  3. Posted by Daniel Fischer (Support Staff) on 17 Jan, 2018 06:35 AM

    Hi,

    Thanks for the update! There are a few things that could be going on here, and it will help to get a little more specific information from your end.

    Would you be able to check the connectivity tab on a Tentacle which is experiencing this behavior? Please attach any errors or messages displayed here.

    It looks like Octopus is not running a health check on these machines. Is it possible that the health check is not enabled for them? (This would be under the machine policy.)

    Could you try manually running a health check on a Tentacle with this issue? This can be done under the connectivity tab. If it fails, can you attach the relevant task log for the health check?
    If the health check succeeds, can you check the Tentacle's Events tab and confirm that we are logging the event?

    Hopefully the above should give us enough information to troubleshoot this problem. :)

    Looking forward to hearing from you and getting to the bottom of this.

    Best regards,
    Daniel

  4. Posted by mhenry on 19 Jan, 2018 09:59 PM

    It is running the health check, and showing that it's failing.
    (It fails because the Tentacle is not running. To be clear, I could easily restart the Tentacle and have it pass the health check. I'm just trying to find out why it's dying without explanation, and why it isn't registering an event or sending a notification when it dies.)

    January 19th 2018 10:52:08 Info
    Opening a new connection
    January 19th 2018 10:52:29 Error
    Unexpected exception executing transaction. Halibut.HalibutClientException: The client was unable to establish the initial connection within 00:01:00
       at Halibut.Transport.TcpClientExtensions.ConnectWithTimeout(TcpClient client, String host, Int32 port, TimeSpan timeout)
       at Halibut.Transport.TcpClientExtensions.ConnectWithTimeout(TcpClient client, Uri remoteUri, TimeSpan timeout)
       at Halibut.Transport.SecureClient.CreateConnectedTcpClient(ServiceEndPoint endPoint)
       at Halibut.Transport.SecureClient.EstablishNewConnection()
       at Halibut.Transport.SecureClient.ExecuteTransaction(Action`1 protocolHandler)
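The stack trace above ends in `ConnectWithTimeout`, so the failure happens while the plain TCP connection is being opened, before any secure exchange with the Tentacle. That step can be reproduced from the Octopus server with a bare TCP connect. The following is a minimal sketch, assuming the listening Tentacle uses the default port 10933 (adjust if it was configured differently); `can_connect` is a hypothetical helper, and a successful connect only shows that something is accepting connections on that port, not that the Tentacle is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host name and applies the timeout to the connect
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures
        return False
```

For example, `can_connect("qauaw02", 10933)` run from the Octopus server while the machine shows as unavailable would separate "the service is not listening" from a problem further up the stack.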

  5. Posted by Daniel Fischer (Support Staff) on 23 Jan, 2018 04:11 AM

    Hi,

    Thanks for clarifying that, however I'll still need some additional information. Could you attach your Octopus Server logs which cover this issue? (C:\Octopus\Logs in a default installation)

    If a health check runs on machine QAUAW02 and fails, it should record a failed health check in the audit log, so it is really strange that this is not happening. It is possible that pausing the VM is causing this issue, though we should still log an event for the failed health check.

    Can you check to see if an event is logged when the service is manually stopped?
    Can you also try restarting the Tentacle service and running a health check, then selecting the Events tab for the machine to see if there is a change? (By the sounds of it there will not be)

    Are you able to confirm whether the Tentacle drops off like this just after the VM is paused/resumed?
    If you could provide me with a timestamp of such an event to cross-reference in the logs, that would also be very helpful.

    I'm hoping to see errors in the Octopus server logs around the time that this happens; these logs will have more helpful information than the Tentacle logs.

    Looking forward to hearing from you.

    Best regards,
    Daniel

  6. Posted by mhenry on 26 Jan, 2018 01:39 AM

    I just had this happen again today. Note the events show the server (QAAlly2) going online, but never going offline.

    Attached are a screenshot and the server logs.

  7. Posted by mhenry on 26 Jan, 2018 01:50 AM

    I did see this error message pop up on the server. Oddly enough, it was at a time that the tentacle was responding normally.

    Failed to restart as an elevated process. The Octopus Tentacle Manager requires elevated privileges on this server. Please check your user account is a member of the Administrator group on this server, and answer 'YES' to any Windows UAC prompts.
    System.Exception
       at Octopus.Manager.Tentacle.Infrastructure.ElevationHelper.Elevate(IEnumerable`1 args) in ElevationHelper.cs:line 29
       at Octopus.Manager.Tentacle.App.OnStartup(StartupEventArgs e) in App.xaml.cs:line 33
       at System.Windows.Application.<.ctor>b__1_0(Object unused)
       at System.Windows.Threading.ExceptionWrapper.InternalRealCall(Delegate callback, Object args, Int32 numArgs)
       at System.Windows.Threading.ExceptionWrapper.TryCatchWhen(Object source, Delegate callback, Object args, Int32 numArgs, Delegate catchHandler)

    --Inner Exception-- The operation was canceled by the user
    System.ComponentModel.Win32Exception
       at System.Diagnostics.Process.StartWithShellExecuteEx(ProcessStartInfo startInfo)
       at Octopus.Manager.Tentacle.Infrastructure.ElevationHelper.Elevate(IEnumerable`1 args) in ElevationHelper.cs:line 25

  8. Posted by mhenry on 26 Jan, 2018 02:55 PM

    I ran an experiment. I verified the server was listed as healthy, then shut down the Tentacle and waited for the next health check. As expected, I saw the event that the Tentacle went down. Then I restarted the Tentacle, verified it was healthy, paused the virtual machine, and waited for the next health check. Again, I got the event that the Tentacle was down (see attached screenshot).

  9. Posted by mhenry on 01 Feb, 2018 05:04 PM

    This just happened to me on another server. This happens to me several times a week, on multiple servers running different versions of Windows.

    Nothing interesting or useful in either the tentacle log or the Octopus server log.

    I do see this in the System events in Event Viewer:
    The OctopusDeploy Tentacle service failed to start due to the following error:
    The service did not respond to the start or control request in a timely fashion.
    A timeout was reached (30000 milliseconds) while waiting for the OctopusDeploy Tentacle service to connect.

    So it sounds like it died, and Windows tried to restart it but it wouldn't come up? When I restarted it by hand, it started up with no problem.

    Is there maybe a Tentacle build you can give me with more debugging, to try to track this down?

  10. Posted by mhenry on 01 Feb, 2018 05:21 PM

    This particular error seems to have happened right around the time of a server reboot. I changed the service from Automatic startup to Delayed startup; we'll see if that helps.
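The same startup change can be scripted when several QA servers need it. The following is a minimal sketch, assuming the service is named "OctopusDeploy Tentacle" (the name shown in the Event Viewer message; additional Tentacle instances on a machine will have different names) and that the script runs from an elevated prompt on Windows. It relies on the standard `sc.exe config <service> start= delayed-auto` syntax; `delayed_start_command` and `run_all` are hypothetical helper names.

```python
import subprocess
from typing import Callable, Iterable, List

# Assumption: service name of the default Tentacle instance, as seen in Event Viewer
SERVICE_NAME = "OctopusDeploy Tentacle"


def delayed_start_command(service: str = SERVICE_NAME) -> List[str]:
    """Build the sc.exe call that switches a service to Automatic (Delayed Start)."""
    # sc.exe expects "start=" and its value as separate tokens
    return ["sc.exe", "config", service, "start=", "delayed-auto"]


def run_all(commands: Iterable[List[str]], run: Callable = subprocess.run) -> None:
    """Run each command in order, raising if any of them exits with an error."""
    for command in commands:
        run(command, check=True)
```

Calling `run_all([delayed_start_command()])` applies the change to the default instance. The `run` parameter exists so the commands can be inspected or logged without touching the service.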

  11. Posted by Daniel Fischer (Support Staff) on 02 Feb, 2018 01:32 AM

    Hi,

    Thanks for all this information, and I'm sorry for the delay in getting back to you. Could you let me know which user account the Tentacle service runs as? The UAC error indicates that the account running the Tentacle may be hitting some kind of access issue, although it is strange that it's intermittent. The following documentation has a short list of the permissions we expect the account to have: https://octopus.com/docs/infrastructure/windows-targets/running-ten...

    Also, let me know if the Delayed startup has any effect. I'll run this past our developers and see if there are any ideas.

    Looking forward to hearing from you.

    Best regards,
    Daniel

  12. Posted by mhenry on 04 Feb, 2018 07:39 PM

    It says "Local System Account"

  13. Posted by posty on 05 Feb, 2018 02:24 PM

    I see the same issue here and there. Our test machines are in the cloud, and to save money we shut them down every night. When they come back up in the morning, some services don't start; which machines are affected, and how often, is random.

    We use an account that is a local admin on the machine.
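When services intermittently fail to start after a boot, a scheduled check that starts the Tentacle whenever it is found stopped can cover the gap while the root cause is investigated. The following is a minimal sketch, not a fix. It assumes English-language `sc.exe query` output containing a `STATE : <number> <NAME>` line, a service named "OctopusDeploy Tentacle", and an elevated scheduled task on Windows; `parse_state` and `ensure_running` are hypothetical helper names.

```python
import re
import subprocess
from typing import Callable, Optional

# Matches lines such as "        STATE              : 1  STOPPED"
STATE_LINE = re.compile(r"^\s*STATE\s*:\s*\d+\s+(\w+)", re.MULTILINE)


def parse_state(sc_output: str) -> Optional[str]:
    """Extract the state name (e.g. RUNNING, STOPPED) from `sc.exe query` output."""
    match = STATE_LINE.search(sc_output)
    return match.group(1) if match else None


def ensure_running(service: str, run: Callable = subprocess.run) -> bool:
    """Start the service if it reports STOPPED; return True if a start was issued."""
    result = run(["sc.exe", "query", service], capture_output=True, text=True)
    if parse_state(result.stdout) == "STOPPED":
        run(["sc.exe", "start", service], capture_output=True, text=True)
        return True
    return False
```

Scheduling `ensure_running("OctopusDeploy Tentacle")` a few minutes after boot, and logging whenever it returns True, would also give a record of how often the service failed to come up on its own.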

Already uploaded files

  • tentacleunavailable.png 18.7 KB
  • NothingInAuditLog.PNG 39.9 KB
  • NoEvent.PNG 12.3 KB
