Server Memory Usage

colin

05 Apr, 2017 04:05 PM

Hi,

We recently had to move Octopus to a new VM and during that process we installed v 3.11.X. After running for approx 1 day the server memory usage hit nearly 4GB (the box has 7GB RAM) and was still climbing. The VM eventually became unusable and we had to reboot it. This has happened a few times over the last few days. We upgraded to 3.12.0 as it mentioned some performance improvements but the same thing is still happening. Is this a known issue? How do I go about diagnosing the problem?

Thanks
Colin

  1. Support Staff 1 Posted by Michael Richardson on 06 Apr, 2017 05:16 AM

    Hi Colin,

    No, this is not a problem known to us.

    You are correct, we have put a focus on performance work recently, but to be honest it was focused on speed, not memory usage.

    We'll certainly work with you to help resolve this problem. Can I ask you a few questions:

    Which process is consuming the memory? Is it Octopus.Server.exe?

    Could you give us an idea of the scale of your usage? How many projects, environments, and machines? Approximate numbers are fine.

    Which version of Octopus were you upgrading from?

    If you look at the Tasks tab in the Octopus portal, are there many/any tasks running at the time? How long have they been running for?

    The next time you see the memory climb to a high level, would you be able to follow the steps on this page (in the 'Get a snapshot from your running Octopus Server' section) to capture a memory snapshot and log files?
    You can upload them to this secure location. Please notify me via this thread if you upload files to the location above, as we don't get notified automatically.

    This information will allow us to investigate further.

    Regards,
    Michael
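
One way to answer the first question above (which process is consuming the memory) is to rank processes by memory from the output of the Windows `tasklist /FO CSV` command. The following is only a minimal sketch: it assumes the English column headers "Image Name" and "Mem Usage" and values such as "1,433,600 K", and the function names and sample rows are hypothetical, not part of Octopus.

```python
import csv
import io
import re
from collections import defaultdict
from typing import Dict, List, Tuple


def memory_by_image_kb(tasklist_csv: str) -> Dict[str, int]:
    """Sum the 'Mem Usage' column (in KB) per image name.

    Assumes CSV text in the shape produced by `tasklist /FO CSV` on an
    English-language Windows install; other locales use different headers.
    """
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(tasklist_csv)):
        # "1,433,600 K" -> "1433600": keep only the digits.
        digits = re.sub(r"\D", "", row.get("Mem Usage") or "")
        if digits:
            totals[row["Image Name"]] += int(digits)
    return dict(totals)


def top_consumers(tasklist_csv: str, count: int = 5) -> List[Tuple[str, float]]:
    """Return the `count` largest (image name, MB) pairs, biggest first."""
    totals = memory_by_image_kb(tasklist_csv)
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(name, kb / 1024) for name, kb in ranked[:count]]


# Hypothetical sample rows, for illustration only.
SAMPLE = (
    '"Image Name","PID","Session Name","Session#","Mem Usage"\n'
    '"Octopus.Server.exe","1234","Services","0","1,433,600 K"\n'
    '"sqlservr.exe","2200","Services","0","512,000 K"\n'
)

for name, megabytes in top_consumers(SAMPLE):
    print(f"{name}: {megabytes:.0f} MB")
```

On a real server the text would come from running `tasklist /FO CSV` and capturing its output rather than from the sample string.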

  2. 2 Posted by colin on 06 Apr, 2017 12:33 PM

    Hi Michael,

    The process is Octopus.Server.exe.

    We currently have 12 projects, 4 environments, 4 machines, but the vast
    majority of our deploys are to Azure.

    We originally had 3.11.11 and it seemed to run fine, uninterrupted, for weeks.

    No Tasks running just now.

    I tried to get a memory snapshot from it this morning, but Octopus had used
    up over 4GB of memory and dotMemory either couldn't start or couldn't
    complete a snapshot, so I had to restart the Octopus service. I've left it
    running for a few hours now and the memory usage has climbed from ~300MB to
    ~1.4GB, so I've taken a snapshot and uploaded it to the link you sent.

    Thanks
    Colin
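
The growth described above (from ~300MB to ~1.4GB over a few hours) can be turned into a rough rate and a time-to-limit estimate. This is a minimal sketch assuming roughly linear growth; the sample times (0 and 4 hours) and the 4096MB limit are illustrative assumptions, not measurements from the thread, and the function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (hours since start, memory in MB)


def growth_rate_mb_per_hour(samples: Sequence[Sample]) -> float:
    """Least-squares slope of memory (MB) against time (hours)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must cover more than one point in time")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t


def hours_until(limit_mb: float, samples: Sequence[Sample]) -> Optional[float]:
    """Hours from the latest sample until `limit_mb` is reached.

    Returns None when memory is flat or shrinking (no projected exhaustion).
    """
    rate = growth_rate_mb_per_hour(samples)
    if rate <= 0:
        return None
    _, latest_mb = max(samples, key=lambda s: s[0])
    return max(0.0, (limit_mb - latest_mb) / rate)


# Assumed readings: ~300MB at restart, ~1.4GB about four hours later.
readings = [(0.0, 300.0), (4.0, 1400.0)]
print(f"growth: {growth_rate_mb_per_hour(readings):.0f} MB/hour")
print(f"hours until 4096MB: {hours_until(4096.0, readings):.1f}")
```

With more than two readings the same functions give a fitted trend rather than a simple difference, which makes a steady climb easier to tell apart from a one-off spike.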

  3. Support Staff 3 Posted by Michael Noonan on 10 Apr, 2017 01:08 AM

    Hi Colin,

    Thanks for keeping in touch! Michael is taking some well-earned vacation, so I'll be taking over from him. The nice thing is you don't need to remember a new name! :)

    Thanks so much for sending through that snapshot - nothing seems to be standing out from the single snapshot. What would really help me get to the bottom of this, and the root cause, is to use the alternative approach where you restart Octopus Server with dotMemory recording allocations from the get-go. Then take two snapshots: one after the Web UI starts responding properly, and another once the memory has started to climb, preferably after some normal operation and deployments. This will reveal to us which memory is being retained, and the root cause of it.

    https://octopus.com/docs/how-to/record-a-memory-trace#start-octopus...

    You can upload the zipped workspace export to the same secure file share.

    Hope that helps!
    Mike

  4. 4 Posted by colin on 18 Apr, 2017 12:48 PM

    Hi Michael,

    We're in the middle of a roll-out of our latest release just now so I'll
    try and get you this information tomorrow. For the moment we have the
    service restarting on a nightly basis to prevent this from happening which
    seems to have had the desired effect but is not exactly ideal.

    Cheers
    Colin

  5. Support Staff 5 Posted by Michael Noonan on 18 Apr, 2017 11:44 PM

    Hi Colin,

    Thanks for keeping in touch! I hope your release goes well. :)

    It will be really good to get a reproducible recording of this problem. We don't want anyone to require a routine server restart as part of their Octopus maintenance plan!

    Looking forward to hearing from you soon.
    Mike

  6. 6 Posted by colin on 19 Apr, 2017 03:30 PM

    Hi Michael,

    Release went fine thanks.

    I've uploaded the files to the location you previously gave me. Took a
    while because dotMemory failed to process a snapshot and then crashed.
    Anyway, hopefully what's there now should give you a better idea of what's
    going on.

    Cheers
    Colin

    Colin Winning

    Technical Director

  7. 7 Posted by colin on 25 Apr, 2017 03:31 PM

    Hi,

    I was just wondering if there was any progress on this? Or if I can provide any more information to help you diagnose what is going on?

    Cheers
    Colin

  8. Support Staff 8 Posted by Michael Noonan on 26 Apr, 2017 03:27 AM

    Hi Colin,

    Thanks for keeping in touch! I investigated your profile recording in detail on Friday. The sad part of the story is that I still haven't found out exactly what is causing this problem. The good news is I've identified and proposed fixes for a few other problems. :)

    From what I can tell, the problem is based in unmanaged memory allocations, and dotMemory isn't very advanced in this area. I wonder if you'd mind taking a similar recording with the ANTS Memory Profiler?

    1. When using ANTS, please make sure to select Additional profiling options > Profile unmanaged memory allocations.
    2. Take a snapshot once Octopus has stabilised after starting up (might take about 1 min).
    3. Take another snapshot after about 10 mins, or when the memory problem exhibits itself.

    I want to use these two strategies since both could be exhibiting the behaviour you've been seeing:

    You can upload a zip of the files from ANTS to the same secure file share and ping me back so I can analyse it in more detail.

    Hope that helps!
    Mike

  9. 9 Posted by colin on 26 Apr, 2017 11:19 AM

    Hi Michael,

    I've uploaded the results from ANTS Memory Profiler. Hope this helps.

    Cheers
    Colin

    Colin Winning

    Technical Director

  10. Support Staff 10 Posted by Michael Noonan on 26 Apr, 2017 10:34 PM

    Hi Colin,

    Thanks for keeping in touch! I've been analysing those snapshots - it looks like something is using the VC++ redistributable and not cleaning up. The tricky part now is figuring out which code is misbehaving, which could be several layers deep in code we don't own. This means I need to try to reproduce the behaviour locally and watch for when that same memory is allocated, to narrow down the source.

    You mentioned you do a lot of Azure deployments. Could you give me some details about the types of Azure deployments you're doing, and how you've modelled those deployments in Octopus? For example, is it all Azure Web Apps, using the Azure Web App steps built in to Octopus? Or is it Cloud Services, using the Cloud Service deployment targets?

    The leak may also be caused by navigating around the UI - I wonder, for the hour of that recording, whether you'd have any idea about how people were using the Octopus UI?

    In the meantime I'm going to try and replicate the same memory allocations, and hopefully I get lucky!

    Thanks for working with me on this - I'm sorry we haven't got to the bottom of it yet, but I do feel like we are getting closer thanks to your help reproducing the problem.

    Hope that helps!
    Mike

  11. Support Staff 11 Posted by Michael Noonan on 26 Apr, 2017 11:29 PM

    Hi Colin,

    I've had two thoughts since my last post.

    3rd party tools

    Do you have any third-party (non-Octopus) tools running on the Octopus Server? Like a virus scanner or a performance monitor/telemetry recording agent? Some of these will inject themselves into a running process and can cause unmanaged memory leaks, for example AppDynamics, New Relic, Dynatrace, Stackify, etc.

    Detailed recording to correlate activity

    Regarding my question:

    whether you'd have any idea about how people were using the Octopus UI?

    We have a way to log web requests which I should have had you enable during the recording!

    Would you mind taking a 10-minute recording covering the memory usage issue, with web logging enabled, taking one snapshot every minute over that period?

    To correlate everything I would need:

    • The ANTS profiler recording
    • The web request logging
    • The Octopus.Server.log file
    • The Task Log files that were modified during that time
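
Collecting the log files that were modified during the recording window (the last item in the list above) can be scripted. This is a minimal sketch, not an Octopus tool: the function name is hypothetical, and the folder to search and the file name pattern are assumptions that depend on where your installation writes its logs.

```python
from datetime import datetime
from pathlib import Path
from typing import List


def files_modified_between(
    folder: Path, start: datetime, end: datetime, pattern: str = "*"
) -> List[Path]:
    """Return files under `folder` whose last-modified time lies in [start, end].

    `start` and `end` are naive datetimes interpreted as local time, matching
    the local clock used to note when the recording began and finished.
    Results are sorted oldest first.
    """
    start_ts, end_ts = start.timestamp(), end.timestamp()
    hits = [
        path
        for path in folder.rglob(pattern)
        if path.is_file() and start_ts <= path.stat().st_mtime <= end_ts
    ]
    return sorted(hits, key=lambda path: path.stat().st_mtime)
```

For example, `files_modified_between(Path(log_dir), recording_start, recording_end, "*.txt")` would list candidate files to zip up alongside the profiler recording, where `log_dir`, `recording_start` and `recording_end` are values you supply.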

    In the meantime I'll keep trying to reproduce it locally.

    Hope that helps!
    Mike

  12. 12 Posted by colin on 27 Apr, 2017 08:41 AM

    Hi Michael,

    We mainly do deployments to Azure Cloud Services. A typical workflow is as
    follows:

    [image: Inline images 1]

    Hopefully that's legible enough :-)

    The way we use Octopus is that TC does the builds and on successful build
    of one of about 10 configurations, an Octopus deploy is triggered. This
    uploads the artifacts to Octopus and creates the releases, which in turn
    triggers the deploys from Octopus. During the time between the snapshots
    being taken we probably had a number of these triggered. We don't tend to
    have a huge amount of interaction with the UI - possibly just myself
    looking at some things and creating a new project.

    Don't worry about the time taken to diagnose this - we have a workaround in
    place for now which is fine. Octopus is a great product so we're happy to
    help improve it.

    Cheers
    Colin

    Colin Winning

  13. Support Staff 13 Posted by Michael Noonan on 03 May, 2017 07:22 AM

    Hi Colin,

    Thanks for keeping in touch! I've spent a few days now trying to reproduce the problem using a similar deployment process to an Azure Cloud Service - sadly I can't reproduce the same unmanaged memory allocations.

    Memory Profile

    From these snapshots I can see very normal allocations, where the vast majority of memory reserved by the .NET CLR is reserved free space to make future allocations faster. Also the Octopus Server isn't allocating much memory via the C++ runtime - only 212.4KB is allocated via MSVCR120_CLR0400 compared to ~700MB in the snapshot you sent through.

    I'm at the point where I am struggling to justify spending much more time trying to reproduce this issue in isolation.

    3rd party tools

    I asked last time if you were using any 3rd party tools on your server which could be injecting themselves into the Octopus process:

    Do you have any third-party (non-Octopus) tools running on the Octopus Server? Like a virus scanner or a performance monitor/telemetry recording agent? Some of these will inject themselves into a running process and can cause unmanaged memory leaks, for example AppDynamics, New Relic, Dynatrace, Stackify, etc.

    Could you clarify whether this is possible or not?

    Packages and variables

    I'm desperately trying to think of other possibilities like:

    • your packages are really really big (mine was 10MB)
    • you have thousands of variables (I used ~7 variables + all the built-in system variables) or some really big ones
    • anything else you can think of that stands out as being big/complex/different?

    Where to from here?

    You might be able to narrow down the exact actions where that memory is allocated by MSVCR120_CLR0400 - whether it's a deployment, another action, or just letting Octopus do its thing. The ANTS profiler documentation describes the process pretty well: http://documentation.red-gate.com/display/AMP8/Checking+unmanaged+m...

    Depending on your responses to my other questions, I wonder if you'd be willing to share a backup of your database with me so I can completely reproduce the scenario? If so, I will send you the details to do this.

    Hope that helps!
    Mike

  14. 14 Posted by matt.zimmerman on 10 May, 2017 05:18 PM

    I realize I'm late to the party, but we've also noticed increased memory consumption since we upgraded to 3.12.0 (previous version was a 3.10.X release).

    We use Datadog for host monitoring, and since we upgraded at ~6pm on 5 April, we've seen a slow, steady ramp of increasing memory usage (the attached graph is our Datadog metrics from our Octopus server over the last ~month starting at the time of the upgrade, with the y-axis reporting the percentage of free memory remaining on the system).

    It seems like we're seeing a similar issue, so I'd love to help diagnose the problem if I can. Are there any first things to check, or pieces of info you would like?

  15. Support Staff 15 Posted by Michael Noonan on 11 May, 2017 01:05 AM

    Hi Matt,

    Thanks for getting in touch! Would you mind following a similar process to Colin and taking a memory trace using dotMemory like this: https://octopus.com/docs/how-to/record-a-memory-trace#start-octopus...

    You can upload the exported workspace to this secure share.

    Hope that helps!
    Mike

  16. 16 Posted by matt.zimmerman on 11 May, 2017 05:46 PM

    Will do. We're also upgrading to 3.13.X at some point soon, so I should be able to confirm pretty quickly whether that has any effect on our particular setup. I will update this thread with anything we learn.

  17. Support Staff 17 Posted by Michael Noonan on 12 May, 2017 03:54 AM

    Hi Matt,

    Thanks for keeping in touch! I look forward to hearing from you soon. :)

    Mike

  18. 18 Posted by colin on 12 May, 2017 03:46 PM

    Hi Michael,

    To answer your question about third party tools, it's an Azure VM so
    whatever is pre-installed on that is all that's running. We have not
    installed anything other than Octopus and SQL Server.

    I guess we could share a database if you wanted but it does contain some
    sensitive information i.e. connection strings and storage account keys.

    Cheers
    Colin

    Colin Winning

    Technical Director

  19. Support Staff 19 Posted by Michael Noonan on 15 May, 2017 01:26 AM

    Hi Colin,

    Thanks for keeping in touch! At this point I'd say it's up to you - and I'm not even sure a database backup will help us find the root cause. Our policy is to keep the database private, only use the database for the purpose we intended, and then delete it as soon as we are done.

    In the meantime we are going to continue working on overall performance and memory allocation.

    Let me know what you would like to do.

    Hope that helps!
    Mike

  20. 20 Posted by matt.zimmerman on 17 May, 2017 09:04 PM

    We just rolled out 3.13.5. I will update this thread if we see additional issues with memory usage once we have some monitoring data from the new box.

  21. Support Staff 21 Posted by Michael Noonan on 18 May, 2017 02:25 AM

    Hi Matt,

    Thanks for keeping in touch. Looking forward to hearing if you see the behaviour continue, or not.

    We do have some work we are going to do soon to eliminate another source of over-allocation, which will help a lot with general memory pressure, but we still have no firm leads on a real "memory leak".

    Hope that helps!
    Mike

  22. 22 Posted by Loyal on 20 Jul, 2017 03:03 PM

    I was just experiencing a very, very similar issue for the past few days.

    It turns out there were Windows Updates waiting to be installed. After I let those go ahead, reboot and all, memory usage went back to normal.

    Very frustrating.

  23. Support Staff 23 Posted by Michael Noonan on 21 Jul, 2017 03:41 AM

    Hi!

    Thanks for getting in touch. That's interesting. So it didn't turn out to be a problem with Octopus from what you can tell, but a problem caused by Windows Updates being queued?

    I'd be interested to hear if you experience a similar issue again and can pinpoint it to Octopus.

    Hope that helps!
    Mike
