
OnActivated leaks memory #1252

Closed
RogerKratz opened this issue Feb 16, 2021 · 22 comments · Fixed by #1257
Labels
pending-input Pending input or update

Comments

@RogerKratz
Contributor

RogerKratz commented Feb 16, 2021

It seems OnActivated (and possibly other callbacks?) leaks memory. Below is a little test that uses more and more memory. If OnActivated is removed, the problem goes away.

public class FooTest
{
    private int i;

    [Test]
    [Repeat(1000)]
    public void Bar()
    {
        var builder = new ContainerBuilder();
        Action<ContainerBuilder> registrationAction;
        builder
            .RegisterType<FakeService>()
            .OnActivated(c =>
            {
                registrationAction = b =>
                {
                    b.RegisterInstance(c.Instance);
                };
            });
        var container = builder.Build();
        container.Resolve<FakeService>();
        container.Dispose();

        i++;
        if (i % 100 == 0)
            Console.WriteLine((GC.GetTotalMemory(true) / 1024 / 1024) + "mb");
    }

    public class FakeService
    {
        private IEnumerable<int> _data;

        public FakeService()
        {
            _data = Enumerable.Range(1, 100).ToArray();
        }
    }
}
@tillig
Member

tillig commented Feb 16, 2021

It'd probably be good to throw this in a memory profiler to see what's actually holding references to what. For example, I'm unclear what the registrationAction assignment is doing there. I'd also say you should do a direct GC.Collect() rather than having GetTotalMemory do it - GetTotalMemory times out rather than literally waiting for everything to be collected so it may not be entirely accurate.

So:

  1. Get a profiler on that thing to find out for sure what's holding references.
  2. Try not assigning something inside the closure. (Lose registrationAction.)
  3. Consider using something like BenchmarkDotNet instead of manually accessing the memory information. BenchmarkDotNet will do all sorts of things like warming up, handling the garbage collection, etc., and will give you something possibly a bit more accurate. Here's an example benchmark showing how to use MemoryDiagnoser to get the stats.

On a side note... @alistairjevans this seems familiar, like we saw a similar issue where, again, some sort of assignment or work done inside the event handler was causing issues. Am I imagining that? I searched the issues list and couldn't find it.
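
For reference, a minimal sketch of what a benchmark with [MemoryDiagnoser] could look like for this scenario (class and method names here are just placeholders, not the example linked above):

using Autofac;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// [MemoryDiagnoser] adds Gen 0/1/2 collection counts and allocated bytes per operation
// to the results table.
[MemoryDiagnoser]
public class OnActivatedBenchmarks
{
    [Benchmark]
    public void ResolveWithOnActivated()
    {
        var builder = new ContainerBuilder();
        builder
            .RegisterType<FakeService>()
            .OnActivated(c => { /* callback under test */ });
        using var container = builder.Build();
        container.Resolve<FakeService>();
    }

    public class FakeService { }
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<OnActivatedBenchmarks>();
}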

@RogerKratz
Contributor Author

Some quick replies...

  1. I tried to but it's not that obvious. Will have yet another look.
  2. Then it works just fine. However, what I didn't mention above is that we noticed this when we tried to upgrade Autofac from a very old version (3.x). In that version, this problem didn't exist.
  3. Ok, I can fix something like that.

@tillig
Member

tillig commented Feb 16, 2021

we noticed this when we tried to upgrade autofac from a very old version (3.x)

A lot has changed over the years, especially in the internals. You aren't guaranteed the exact same behavior over three major version releases. However, if you can show us what exactly is leaking the memory, or at least what's actually holding it so we can be sure it's Autofac and not application code, then it's something we can investigate. Unfortunately, given the very small number of unpaid maintainers here, where support isn't our actual 'day job,' we need all the help we can get in pinpointing what's going on. The more help you can provide, the faster your problem will be solved.

@alistairjevans
Member

The reason there may appear to be a memory leak here (and why there isn't one when you remove the assignment from the callback) is that when we execute the OnActivated callback, you are creating a closure that captures the local context and storing it outside the callback. As of v6, that callback's closure will capture:

  • the calling action
  • the pipeline middleware added to the component's pipeline
  • (I believe) a load of other information about the current lifetime scope.

That said, when the container instance and the capturing registrationAction go out of scope, the GC should clean everything else up just fine. I suspect an intervening GC.Collect will show this.
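
To illustrate the capture pattern being described, here's a minimal sketch adapted from the repro above (the wrapper class and method are just for illustration):

using System;
using Autofac;

public class FakeService { }

public static class CaptureDemo
{
    public static void Main()
    {
        // The inner delegate closes over the OnActivated context 'c', so everything reachable
        // from 'c' (the resolved instance, middleware state, lifetime scope info) stays
        // reachable for as long as 'capturedAction' itself is reachable.
        Action<ContainerBuilder> capturedAction = null;

        var builder = new ContainerBuilder();
        builder
            .RegisterType<FakeService>()
            .OnActivated(c =>
            {
                capturedAction = b => b.RegisterInstance(c.Instance);
            });

        using (var container = builder.Build())
        {
            container.Resolve<FakeService>();
        }

        // The container is disposed here, but whatever the closure captured remains rooted
        // through 'capturedAction' until it goes out of scope or is cleared.
        GC.KeepAlive(capturedAction);
    }
}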

@tillig
Member

tillig commented Feb 16, 2021

That's kinda what I thought, too - that the GC.Collect or BenchmarkDotNet would maybe yield more accurate results and show that maybe there's not a leak as such. See, I'm not crazy! 😜 Good point about the extra stuff that's being captured; that's probably why the difference appears so much larger than with the older version of Autofac.

Oh, I'd also recommend doing all this when building/testing in Release mode. There's a huge difference between building in Debug and building in Release, especially around garbage collection. In Debug mode the app never knows when you're going to break into the debugger on a method and want to check the locals, so it doesn't do as aggressive a memory cleanup when garbage collecting - locals won't actually have been cleaned up by the time you make that GC.GetTotalMemory call, for example, because they're still in scope. Release mode will see they're not used and allow them to be cleaned up. If you use BenchmarkDotNet you'll actually get a warning if you try to run the benchmarks in Debug mode.
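
As a quick illustration of that difference (a hedged sketch - exact behaviour depends on the runtime and JIT settings):

using System;

public static class LifetimeDemo
{
    public static void Main()
    {
        var service = new object();
        var weak = new WeakReference(service);

        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        // In a Release build the JIT can treat 'service' as dead after its last use above,
        // so the weak reference is often already cleared here. In a Debug build the local is
        // kept alive until the end of the method, so IsAlive typically stays true.
        Console.WriteLine(weak.IsAlive);
    }
}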

@RogerKratz
Contributor Author

GC.Collect(), running in Release mode, etc. doesn't seem to help.

We will dig deeper here to see what more we can find out.

RogerKratz added a commit to RogerKratz/Autofac that referenced this issue Feb 17, 2021
@tillig
Member

tillig commented Feb 17, 2021

I am not really able to recreate the memory leak. Here are the steps I've taken:

First, I checked out your PR branch from #1253. I noticed you hadn't attached [MemoryDiagnoser] to the class, and while there is some memory diagnosis info being output, per the BenchmarkDotNet docs you need to use the attribute to ensure the benchmark is more about memory usage than about perf. I added that attribute to the benchmark class when running my tests.

I started out by running the test as-is, where the Action<T> is declared outside the OnActivated event.

| Method | Mean | Error | StdDev | Median | Gen 0 | Gen 1 | Gen 2 | Allocated |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResolveWithOnActivatedWithAction | 99.91 us | 5.041 us | 14.86 us | 106.2 us | 3.7842 | 1.3428 | - | 23.2 KB |

What that shows is that, per 1000 operations, there were 3.7842 Gen 0 garbage collections, 1.3428 Gen 1 garbage collections, and 23.2KB allocated per op. OK, cool.

Next I deleted the Action<T> declaration outside the OnActivated and moved it to an inline declaration. This creates the lambda, but it avoids capturing the context in the closure, just so we can see the difference in memory allocation:

| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResolveWithOnActivatedWithAction | 24.90 us | 0.495 us | 0.530 us | 3.7537 | 1.8616 | 0.0305 | 23.11 KB |

Not a significant change. I'm guessing that the additional time spent is in capturing the context and the inability of the compiler to optimize some things out.

Finally, I deleted the whole OnActivated part just to see what a simple register/resolve looks like, for comparison to the others:

| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResolveWithOnActivatedWithAction | 24.10 us | 0.456 us | 0.468 us | 3.6621 | 1.8311 | - | 22.54 KB |

While capturing the action outside the event seems way slower, I don't see anything significantly different across those runs.

Time to do some memory leak debugging and try to recreate it a different way to see if I'm missing something. In this case, I created a quick console app that runs the code from your benchmark in a loop. It does no garbage collection, just lets the memory mount up or whatever. I used .NET 5 because I'm on a Mac and dotnet-dump only works with .NET 5.0+ on macOS.

It looks like this:

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net5.0</TargetFramework>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="Autofac" Version="6.1.0" />
  </ItemGroup>
</Project>

and

using System;
using System.Collections.Generic;
using Autofac;

namespace AutofacDemo
{
    public static class Program
    {
        public static void Main()
        {
            Console.WriteLine("Running the test - use Ctrl+C to exit.");
            var count = 0;
            while(true)
            {
                ResolveWithOnActivatedWithAction();
                count++;
                if(count % 10000 == 0)
                {
                    Console.WriteLine(count);
                }
            }
        }

        public static void ResolveWithOnActivatedWithAction()
        {
            var builder = new ContainerBuilder();
            Action<ContainerBuilder> someAction;
            builder
                .RegisterType<FakeService>()
                .OnActivated(c =>
                {
                    someAction = b =>
                    {
                        b.RegisterInstance(c.Instance);
                    };
                });
            using var container = builder.Build();
            container.Resolve<FakeService>();
        }
    }

    public class FakeService
    {
        private IEnumerable<int> _data;
        public FakeService()
        {
            _data = new int[1000];
        }
    }
}

Basically it just runs your benchmark thing over and over, writing to the console every 10,000 iterations so I can see that it's still going.

I started that in one console window.

Following that documentation I linked, I then ran dotnet trace ps to find the process, then dotnet counters monitor --refresh-interval 1 -p XXXXX (where XXXXX is the process ID of that test program).

The output, which refreshed once a second, looked like this:

[System.Runtime]
    % Time in GC since last GC (%)                                 0
    Allocation Rate (B / 1 sec)                               40,840
    CPU Usage (%)                                                  0
    Exception Count (Count / 1 sec)                                0
    GC Fragmentation (%)                                          43.868
    GC Heap Size (MB)                                              8
    Gen 0 GC Count (Count / 1 sec)                                 0
    Gen 0 Size (B)                                         6,109,792
    Gen 1 GC Count (Count / 1 sec)                                 0
    Gen 1 Size (B)                                           531,800
    Gen 2 GC Count (Count / 1 sec)                                 0
    Gen 2 Size (B)                                         7,127,464
    IL Bytes Jitted (B)                                      246,906
    LOH Size (B)                                             385,376
    Monitor Lock Contention Count (Count / 1 sec)                  0
    Number of Active Timers                                        2
    Number of Assemblies Loaded                                   93
    Number of Methods Jitted                                   2,858
    POH (Pinned Object Heap) Size (B)                             24
    ThreadPool Completed Work Item Count (Count / 1 sec)           0
    ThreadPool Queue Length                                        0
    ThreadPool Thread Count                                        3
    Working Set (MB)                                              47

The important bit there is that GC Heap Size. I watched as it started around 8MB, went up to around 14MB, then garbage got collected and it returned to 8. Over and over.

I ran the test for about 3 million operations and it didn't grow, it didn't get out of control or anything.

Basically, after all that, I still didn't see any leak. Sorry. This article on troubleshooting .NET memory leaks may help you.

I did note that the example seems to be this desire to capture an instance of something resolved and then... have a lambda that will register that instance in a different ContainerBuilder. I'm not sure if that's just an example or if that's something you're actually doing. It seems like if that's something you're actually doing, that... well, it doesn't seem like something I'd do. If I needed to grab an instance and put it in a different container, I'd just do the resolve directly. If I was grabbing the instance to register it in some child lifetime scope in the same container... that's a problem with registering correct lifetime scopes and shouldn't be solved that way.

In any case, I'm not sure there's anything we can do. I'm not going to be able to really spend much more time trying to troubleshoot this since I can't replicate it. It seems like what's probably happening is that you're capturing stuff in the OnActivated event and doing something with it, and that's what's causing the problem.

I'll leave this issue open and marked pending input for a bit. If you can get a profiler on this thing and conclusively show there's something we're holding onto, we'll definitely look at fixing it. However, my gut is saying the issue is with app code - capturing something in that lambda and not properly letting it go - and that's not something we can fix for you.

@tillig tillig added the pending-input Pending input or update label Feb 17, 2021
@RogerKratz
Contributor Author

First, thanks for your reply and effort! Much appreciated!

In the PR I sent in, I didn't even get a result - the memory pressure was so extreme that adding [MemoryDiagnoser] wouldn't make any difference.

When I run your example (the console app), I have no chance of reaching "3 million operations"... Using the same tool as you, this is the result at iteration ~1.65 million...

[screenshot: dotnet counters output at ~1.65 million iterations]

...not even close to your result.
Really strange. FYI, a colleague of mine sees the same result as I do.

@tillig
Member

tillig commented Feb 17, 2021

Not sure what to say here. Machine-specific stuff can affect things, the OS can affect it, the amount of memory you have can affect it... Like, for example, let's say you have NCover installed and have been using it. It runs on the profiler API so it can instrument things and get coverage results for you. If that's turned on for all .NET processes, it can mess with the perf and memory. (I'm not saying you are using NCover, I'm saying there are things you can install or use that may affect things.)

I would very, very strongly recommend you get a .NET memory profiler on this thing to see what's going on. It seems like something is obviously happening, but since I can't repro it, it means you're unfortunately on your own. The only way you're going to really track it down is with the right tools.

In the meantime, something you may want to consider if you can - set up a super basic Windows Hyper-V VM. No extra tooling, just .NET Core SDK and the dotnet counters and other tools to do the basic analysis. See if you can repro the issue inside that VM. That could at least tell you if it's something you have installed/running on your dev machine that may be affecting things.

@RogerKratz
Contributor Author

I'm actually on a newly installed machine, so I was thinking the opposite: maybe the problem occurs on a "clean" machine, and you might have a KB, some update or similar that fixes the issue? Pretty far-fetched, I know. On the other hand, my colleague (and our build agents) have the same problem as I do.

We have solved the problem in our app now by avoiding the lambda in the callback. Still, there's something fishy here, I guess. I'm not at all sure it's related to Autofac code - not saying that at all.

I gave a memory profiler a shot a few days ago but couldn't (easily) find out what reference was causing the objects to still be held. I could see lots of objects kept in memory because of (indirect) refs from _registrations in DefaultRegisteredServicesTracker, but I couldn't see any valid reason DefaultRegisteredServicesTracker itself was kept in memory. In the memory profiler I could see one root referring to _registrations, but AFAIK that is "ok" because it's a ConcurrentBag, which internally has some static data. AFAIK it should have been released properly. I can try to have another look in a day or two and see if I can find anything else.

@RogerKratz
Contributor Author

I ran "your" console app in dotMemory. I see the same thing as when I had a look a few days ago and cannot find anything new; it seems like the ConcurrentBag never releases its references.

[screenshot: dotMemory results]

Why the ConcurrentBag's internal TLS lists are cleaned up while the thread is still alive in your environment and not in mine beats me.

Here's a SO thread about the subject.
https://stackoverflow.com/questions/10850443/is-there-a-memory-leak-in-the-concurrentbagt-implementation

@alsami
Member

alsami commented Feb 18, 2021

I am referencing an OWIN issue that raised similar concerns, to get better search results on that issue.

@tillig
Member

tillig commented Feb 18, 2021

Oh, good memory, @alsami! I knew there was something tickling the back of my brain, that was it.

I set up a Windows VM and tried this same test. It seems my Windows machine has a much larger tolerance for uncollected garbage - the GC heap size got into the GB range.

[System.Runtime]
    % Time in GC since last GC (%)                                  43
    Allocation Rate (B / 1 sec)                             22,526,696
    CPU Usage (%)                                                   23
    Exception Count (Count / 1 sec)                                  0
    GC Fragmentation (%)                                             0.488
    GC Heap Size (MB)                                            7,225
    Gen 0 GC Count (Count / 1 sec)                                   4
    Gen 0 Size (B)                                                  24
    Gen 1 GC Count (Count / 1 sec)                                   2
    Gen 1 Size (B)                                          14,228,752
    Gen 2 GC Count (Count / 1 sec)                                   0
    Gen 2 Size (B)                                       7,234,332,952
    IL Bytes Jitted (B)                                         76,361
    LOH Size (B)                                             8,293,888
    Monitor Lock Contention Count (Count / 1 sec)                    0
    Number of Active Timers                                          0
    Number of Assemblies Loaded                                     19
    Number of Methods Jitted                                     1,142
    POH (Pinned Object Heap) Size (B)                               24
    ThreadPool Completed Work Item Count (Count / 1 sec)             0
    ThreadPool Queue Length                                          0
    ThreadPool Thread Count                                          0
    Working Set (MB)                                             6,525

It climbed to about 12GB of heap before it did a garbage collection; that was around the 660,000 iteration range. But it never did get back down to the original size.

However, it absolutely slowed to a crawl, which the Mac version didn't. So, so weird. I got to 3 million in a few minutes on Mac; in Windows it took like two minutes to go from 740,000 to 750,000. It probably ties in a bit with this blog article where thread local storage usage caused slowdown in .NET 4. I'm guessing the extra thread local storage being used here is also probably causing challenges.

I wonder if the answer is as simple as "be sure to set the ConcurrentBag<T> to null on dispose?" Let me see if I can set up a repro outside of Autofac code like the one in that StackOverflow question to see if it makes a difference. I noticed some folks switched away from the concurrent collections to alleviate the issues; it may be we need to do that here, too, for the same reasons.

Thanks all for keeping on this; it's really hard to nail down these things so I appreciate the help thus far.

@RogerKratz
Contributor Author

I would be happy to be wrong, but I doubt setting the ConcurrentBag to null would help. I don't think the problem is that Autofac has any ref still hanging on to that object, but rather the implementation of ConcurrentBag itself and its usage of ThreadLocal.
Right now I don't understand how ConcurrentBag could even work properly at all on a long-living thread, nor do I understand why the problem doesn't occur in your Mac environment (a slightly different .NET runtime, maybe with a different implementation of ConcurrentBag and/or ThreadLocal?).

@tillig
Member

tillig commented Feb 18, 2021

OK, I've spent a couple of hours trying to be pretty belligerent with ConcurrentBag<T> outside of an Autofac context but I've not yet been able to replicate the issue. The added complexity of what context is being captured from the closure is the difference between my simple repros and what's really happening and I'm not sure how to isolate that.

I would assume the next thing to try would be to use something other than ConcurrentBag<T> and see if it still has the same issues we see in the original repro.

I will say... I just (20 minutes or so ago?) got word that the "day job" project I'm on has been, uh, "accelerated" to have a required delivery date several months ahead of schedule. That probably limits the amount of time I might be able to spend on this in the immediate future, though I do think it needs to be solved. If other folks can pick up where I left off, great; if not, I'll have to get back to it after the heat is off.

@RogerKratz
Contributor Author

Ok, thanks for your replies, time and help.

Anyhow, FWIW, to reproduce a simple ConcurrentBag issue, this is enough for me - it "leaks" 1152kb. Changing to e.g. a List<T> instead, the output is "0 kb".

[Test]
public void Foo()
{
    var before = GC.GetTotalMemory(true);
    for (int i = 0; i < 100000; i++)
    {
        var bag = new ConcurrentBag<long>();
        bag.Add(1);
        bag.Add(1);
        bag.Add(1);
        bag.Add(1);
    }
    GC.Collect(2);
    var after = GC.GetTotalMemory(true);
    Console.WriteLine((after-before)/1024 + "kb");
}

@alistairjevans
Member

I can take a look at this when I get a bit of time (day job also pretty busy over here).

I'm fairly familiar with the synchronisation requirements of the DefaultRegisteredServicesTracker; the behaviour changed some in v6, so it may be we don't actually need the ConcurrentBag any more, we'll see.

@alistairjevans
Member

OK, I've figured out why this memory leak happens, and in turn, what the solution is. Worth noting it was trivial for me to recreate the huge memory leak on Windows with @tillig's test program.

Basically, the problem is that when the DefaultRegisteredServicesTracker is disposed (which happens when the owning Container is disposed), we do not explicitly empty the ConcurrentBag that holds the set of all registrations.

For any other collection type, this wouldn't be a problem; normally the GC would collect the collection, which would in turn collect every item in it. However, due to the use of ThreadLocal storage in the ConcurrentBag, while a single item remains in the bag, at least one thread will retain a reference to it, which in turn prevents the whole bag being garbage collected.

So, the solution then is just to explicitly clear that ConcurrentBag when disposing of DefaultRegisteredServicesTracker. As soon as I do that, the memory leak goes away.
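
A rough sketch of the shape of the fix (type and member names here are simplified for illustration, not the actual Autofac internals):

using System;
using System.Collections.Concurrent;
using Autofac.Core;

// Simplified sketch of the idea only - the real DefaultRegisteredServicesTracker holds
// considerably more state than this.
internal sealed class RegistrationTrackerSketch : IDisposable
{
    private readonly ConcurrentBag<IComponentRegistration> _registrations = new ConcurrentBag<IComponentRegistration>();

    public void Track(IComponentRegistration registration) => _registrations.Add(registration);

    public void Dispose()
    {
        // Drain the bag explicitly on dispose. Without this, the thread-local lists inside
        // ConcurrentBag can keep the items (and therefore the bag and everything it holds)
        // rooted even after the tracker itself is no longer referenced.
        while (_registrations.TryTake(out _))
        {
        }
    }
}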

Will raise a PR shortly.

Also, thanks @RogerKratz for persisting with this despite our initial opposition!

@RogerKratz
Contributor Author

Sounds great, but... In isolated tests outside Autofac, emptying the collection doesn't make any difference - I tried that before and I double-checked it now again. E.g. the snippet above, #1252 (comment), behaves the same in my environment whether I clear the collection or not.
If you run that snippet in your environment with and without clearing the collection, do you see a different result? If you do, unfortunately maybe your Autofac fix is environment dependent? If you don't, but you do notice an improvement when you test Autofac, then it works in an "Autofac context" and hopefully everything is fine.

@alistairjevans
Member

I can confirm that if I use the simple loop from your comment, clearing the bag after each iteration has no effect on memory usage in my environment (i.e. there still appears to be a leak, as you have seen).

But emptying the bag definitely has a big impact in Autofac, so I think I can rule out an environmental cause.

@tillig
Member

tillig commented Feb 24, 2021

Super weird this only happens on Windows. They must have implemented it differently in the Mac stack, or the Mac garbage collector works differently enough to clean up the thread local storage once ConcurrentBag references are gone.

@alistairjevans
Member

There's an interesting comment in the runtime about the finalizer getting called when the thread exits.

I suspect, as you say, that the GC for thread local storage works differently on the two platforms.
