
Memory leak #1625

Closed
epellizzer opened this issue Sep 18, 2019 · 19 comments · Fixed by #1630

Comments

@epellizzer

epellizzer commented Sep 18, 2019

Hi.

I have written a Pipe2 that computes a diff between two sorted streams, and it seems to cause a memory leak that in some cases leads to the heap blowing up.

To make everyone's life easier, I have created a GitHub project containing the commented code as well as the unit tests that show the problem.

This was the first time I wrote a Pipe2, so I hope my implementation is not too naive, but even so, FS2 should probably not react this way.
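
For context, here is a minimal sketch of one way such a Pipe2 can be written against the public Pull API. This is an illustration only, assuming left-minus-right semantics over ascending streams; the actual implementation lives in the linked project.

import fs2.{Pipe2, Pull, Pure, Stream}

// Illustrative sketch only, not the code from the linked project: emit the elements
// of the left stream that do not appear in the right stream, assuming both streams
// are sorted in ascending order.
def sortedDiff[F[_], A](implicit ord: Ordering[A]): Pipe2[F, A, A, A] = (left, right) => {
  def go(l: Stream[F, A], r: Stream[F, A]): Pull[F, A, Unit] =
    l.pull.uncons1.flatMap {
      case None => Pull.done
      case Some((lh, lt)) =>
        r.pull.uncons1.flatMap {
          case None => Pull.output1(lh) >> lt.pull.echo // right side exhausted: keep the rest of left
          case Some((rh, rt)) =>
            if (ord.lt(lh, rh)) Pull.output1(lh) >> go(lt, rt.cons1(rh)) // only on the left: emit it
            else if (ord.gt(lh, rh)) go(lt.cons1(lh), rt)                // only on the right: skip it
            else go(lt, rt)                                              // on both sides: drop both
        }
    }
  go(left, right).stream
}

// e.g. Stream(1, 3, 5, 7).through2(Stream(3, 7))(sortedDiff[Pure, Int]) emits 1 and 5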

@epellizzer
Author

I have run the JVM with JFR recording enabled and pushed the resulting recording to the GitHub project I mentioned above.

@epellizzer
Author

OK, I managed to reproduce the issue with this dead-simple code:

import fs2.{Pure, Stream}

object StreamPlayground extends App {
  def range[F[_]](start: Long, stopExclusive: Long, by: Long = 1L): Stream[F, Long] =
    Stream.unfold(start) { i =>
      if ((by > 0 && i < stopExclusive && start < stopExclusive) ||
        (by < 0 && i > stopExclusive && start > stopExclusive))
        Some((i, i + by))
      else None
    }

  println(range[Pure](10000004704L, 10002805638L)
    .fold(0L)(_ + _)
    .compile
    .last)
}

I had to copy the implementation of range and adapt it to Long values. The two values I use are the lower and upper bounds of my data sample.

This code needs no less than 1.4 GB of heap to run, and profiling shows extremely high GC pressure as well as the same allocation pattern I observe with my diff pipe implementation.

range is implemented with unfold, which is itself built on suspend and ++, so there seems to be a problem somewhere down there.
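
To make that concrete, unfold has roughly this shape (a simplified sketch, not fs2's exact source): each step evaluates the state function inside a suspend and glues the emitted element to the recursive remainder with ++.

import fs2.{Pure, Stream}

// Simplified sketch of the shape of unfold (not fs2's exact source):
// one suspend and one ++ per emitted element.
def unfoldSketch[S, A](init: S)(f: S => Option[(A, S)]): Stream[Pure, A] =
  Stream.suspend {
    f(init) match {
      case Some((a, next)) => Stream.emit(a) ++ unfoldSketch(next)(f)
      case None            => Stream.empty
    }
  }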

@djspiewak
Member

Does your range reproducer outright run out of memory if you give it a large enough bound? Or does it just impose insane heap pressure?

@epellizzer
Author

Well, as I said, 1.4 GB of heap is the minimum necessary to avoid running out of memory. Since I would expect such a simple streaming program to use a hundred times less, I can say that, yes, it outright runs out of memory :).

If you're asking whether the crash depends on the upper bound, it appears so: when I doubled the upper bound from my example, 1.4 GB was no longer sufficient.

@epellizzer
Author

For now I'll resort to eagerly allocated ranges; in my case that means pre-allocating about 16 MB, which is acceptable.
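
A minimal sketch of that kind of workaround, under the assumption that the whole range fits comfortably in memory (the helper name is mine, not from the linked project):

import fs2.{Chunk, Pure, Stream}

// Hypothetical workaround sketch: materialize the whole range into a primitive Long
// array up front and emit it as a single chunk, bypassing the unfold/suspend/++ path
// entirely at the cost of a one-off allocation proportional to the range size.
def eagerRange(start: Long, stopExclusive: Long): Stream[Pure, Long] =
  Stream.chunk(Chunk.longs((start until stopExclusive).toArray))

// e.g. eagerRange(10000004704L, 10002805638L).fold(0L)(_ + _).compile.last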

I'm still stuck because of #1624 though.

@djspiewak
Member

I wonder if this (and #1624) is related to @mpilquist's recent scoping changes around ++. I didn't think any of the scopes were being retained, but memory is tricky.

@epellizzer
Author

I don't know how recent the changes you're mentioning are, but the problems I raised in this issue and in #1624 were first observed in version 1.0.5. I upgraded to 2.0.0 afterwards, hoping they would be fixed.

@djspiewak
Member

That definitely rules out the recent changes then. This is weird and concerning, because it implies that ++ isn't entirely linear.

@mpilquist
Member

It's not ++, as this program is fine:

println(Stream.range(0, 2800934).compile.last)

So is this one:

println(Stream.range(0, 2800934).map(_ + 1).compile.last)

These two fail though:

println(Stream.range(0, 2800934).fold(0L)(_ + _).compile.last)
println(Stream.range(0, 2800934).forall(_ >= 0).compile.last)

It appears that the recursive pulls are holding on to the intermediate values for some reason.
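
To make that shape concrete, a fold is essentially the following kind of recursive pull (a sketch against the public API, not fs2's internal definition), so anything accidentally retained across those recursive steps accumulates as the stream is consumed:

import fs2.{Pull, Stream}

// Sketch of a fold written directly against Pull (not fs2's internal definition):
// each step unconses a chunk, folds it into the accumulator and recurses, emitting
// the final accumulator once the stream is exhausted.
def foldViaPull[F[_], A, B](s: Stream[F, A])(z: B)(f: (B, A) => B): Stream[F, B] = {
  def go(rest: Stream[F, A], acc: B): Pull[F, B, Unit] =
    rest.pull.uncons.flatMap {
      case None                => Pull.output1(acc)
      case Some((chunk, tail)) => go(tail, chunk.foldLeft(acc)(f))
    }
  go(s, z).stream
}

// e.g. foldViaPull(Stream.range(0, 2800934))(0L)(_ + _).compile.last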

@epellizzer
Author

Oh wait, I remembered wrong. I just retried the small code above with 1.0.5 and -Xmx32m and it works like a charm. I only stumbled upon the leak when I upgraded to 2.0.0 because of #1624.

Sorry to have misled you.

@mpilquist
Member

mpilquist commented Sep 20, 2019

Thanks for the update. The fact that this is new in 2.0.0 makes me suspect something in the FreeC or Algebra changes. I doubt it's scope related as there's no bracketing occurring here. Heap dump analysis doesn't show a large number of scopes. The heap dump does show a large number of Some, Integer, FreeC.Eval, IO.Pure, IO.Bind, Algebra.Output, etc.

Any ideas @diesalbla?

I'm starting a git bisect now...

@mpilquist
Member

mpilquist commented Sep 20, 2019

I did a git bisect between the 2.0.0 tag (bad) and the 1.0.5 tag (good). My test procedure was to start SBT with -J-Xms512m -J-Xmx512m, run ;project coreJVM;console and then enter println(Stream.range(0, 2800934).fold(0L)(_ + _).compile.last) at the console. If it hung, I'd kill the SBT process and mark the commit bad. If it computed the sum, I'd mark the commit good.

Bisecting: 0 revisions left to test after this (roughly 0 steps)
[36b1279cda415ebe8bb3faa33cb9aa3e6c6697d9] Algebra: replace the R class with a CompileEnd trait

Looks like the leak was introduced by 36b1279.

@diesalbla
Contributor

@mpilquist Could you try that check again on the tip of this branch: #1630?

@mpilquist
Member

Yep will do.

@mpilquist
Member

Confirmed fixed by #1630. I'll get a 2.0.1 release out this weekend.

@epellizzer
Author

Great! Thanks, guys.

@mpilquist
Member

@epellizzer I just released 2.0.1 -- please give it a try and let us know how it goes.
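
For anyone following along, upgrading is a one-line change in build.sbt (coordinates shown for fs2-core; adjust similarly for any other fs2 modules you use):

// build.sbt
libraryDependencies += "co.fs2" %% "fs2-core" % "2.0.1"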

@epellizzer
Author

Cool, I will try it next week. Thanks!

@epellizzer
Author

I confirm everything is OK with 2.0.1.
