
After conversion to LLVM we should be able to delete the inferred source of the kernel. #520


Draft · vchuravy wants to merge 7 commits into master

Conversation

vchuravy
Member

@simonbyrne has shown me a heap snapshot where the inferred source took up >>1 GB of RAM.

@codecov

codecov bot commented Sep 20, 2023

Codecov Report

Patch coverage: 88.88% and project coverage change: -7.74% ⚠️

Comparison is base (edfdc1a) 83.18% compared to head (919242d) 75.44%.
Report is 1 commit behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #520      +/-   ##
==========================================
- Coverage   83.18%   75.44%   -7.74%     
==========================================
  Files          24       24              
  Lines        3300     3270      -30     
==========================================
- Hits         2745     2467     -278     
- Misses        555      803     +248     
Files Changed       Coverage            Δ
src/jlgen.jl        77.85% <85.71%>     (-2.07%) ⬇️
src/execution.jl    67.79% <100.00%>    (-32.21%) ⬇️

... and 13 files with indirect coverage changes


@simonbyrne
Contributor

This doesn't seem to fix my issue. I'm not sure exactly where the problem is, but I did notice:

julia> GPUCompiler.GLOBAL_CI_CACHES
Dict{CompilerConfig, GPUCompiler.CodeCache} with 2 entries:
  CompilerConfig for PTXCompilerTarget => CodeCache(IdDict{MethodInstance, Vector{CodeInstance}}(MethodInstance for >>(…
  CompilerConfig for PTXCompilerTarget => CodeCache(IdDict{MethodInstance, Vector{CodeInstance}}(MethodInstance for >>(…

julia> Base.summarysize(GPUCompiler.GLOBAL_CI_CACHES) / 10^6
1396.946174

julia> Base.summarysize(collect(values(GPUCompiler.GLOBAL_CI_CACHES))[1]) / 10^6
1393.855007

julia> Base.summarysize(collect(values(GPUCompiler.GLOBAL_CI_CACHES))[2]) / 10^6
3.090233

I tried manually calling empty! on this dict: it didn't seem to make any difference, so I suspect the data is being retained somewhere else as well.
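For reference, a sketch of that attempt (assuming "this dict" means the Dict{CompilerConfig, CodeCache} shown above):

# Roughly the attempt: drop all cache entries, then force a full collection.
empty!(GPUCompiler.GLOBAL_CI_CACHES)
GC.gc(true)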

@simonbyrne
Contributor

Also, what's odd is that RES reported by top is 6.3g, but

julia> Sys.maxrss() / 10^9
17.232601088
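(Note: Sys.maxrss() wraps getrusage and reports the peak resident set size, while top's RES is the current one, so the two can diverge once memory has been returned to the OS. A sketch for a like-for-like comparison on Linux, assuming /proc is available:)

# Current RSS in bytes (Linux only), to compare against Sys.maxrss(), the peak.
function current_rss()
    for line in eachline("/proc/self/status")
        startswith(line, "VmRSS:") && return parse(Int, split(line)[2]) * 1024
    end
end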

@maleadt
Member

maleadt commented Sep 21, 2023

Removed a call to jl_uncompress_ir, as IIRC it was only needed for the 1.6 overlay hack: #151 (comment)
Maybe that also helps?

@simonbyrne
Contributor

Unfortunately still no.

@maleadt
Member

maleadt commented Sep 21, 2023

You could try taking a heap snapshot.
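(On Julia 1.9 and later this can be done from the REPL with the Profile stdlib; the resulting file opens in the Memory tab of Chrome DevTools:)

using Profile
Profile.take_heap_snapshot("gpucompiler.heapsnapshot")  # open in Chrome DevTools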

@simonbyrne
Contributor

I did that: it looks like most of it is still the inferred objects:
[screenshot, 2023-09-21: heap snapshot dominated by inferred objects]

I tried clearing them out manually:

# Walk every compiler cache and drop the inferred source of each CodeInstance.
for cache in values(GPUCompiler.GLOBAL_CI_CACHES)
    for insts in values(cache.dict)   # MethodInstance => Vector{CodeInstance}
        for inst in insts
            @atomic :release inst.inferred = nothing
        end
    end
end

that seemed to work:

[screenshot, 2023-09-21: heap snapshot after clearing, inferred objects no longer dominating]

top is still reporting 4 GB of memory usage, though, so I'm not sure what is going on.

@vchuravy
Member Author

So I am only deleting the inferred source of top-level kernel calls, since everything else is reusable.
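A hypothetical sketch of that idea (illustrative names only, not the actual patch): strip inferred just from the entry kernel's CodeInstances, leaving callee entries cached for reuse.

# Illustrative sketch; `entry_mi` stands for the kernel's MethodInstance.
function strip_entry!(cache::GPUCompiler.CodeCache, entry_mi::Core.MethodInstance)
    for ci in get(cache.dict, entry_mi, Core.CodeInstance[])
        @atomic :release ci.inferred = nothing
    end
end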

@vchuravy
Member Author

@maleadt are we tracking anywhere how big the modules we load onto the GPU are?

@maleadt
Member

maleadt commented Sep 21, 2023

@maleadt are we tracking anywhere how big the modules we load onto the GPU are?

No, and I don't know of a way to query the size of a CuModule or CuContext.
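(One illustrative workaround, not an existing GPUCompiler/CUDA.jl API: accumulate the byte size of each compiled image before it is handed to the driver, as a rough proxy for on-GPU module size:)

# Illustrative only: running total of compiled-image bytes loaded onto the GPU.
const LOADED_IMAGE_BYTES = Threads.Atomic{Int}(0)

track_image!(image::Vector{UInt8}) = (Threads.atomic_add!(LOADED_IMAGE_BYTES, sizeof(image)); image)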

maleadt force-pushed the master branch 5 times, most recently from 1d233d7 to e18b7c2 (January 20, 2025).