[nix-local-build] Garbage collecting the store #3333
We should pick up the GC work that was done by the GSoC student (Vishal Agrawal) last summer. The approach to tracking roots that they came up with was essentially to register them centrally: a central dir with symlinks to the locations of local build trees (or to ghc environment files specifying the required libs). Then on GC we scan those roots and ignore (delete) any stale dangling symlinks.
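The pruning step described above can be sketched as a pure decision over the scanned links. This is an illustrative model only, with the filesystem scan and deletion elided; `ScannedRoots` and `gcRoots` are hypothetical names, not anything in cabal-install:

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical model: each symlink in the central roots dir, paired with
-- its resolved target, or Nothing when the target no longer exists
-- (i.e. a dangling link).
type ScannedRoots = M.Map FilePath (Maybe FilePath)

-- Live targets form the GC root set; dangling links are safe to delete.
gcRoots :: ScannedRoots -> ([FilePath], [FilePath])
gcRoots scanned =
  ( [ t    | Just t          <- M.elems scanned ]
  , [ link | (link, Nothing) <- M.toList scanned ] )
```

Anything the store contains that is not reachable from the first list would then be a candidate for deletion.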
Just wondering what the current status of this is - I've been using new-build for a few weeks and now have about 5GB worth of store already. I guess I'll have to resort to just wiping it out and rebuilding what I need until there's a "proper" GC command.
I'm not aware of anyone who is working on it, so yes, wipe for now.
@hvr and I had a chat about this some days ago. Here's what we came up with. This is the representation of the pinned packages:

```haskell
type PinnedPackages = Map UnitId PinState

data PinState
  = UsedBy [PinUse] -- ^ Some use of the package prevents it from being GCed.
                    --   The list may become a Set instead.
  | Explicit        -- ^ The user explicitly ran something like `cabal new-gc pin pkgid`.

instance Monoid PinState where
  -- the list is concatenated, and 'Explicit' is the absorbing element

data PinUse
  = Project FilePath   -- ^ A project/package somewhere in the filesystem requires
                       --   this package. The pinning is done when new-* is invoked
                       --   in the project and cabal solves for a plan.
  | Installed FilePath -- ^ The exe/lib was new-installed.
```
Like is suggested above, a run of If Finally:
Please @hvr just edit this comment if I left something out.
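The `Monoid` instance is only described in a comment above; here is a minimal sketch of what it could look like, assuming the `PinState`/`PinUse` types from that comment (with derived `Eq`/`Show` added purely for demonstration):

```haskell
data PinUse
  = Project FilePath    -- ^ a project on disk requires this package
  | Installed FilePath  -- ^ the exe/lib was new-installed
  deriving (Eq, Show)

data PinState
  = UsedBy [PinUse] -- ^ some use of the package prevents it from being GCed
  | Explicit        -- ^ the user explicitly pinned the package
  deriving (Eq, Show)

-- 'Explicit' is the absorbing element; 'UsedBy' lists are concatenated.
instance Semigroup PinState where
  Explicit <> _        = Explicit
  _        <> Explicit = Explicit
  UsedBy a <> UsedBy b = UsedBy (a ++ b)

-- A package with no recorded uses is collectible, so 'UsedBy []' is the
-- natural identity element.
instance Monoid PinState where
  mempty = UsedBy []
```

Merging pin information from several sources then reduces to `mconcat`, and a package whose combined state is `UsedBy []` is a GC candidate.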
@fgaz IMO the
Yes, if the installation directory can be chosen (I just discovered there's an option to do it with old install), we need a path there too. Edited.
@fgaz well, for executables you also need to take into account that an For libraries explicitly installed via "new-install" it's not so clear to me what to use as the retainer-entity (unless we install into a "package environment", but that's only supported w/ GHC 8.0.2 and later)
I have started writing a Considering better ways of determining roots now. First question: why not

```haskell
data PinUse
  = Project FilePath
  | Installed FilePath
  | Explicit
```

? But that is only a rather superficial change anyways, I guess. Also, how do we determine from More importantly: what about the case that the same project is tested with multiple ghc versions? At the moment, when switching compiler, the old

```haskell
type RootSources = Map FilePath (Map CompilerVersion [UnitId])
```

or something in a similar direction. Though this requires that projects actively update this central repository not only once to register, but almost whenever they create a new plan.
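The per-compiler bookkeeping suggested at the end of the comment could look roughly like this; `registerPlan` and `liveUnits` are hypothetical helper names for illustration, not anything in cabal-install:

```haskell
import qualified Data.Map.Strict as M

type UnitId = String            -- stand-in for Cabal's UnitId
type CompilerVersion = String   -- e.g. "ghc-9.2"

-- A central registry: project path -> compiler version -> pinned units.
type RootSources = M.Map FilePath (M.Map CompilerVersion [UnitId])

-- Record the latest plan for a project under one compiler, replacing only
-- that compiler's previous entry; plans for other compilers survive,
-- because M.union is left-biased and the fresh singleton wins its key.
registerPlan :: FilePath -> CompilerVersion -> [UnitId] -> RootSources -> RootSources
registerPlan proj ghc units =
  M.insertWith M.union proj (M.singleton ghc units)

-- Every unit reachable from some registered plan; nothing here may be GCed.
liveUnits :: RootSources -> [UnitId]
liveUnits rs = concat [ us | perGhc <- M.elems rs, us <- M.elems perGhc ]
```

This directly addresses the multi-compiler concern: re-solving under a second GHC adds a second entry instead of clobbering the first, so both dependency closures stay rooted.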
Yes, you make all good points. We already have a concept of having different directories for different configurations, e.g., with and without optimization. It might be good to cache plans separately for each configuration as well. See also #3343. I'd be happy to take any patch that makes your life easier on this front.
See https://github.com/lspitzner/pkgdbgc Implements
Not (yet) implemented
Sorry, but I won't be making PRs against cabal. That codebase intimidates me too much.
I have resolved most of the issues, although an important bit remains: pkgdbgc still does not track profiling, optimization level, or other flags, so there is a risk of, e.g., garbage-collecting the profiling-enabled dependencies because your last compile was with profiling disabled. I am not entirely convinced that having multiple plan.jsons is a good idea. It seems somewhat likely that the user would end up accumulating several plans for various combinations of flags, which in turn effectively requires garbage-collecting outdated plans too. A lazier approach is to support specifying the build directory and to pass the responsibility to the user. But it is indeed rather lazy.
How about instead of trying to determine which dependencies are being used, we just determine which are "oldest"? Anytime a package is put into the store, you record the date and time. Whenever it is accessed, you also update the date and time. Once the store gets too big, you can just get rid of the X GB of oldest packages, or something.
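A rough sketch of that policy, assuming the store tracks a last-access timestamp per entry; the `Entry` record and `evictOldest` are illustrative names, not an existing cabal API:

```haskell
import Data.List (sortOn)
import Data.Ord (Down (..))

data Entry = Entry
  { entryId    :: String   -- e.g. a unit-id in the store
  , sizeBytes  :: Integer
  , lastAccess :: Integer  -- e.g. seconds since epoch, updated on each use
  } deriving Show

-- Keep the most recently used entries while they fit under the size
-- budget; once the budget is exceeded, everything older becomes a GC
-- candidate.
evictOldest :: Integer -> [Entry] -> ([Entry], [Entry])  -- (keep, evict)
evictOldest budget entries = go 0 [] (sortOn (Down . lastAccess) entries)
  where
    go _    keep []       = (reverse keep, [])
    go used keep (e:rest)
      | used + sizeBytes e <= budget = go (used + sizeBytes e) (e : keep) rest
      | otherwise                    = (reverse keep, e : rest)
```

The trade-off versus root tracking is the one raised earlier in the thread: an evicted package that is still needed simply gets rebuilt on the next `new-build`, but anything referenced from outside cabal (e.g. a symlinked binary) can silently break.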
Another tool that does garbage collection is @phadej's cabal-store-gc
This is a bit of an interesting problem. On the one hand, it's intractable to determine the GC roots, because `dist-newstyle` "roots" may be scattered at arbitrary locations in the file system. On the other hand, so long as a library is not depended upon by an executable which is being "used" (i.e., part of a profile, see #3332), then it is recoverable if we accidentally delete it: the next time someone runs `new-build` on the project, it will just get rebuilt. But this is not exactly a safe assumption; for example, I have a symlink to a binary in a `new-dist` directory (since I'm dogfooding Cabal); if I accidentally GC away a dynamic library it depends on, I'll have to go and rebuild it.