Experimental per-file cache locking #23854
base: main
Conversation
Curious to see what you think of this. Experiments on my workstation and cloudtop show that it significantly speeds up running whole test suites (or even small numbers of tests, if multiple library compiles are required) when the cache isn't hot, on machines with lots of cores. I'd be interested in hearing whether this reflects how you all run tests locally. It doesn't affect CircleCI times (since they run with frozen caches). One possible issue as currently written is that it doesn't respect EMCC_CORES, as it can spawn multiple system lib compiles, each of which can be highly parallel. This is desirable on local workstations with an excess of cores, but maybe not on e.g. Chromium CI where cores are more limited.
Awesome! If it goes faster that sounds great.
@@ -83,9 +86,9 @@ def ensure():

def erase():
  ensure_setup()
-  with lock('erase'):
+  with lock('erase', global_cachelock):
This is the default so not needed?
@@ -185,16 +188,27 @@ def get(shortname, creator, what=None, force=False, quiet=False, deferred=False)
  return str(cachename)


+def setup_file(cache_file):
Maybe put this up top before its first use?
Should we call this setup_lock or something that suggests its function?
@@ -185,16 +188,27 @@ def get(shortname, creator, what=None, force=False, quiet=False, deferred=False)
  return str(cachename)


+def setup_file(cache_file):
+  global cachedir, cache_file_locks, acquired_count
None of these look like they get assigned in this function, so I think you can delete this line.
+  if cache_file not in cache_file_locks:
+    file_path = Path(cache_file)
+    assert not file_path.is_absolute()
+    key_name = '_'.join(file_path.parts) + '.lock'
Should we just have the lock live alongside the file? e.g. key_name = cache_file + '.lock'
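For concreteness, here is a hypothetical comparison of the two naming schemes. The cache path used below is invented for illustration; it is not taken from the patch.

```python
from pathlib import Path

# Invented example path; the real cache layout may differ.
cache_file = 'sysroot/lib/wasm32-emscripten/libc.a'

# Scheme in the patch: flatten the relative path into a single lock name,
# presumably placed at the cache root.
flattened = '_'.join(Path(cache_file).parts) + '.lock'
# -> 'sysroot_lib_wasm32-emscripten_libc.a.lock'

# Suggested alternative: keep the lock file right next to the file it guards.
alongside = cache_file + '.lock'
# -> 'sysroot/lib/wasm32-emscripten/libc.a.lock'
```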
-def release_cache_lock():
+def release_cache_lock(cachefile):
  global acquired_count
This line is no longer needed I think
def setup():
-  global cachedir, cachelock, cachelock_name
+  global cachedir, global_cachelock
global_cachelock is not needed here, I think.
To be clear, this isn't ready yet. In particular, there's the issue I mentioned above (avoiding oversaturating the cores), and also a few cases where the "global" lock is acquired (e.g. the sanity check) that are now unordered with respect to library building, which causes problems. Mostly I'm wondering whether you all think this would save you time and whether it would be worth it.
This looks very interesting but I'm not sure if we can fix the oversaturating issue...? It seems like we'd need to live with it and hope for the best. Hard to say how big a risk that is. How big is the speedup here?
Yeah, that seems like the trickiest issue. It seems most likely to affect CI machines or developer laptops, systems with small numbers of cores. I can think of a couple of possible ideas to at least mitigate the risk a bit. On my cloudtop, I can run
I don't normally find library builds to be a huge bottleneck on my machine. Are you finding yourself in situations where you need to clear the whole cache a lot? I guess there are some situations where that really is necessary (e.g. when bisecting an LLVM codegen issue), but normally I would expect the cache to be warm, right? If, for example, you are finding yourself running
Yeah, I think maybe the difference there is that I'm less often working on emscripten itself (where you are probably just changing one library, or making a change such that most or all of the cached archives don't need rebuilding), and more often working on LLVM, where any change might affect any or all of the codegen. The primary debug/testing cycle is more local, but once you have something that looks viable, you just want to test it on a whole test suite at once.
That's a nice speedup... given it's so large I feel it's worth the risk here. If we land this we can document it and suggest e.g. EMCC_CORES as a way to limit the number of cores, if people run into issues with too many processes.
I'm not familiar with the code part, but functionality-wise it sounds nice. My workflow is also somewhat similar to Derek's; I often work on LLVM changes that can affect the library compilation itself, so I clear the cache very often (and I confess I do

Not sure how many people use
The real problem with this as written is that it actually sort of breaks EMCC_CORES (even more than the limitations it already has). With a global lock, the one holder of the lock can build a library using N cores (where N is EMCC_CORES), and if no other process needs the lock, you could have N more concurrent processes (the test runner uses EMCC_CORES, or the build system is set up however the user wants it, which probably doesn't take library building into account). So you'd likely be limited to at most 2N processes. With per-file locks you could in theory have each of the N tests or user builds building a library at once, for N×N cores. In practice this is of course unlikely (especially for a user build, which likely only does one link at a time and probably doesn't need multiple variants of the larger libraries like libc). But it's at least a possibility. Given that limitation, maybe I'm over-thinking this, though?
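As a toy worked example of those bounds (the EMCC_CORES value here is purely illustrative, not from the discussion):

```python
# Worst-case concurrent-process bounds sketched in the comment above,
# for an assumed EMCC_CORES of 8 (illustrative only).
EMCC_CORES = 8

# Global lock: one N-core library build plus up to N other processes.
global_lock_bound = 2 * EMCC_CORES             # 16

# Per-file locks: each of the N processes could, in theory, hold a
# different library lock and spawn its own N-core build.
per_file_lock_bound = EMCC_CORES * EMCC_CORES  # 64

print(global_lock_bound, per_file_lock_bound)
```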
What if we instead pushed a bit harder on embuilder as the way to build more than one library at a time? And try to move away from auto-building libraries (at least for emscripten developers). embuilder can then assume it has a global lock and build the world in parallel.
Embuilder actually already does a pretty good job of building the world in parallel, if EMCC_USE_NINJA is set, because it accumulates all the libraries into one ninja build. Its main limitation right now is that ports don't work that way yet (I didn't include them when I initially set up the Ninja support for system libs because they are built differently; I don't remember whether there was something inherently more difficult about the ports system or whether I just stopped because it was good enough).
Rather than locking the entire cache, use a lock for each cached file.
The lock is held while e.g. building the library, and only blocks other
processes that want the same library. (Other processes are not blocked
when they try to build separate libraries or variants).
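For illustration, here is a minimal sketch of that per-file locking scheme, assuming the `filelock` package and lock files placed at the cache root. The names mirror identifiers visible in the review diff (setup_file, cache_file_locks, release_cache_lock), but the bodies are a guess at the shape of the change, not the actual patch.

```python
from pathlib import Path

import filelock

cachedir = Path('cache')   # assumed cache root for this sketch
cache_file_locks = {}      # relative cache path -> filelock.FileLock


def setup_file(cache_file):
  # Lazily create one lock object per cached file, keyed by its relative path.
  if cache_file not in cache_file_locks:
    file_path = Path(cache_file)
    assert not file_path.is_absolute()
    key_name = '_'.join(file_path.parts) + '.lock'
    cache_file_locks[cache_file] = filelock.FileLock(str(cachedir / key_name))


def acquire_cache_lock(cache_file):
  # Block only processes that want this particular file; builds of other
  # libraries or variants proceed in parallel.
  setup_file(cache_file)
  cache_file_locks[cache_file].acquire()


def release_cache_lock(cache_file):
  cache_file_locks[cache_file].release()
```

A caller building a library would wrap the build in acquire_cache_lock(path) / release_cache_lock(path) (or use the lock as a context manager), so two processes that need the same library serialize while unrelated builds do not.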