
Experimental per-file cache locking #23854


Draft · wants to merge 5 commits into main

Conversation

dschuff (Member) commented Mar 6, 2025

Rather than locking the entire cache, use a lock for each cached file.
The lock is held while e.g. building the library, and only blocks other
processes that want the same library. (Other processes are not blocked
when they try to build separate libraries or variants).
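
For readers skimming the diff, here is a minimal, illustrative sketch of the idea (not the actual code in this PR): one inter-process lock per cached file, taken only around the creation of that file. It assumes a filelock-style FileLock class; cachedir, get_lock, and the example paths are placeholders.

from pathlib import Path
from filelock import FileLock  # assumption: a filelock-style inter-process lock is available

cachedir = Path.home() / '.emscripten_cache'   # illustrative cache root
cache_file_locks = {}                          # one lock object per cached file

def get_lock(cache_file):
  # cache_file is a path relative to the cache root,
  # e.g. 'sysroot/lib/wasm32-emscripten/libc.a'.
  if cache_file not in cache_file_locks:
    # Flatten the relative path into a single lock-file name under the cache root.
    key_name = '_'.join(Path(cache_file).parts) + '.lock'
    cache_file_locks[cache_file] = FileLock(str(cachedir / key_name))
  return cache_file_locks[cache_file]

def get(cache_file, creator):
  # Hold only the lock for this one library/variant; other processes building
  # different libraries or variants proceed in parallel.
  with get_lock(cache_file):
    full_path = cachedir / cache_file
    if not full_path.exists():
      full_path.parent.mkdir(parents=True, exist_ok=True)
      creator(full_path)   # e.g. compile and archive the system library
    return str(full_path)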

dschuff (Member, Author) commented Mar 6, 2025

Curious to see what you think of this. Experiments on my workstation and cloudtop show that it significantly speeds up running whole test suites (or even small numbers of tests, if multiple library compiles are required) when the cache isn't hot, on machines with lots of cores. I'd be interested in hearing whether this reflects how you all run tests locally.

It doesn't affect CircleCI times (since they run with frozen caches). One possible issue as currently written is that it doesn't respect EMCC_CORES, as it can spawn multiple system lib compiles, each of which can be highly parallel. This is desirable on local workstations with an excess of cores, but maybe not on e.g. Chromium CI where cores are more limited.

sbc100 (Collaborator) commented Mar 7, 2025

Awesome! If it goes faster that sounds great.

@@ -83,9 +86,9 @@ def ensure():

 def erase():
   ensure_setup()
-  with lock('erase'):
+  with lock('erase', global_cachelock):

Collaborator commented: This is the default so not needed?

@@ -185,16 +188,27 @@ def get(shortname, creator, what=None, force=False, quiet=False, deferred=False)
   return str(cachename)


+def setup_file(cache_file):

Collaborator commented: Maybe put this up top before its first use?

Collaborator commented: Should we call this setup_lock or something that suggests its function?

@@ -185,16 +188,27 @@ def get(shortname, creator, what=None, force=False, quiet=False, deferred=False)
   return str(cachename)


+def setup_file(cache_file):
+  global cachedir, cache_file_locks, acquired_count

Collaborator commented: None of these look like they get assigned in this function, so I think you can delete this line.

+  if cache_file not in cache_file_locks:
+    file_path = Path(cache_file)
+    assert not file_path.is_absolute()
+    key_name = '_'.join(file_path.parts) + '.lock'

Collaborator commented: Should we just have the lock live alongside the file? e.g. key_name = cache_file + '.lock'



-def release_cache_lock():
+def release_cache_lock(cachefile):
   global acquired_count

Collaborator commented: This line is no longer needed, I think.

 def setup():
-  global cachedir, cachelock, cachelock_name
+  global cachedir, global_cachelock

Collaborator commented: global_cachelock is not needed here, I think.

dschuff (Member, Author) commented Mar 7, 2025

To be clear, this isn't ready yet. In particular there's the issue I mentioned above (avoiding oversaturating the cores), and there are also a few cases where the "global" lock is acquired (e.g. the sanity check) that are now unordered with respect to library building, which causes problems. Mostly I'm wondering whether you all think this would save you time and whether it would be worth it.

kripken (Member) commented Mar 7, 2025

This looks very interesting but I'm not sure if we can fix the oversaturating issue..? It seems like we'd need to live with it and hope for the best. Hard to say how big a risk that is.

How big is the speedup here?

dschuff (Member, Author) commented Mar 7, 2025

Yeah, that seems like the trickiest issue. It seems most likely to affect CI machines or developer laptops, systems with small numbers of cores. I can think of a couple of possible ideas to at least mitigate the risk a bit.
The number of cores used would be the number of parallel compiles (which is in turn limited by either the test parallelism or the parallelism in the user's build), times the number of distinct libraries that are needed at once, times the number of files in the built library.
So one option to limit the potential damage for users would be to restrict this optimization to the test runner. Then it's basically just us, our CI, and maybe a few other people, and this seems most useful when running tests anyway, since we rebuild and test frequently, and it's mostly the test runner that does lots of concurrent links using different system libraries. If we limit it further by falling back to the global cache lock when there are fewer than, say, 64 cores in the system, that would basically just be our beefy developer machines. Or we could make it opt-in completely via an env var or something.
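
(A rough sketch of what that gating could look like, purely illustrative; the env var name EMCC_PER_FILE_CACHE_LOCKS and the 64-core threshold are made up for this example and are not part of the PR.)

import os

def use_per_file_locks(running_under_test_runner):
  # Hypothetical explicit opt-in/opt-out; an explicit setting wins.
  setting = os.environ.get('EMCC_PER_FILE_CACHE_LOCKS')
  if setting is not None:
    return setting != '0'
  # Otherwise only enable per-file locks for the test runner on big machines,
  # where oversubscribing the cores is unlikely to hurt.
  return running_under_test_runner and (os.cpu_count() or 1) >= 64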

On my cloudtop, I can run core0 with an empty cache in 8:22 without this patch and 3:33 with it. So for that (probably best-case) scenario the benefits are pretty substantial. Other tests run in 14:35 vs 8:57.

sbc100 (Collaborator) commented Mar 7, 2025

I don't normally find library builds to be a huge bottleneck on my machine.

Are you finding yourself in situations where you need to clear the whole cache a lot? I guess there are some situations where that really is necessary (e.g. when bisecting an LLVM codegen issue), but normally I would expect the cache to be warm, right?

If, for example, you are finding yourself running ./emcc --clear-cache when iterating on just a single library then maybe we should make it more ergonomic to clear just one library. When working on wasmfs recently I've been doing a lot of ./embuilder clear libwasmfs-debug rather than clearing the whole cache. Would improving ./embuilder clear help your common use cases?

dschuff (Member, Author) commented Mar 7, 2025

Yeah, I think the difference there is that I'm less often working on emscripten itself (where you're probably just changing one library, or making a change such that most or all of the cached archives don't need rebuilding), and more often working on LLVM, where any change might affect any or all of the codegen. The primary debug/testing cycle is more local, but once you have something that looks viable, you just want to test it on a whole test suite at once.
But that's not really a huge bottleneck for me most of the time either; I really just happened to notice the behavior, had an idea about why it might be slower than needed, and fell down the rabbit hole. Hence starting this conversation :)

kripken (Member) commented Mar 8, 2025

That's a nice speedup... given it's so large I feel it's worth the risk here.

If we land this we can document it and suggest e.g. EMCC_CORES as a way to limit the number of cores, if people run into issues with too many processes.

aheejin (Member) commented Mar 8, 2025

I'm not familiar with the code part, but functionality-wise, it sounds nice.

My workflow is also somewhat similar to Derek's; I often work on LLVM changes that can affect the library compilation itself, so I clear the cache very often (and I confess I do rm -rf cache instead of emcc --clear-cache). But I don't run the whole test suite that often either, so library build time hasn't been a big bottleneck for me personally.

Not sure how many people use EMCC_CORES, but if that's a problem, maybe we can make this an option that can be switched off, or one that is off by default?

dschuff (Member, Author) commented Mar 12, 2025

The real problem with this as written is that it actually sort of breaks EMCC_CORES (even beyond the limitations it already has). With a global lock, the one holder of the lock can build a library using N cores (where N is EMCC_CORES), and if no other process needs the lock, you could have N more concurrent processes (the test runner uses EMCC_CORES, or the build system is set up however the user wants it, which probably doesn't take library building into account). So you'd likely be limited to at most 2N processes. With per-file locks you could in theory have each of the N tests or user builds building a library at once, for N*N cores.
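
(To put illustrative numbers on it: with EMCC_CORES=8, the global lock bounds the worst case at roughly 8 cores for the one library build plus 8 other processes, i.e. about 2N = 16, while per-file locks allow, in the worst case, 8 concurrent tests or builds each driving an 8-way library compile, i.e. about N*N = 64.)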

In practice this is of course unlikely (especially for a user build, which likely only does one link at a time and probably doesn't need multiple variants of the larger libraries like libc). But it's at least a possibility. Given that limitation, maybe I'm over-thinking though?

sbc100 (Collaborator) commented Mar 12, 2025

What if we instead pushed a bit harder on embuilder as the way to build more than one library at a time, and tried to move away from auto-building libraries (at least for emscripten developers)? embuilder could then assume it has a global lock and build the world in parallel.

dschuff (Member, Author) commented Mar 12, 2025

Embuilder actually already does a pretty good job of building the world in parallel, if EMCC_USE_NINJA is set, because it accumulates all the libraries into one ninja build. Its main limitation right now is that ports don't work that way yet (I didn't include them when I initially set up the Ninja support for system libs because they are built differently; I don't remember whether there was something inherently more difficult about the ports system or whether I just stopped because it was good enough).
Ports, though, are actually a big part of the bottlenecking that this PR addresses, since they are often slower to build than system libs, so it might be worth it. I was actually just looking at that yesterday as part of the effort to make the build more efficient overall: currently, building ports is the biggest remaining source of inefficiency in the build-linux step, because each port is built sequentially rather than being merged into one build, so the bot spends a decent part of its time running on just one core. sqlite3 is a big part of that, because it's one file that takes 40s to build and we build it twice, with no parallelism. Building ports like sqlite3 and poppler can also bottleneck the test runner if a fast, small system lib build like compiler-rt gets blocked behind something like sqlite3; that's what this PR fixes.
