-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Extend embuilder deferred building mode to ports #23924
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
base: main
Are you sure you want to change the base?
Conversation
This version requires threading the 'deferred' flag through each port definition.
tools/ports/__init__.py
Outdated
@@ -36,6 +36,7 @@ | |||
|
|||
logger = logging.getLogger('ports') | |||
|
|||
build_deferred = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This global variable is ugly but it allows this approach to work without having to edit all the ports definition files. The alternative would be to thread the deferred argument through the get()
of each port and its calls of ports.build_port
.
@@ -177,8 +177,6 @@ def get(shortname, creator, what=None, force=False, quiet=False, deferred=False) | |||
logger.info(message) | |||
utils.safe_ensure_dirs(cachename.parent) | |||
creator(str(cachename)) | |||
if not deferred: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The removal of this assertion is also a consequence of localizing the change to init.py.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is generic to all caching so it seems unfortunate not to be able to assert that the result of a build actually creates the file. Can we revert this part?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can; I thought it was slightly ugly to thread deferred
as an argument through all the port generators so I removed it, but since we put in the env var, we can just use that instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this still isn't great since we are saying that we don't care about the result of any cache access when EMBUILDER_PORT_BUILD_DEFERRED
is set.
I guess its better than not checking at all though.
Are you sure we need this, given that port.get
is not longer called from add_cflags
at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this also affects non-port system libs builds (it's why this assertion was already conditional and not enabled with deferred builds before this PR). If we wanted something equivalent for deferred builds, we'd have to put code somewhere (here, presumably) to accumulate the list of deferred files we eventually want to see in the cache and verify that they all exist after the ninja invocation finishes. We could maybe introduce an API like cache.verify_deferred
or something like that.
tools/ports/sdl2_gfx.py
Outdated
@@ -16,8 +16,6 @@ def needed(settings): | |||
|
|||
|
|||
def get(ports, settings, shared): | |||
sdl_build = os.path.join(ports.get_build_dir(), 'sdl2') | |||
assert os.path.exists(sdl_build), 'You must use SDL2 to use SDL2_gfx' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't work when the actual library build is deferred, but the dependency code in ports/init.py should ensure that the dependency is met.
This does seem to work now, and cut about 4 minutes off of the build-linux stage on the critical path. seems like it might be worthwhile to consider? |
.circleci/config.yml
Outdated
@@ -499,7 +499,7 @@ jobs: | |||
- build-libs | |||
- run: | |||
name: Clean build directory | |||
command: rm -rf ~/cache/build | |||
command: rm -rf ~/cache/build ~/cache/ports-builds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make this directory cache/build/ports
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could go either way on that. After #23932 they are all just together under /build in their own uniquely-named directory, which seems fine to me.
tools/cache.py
Outdated
@@ -129,6 +129,10 @@ def get_lib_dir(absolute): | |||
return path | |||
|
|||
|
|||
def get_lib_subdir(): | |||
return get_lib_dir(False).relative_to(Path(get_sysroot(False), 'lib')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we do this the other way around? i.e put the subdir calculation logic here in this function and then have get_lib_dir
call get_lib_subdir
?
def get_lib_dir(absolute):
ensure_setup()
libroot = Path(get_sysroot(absolute=absolute), 'lib')
return Path(libroot, get_lib_subdir()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is removed in the current version.
This allows building them simultaneously with Ninja. Split out from emscripten-core#23924
This allows building them simultaneously with Ninja. Split out from #23924
tools/ports/__init__.py
Outdated
@@ -578,9 +584,13 @@ def add_cflags(args, settings): # noqa: U100 | |||
|
|||
# Now get (i.e. build) the ports in dependency order. This is important because the | |||
# headers from one ports might be needed before we can build the next. | |||
global build_deferred | |||
assert not build_deferred, "add_cflags shouldn't be called from embuilder" | |||
build_deferred = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When are the deferred ports actually built?
If we defer the building of the port does when only compiling does that mean that the port headers will be installed N times if we do emcc -c -sUSE_SDL2
for each source file, and then once again at link time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deferred libs and ports are only built when using embuilder.py. It explicitly invokes ninja (via system_libs.build_deferred()) after all the calls to port.build.
emcc never builds anything deferred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this code does nothing outside of embuilder?
Is it still needed? I find the assertion a little strange. If add_cflags is never called from embuilder then why are we needing to set build_deferred
here (which means nothing outside of embuilder).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, so here's the issue. The headers for any dependencies need to be in place in order to build object files. So (as the comment on top of the function suggests), this can normally cause those dependencies to be built to get the headers in place. When building with embuilder we want all the object files to be deferred and built by the single invocation of ninja at the end (i.e. we don't want the invocations of emcc that build the object files to cause additional object files and archives to be built). But emcc is in a different process from embuilder, so it doesn't have access to the build_deferred
global in the embuilder process. So what was happening was that some of the libraries were getting built twice (once as a result of add_cflags
inside an invocation of emcc, and once from the top-level ninja file).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this use of build_deferred
is for emcc
usage and not for embuilder usage?
So emcc always installs headers in build_deffered
mode now?
Doesn't that mean that if I compile 1000 source files I will get the headers for any ports use installed 1000 separate times? Since none of the compile steps actually build the library (due to being deffered) each compile will run as if the library is completely missing and go ahead and install the headers, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Locally it doesn't seem like cache lock contention is too much of an issue (after the port is unpacked, which only happens ones) but I guess the create
function is called under the lock, so the header check happens with the lock. Another env variable would certainly do the trick. It could be scoped to skipping ports as you suggested, or it could be used more generally to mean that we are running an embuilder library build. It coudl just replace the build_deferred
global variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its too gross for me I think.
At the very least we should somehow ensure this behaviour change is only for embuilder. As it stands this would effect all USE_NINJA=1
users. How about a separate USE_DEFFERED_BUILD=1
or something like that that embuilder can set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was initially not thrilled about another env var, but the fact that it can replace all the uses of this global is nice. I'll go with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this a little more. Using the env var in place of the global var here does make it a bit nicer. But I realized that we don't actually need that in order to keep the headers from being installed redundantly. we just need to omit the ports.get
for the dependency ports under embuilder. It's nicer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this works; WDYT?
@@ -280,6 +281,9 @@ def main(): | |||
if auto_tasks: | |||
print('Building targets: %s' % ' '.join(tasks)) | |||
|
|||
if USE_NINJA: | |||
os.environ['EMBUILDER_PORT_BUILD_DEFERRED'] = '1' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So its not currently possible use ninja in non-deferred mode? I guess thats OK?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ninja still works in non-deferred mode. If you do EMCC_USE_NINJA=1 test/runner.py core0
it will build the libraries on-demand in non-deferred mode with ninja. Deferred mode only makes sense when using embuilder though.
port.get(Ports, settings, shared) | ||
# When using embuilder, don't build the dependencies | ||
if not os.getenv('EMBUILDER_PORT_BUILD_DEFERRED'): | ||
port.get(Ports, settings, shared) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this will work either since what if I do emcc -sUSE_SDL=2 -c foo.c
.. in this case I would want at least the headers to be installed at this point/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see, in that case EMBUILDER_PORT_BUILD_DEFERRED
would never be set...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. when not using embuilder, everything will build non-deferred on demand, one at a time, just like what currently happens.
Any more thoughts or questions on this one? |
This allows embuilder build ALL to mix the ports builds together with the
system_libs builds, for even faster build-linux speed.
Headers are still installed eagerly during ports.get() (to allow dependencies to
build their object files) but does not run ninja to build the ports until
the explicit step at the end.