Skip to content

Extend embuilder deferred building mode to ports #23924

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Open
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

dschuff
Copy link
Member

@dschuff dschuff commented Mar 14, 2025

This allows embuilder build ALL to mix the ports builds together with the
system_libs builds, for even faster build-linux speed.
Headers are still installed eagerly during ports.get() (to allow dependencies to
build their object files) but does not run ninja to build the ports until
the explicit step at the end.

dschuff added 2 commits March 13, 2025 18:34
This version requires threading the 'deferred' flag through each port definition.
@@ -36,6 +36,7 @@

logger = logging.getLogger('ports')

build_deferred = False
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This global variable is ugly but it allows this approach to work without having to edit all the ports definition files. The alternative would be to thread the deferred argument through the get() of each port and its calls of ports.build_port.

@@ -177,8 +177,6 @@ def get(shortname, creator, what=None, force=False, quiet=False, deferred=False)
logger.info(message)
utils.safe_ensure_dirs(cachename.parent)
creator(str(cachename))
if not deferred:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The removal of this assertion is also a consequence of localizing the change to init.py.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is generic to all caching so it seems unfortunate not to be able to assert that the result of a build actually creates the file. Can we revert this part?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can; I thought it was slightly ugly to thread deferred as an argument through all the port generators so I removed it, but since we put in the env var, we can just use that instead.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this still isn't great since we are saying that we don't care about the result of any cache access when EMBUILDER_PORT_BUILD_DEFERRED is set.

I guess its better than not checking at all though.

Are you sure we need this, given that port.get is not longer called from add_cflags at all?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this also affects non-port system libs builds (it's why this assertion was already conditional and not enabled with deferred builds before this PR). If we wanted something equivalent for deferred builds, we'd have to put code somewhere (here, presumably) to accumulate the list of deferred files we eventually want to see in the cache and verify that they all exist after the ninja invocation finishes. We could maybe introduce an API like cache.verify_deferred or something like that.

@@ -16,8 +16,6 @@ def needed(settings):


def get(ports, settings, shared):
sdl_build = os.path.join(ports.get_build_dir(), 'sdl2')
assert os.path.exists(sdl_build), 'You must use SDL2 to use SDL2_gfx'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't work when the actual library build is deferred, but the dependency code in ports/init.py should ensure that the dependency is met.

@dschuff
Copy link
Member Author

dschuff commented Mar 15, 2025

This does seem to work now, and cut about 4 minutes off of the build-linux stage on the critical path. seems like it might be worthwhile to consider?

@dschuff dschuff marked this pull request as ready for review March 15, 2025 00:15
sbc100 added a commit to sbc100/emscripten that referenced this pull request Mar 17, 2025
sbc100 added a commit to sbc100/emscripten that referenced this pull request Mar 17, 2025
sbc100 added a commit to sbc100/emscripten that referenced this pull request Mar 19, 2025
sbc100 added a commit to sbc100/emscripten that referenced this pull request Mar 19, 2025
sbc100 added a commit that referenced this pull request Mar 19, 2025
@@ -499,7 +499,7 @@ jobs:
- build-libs
- run:
name: Clean build directory
command: rm -rf ~/cache/build
command: rm -rf ~/cache/build ~/cache/ports-builds
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make this directory cache/build/ports instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could go either way on that. After #23932 they are all just together under /build in their own uniquely-named directory, which seems fine to me.

tools/cache.py Outdated
@@ -129,6 +129,10 @@ def get_lib_dir(absolute):
return path


def get_lib_subdir():
return get_lib_dir(False).relative_to(Path(get_sysroot(False), 'lib'))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we do this the other way around? i.e put the subdir calculation logic here in this function and then have get_lib_dir call get_lib_subdir?

def get_lib_dir(absolute):                                                                               
  ensure_setup()                                                                                         
  libroot = Path(get_sysroot(absolute=absolute), 'lib') 
  return Path(libroot, get_lib_subdir()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is removed in the current version.

dschuff added a commit to dschuff/emscripten that referenced this pull request Mar 20, 2025
This allows building them simultaneously with Ninja.
Split out from emscripten-core#23924
dschuff added a commit that referenced this pull request Mar 21, 2025
This allows building them simultaneously with Ninja.
Split out from #23924
@@ -578,9 +584,13 @@ def add_cflags(args, settings): # noqa: U100

# Now get (i.e. build) the ports in dependency order. This is important because the
# headers from one ports might be needed before we can build the next.
global build_deferred
assert not build_deferred, "add_cflags shouldn't be called from embuilder"
build_deferred = True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When are the deferred ports actually built?

If we defer the building of the port does when only compiling does that mean that the port headers will be installed N times if we do emcc -c -sUSE_SDL2 for each source file, and then once again at link time?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deferred libs and ports are only built when using embuilder.py. It explicitly invokes ninja (via system_libs.build_deferred()) after all the calls to port.build.
emcc never builds anything deferred.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this code does nothing outside of embuilder?

Is it still needed? I find the assertion a little strange. If add_cflags is never called from embuilder then why are we needing to set build_deferred here (which means nothing outside of embuilder).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, so here's the issue. The headers for any dependencies need to be in place in order to build object files. So (as the comment on top of the function suggests), this can normally cause those dependencies to be built to get the headers in place. When building with embuilder we want all the object files to be deferred and built by the single invocation of ninja at the end (i.e. we don't want the invocations of emcc that build the object files to cause additional object files and archives to be built). But emcc is in a different process from embuilder, so it doesn't have access to the build_deferred global in the embuilder process. So what was happening was that some of the libraries were getting built twice (once as a result of add_cflags inside an invocation of emcc, and once from the top-level ninja file).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this use of build_deferred is for emcc usage and not for embuilder usage?

So emcc always installs headers in build_deffered mode now?

Doesn't that mean that if I compile 1000 source files I will get the headers for any ports use installed 1000 separate times? Since none of the compile steps actually build the library (due to being deffered) each compile will run as if the library is completely missing and go ahead and install the headers, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Locally it doesn't seem like cache lock contention is too much of an issue (after the port is unpacked, which only happens ones) but I guess the create function is called under the lock, so the header check happens with the lock. Another env variable would certainly do the trick. It could be scoped to skipping ports as you suggested, or it could be used more generally to mean that we are running an embuilder library build. It coudl just replace the build_deferred global variable.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its too gross for me I think.

At the very least we should somehow ensure this behaviour change is only for embuilder. As it stands this would effect all USE_NINJA=1 users. How about a separate USE_DEFFERED_BUILD=1 or something like that that embuilder can set?

Copy link
Member Author

@dschuff dschuff Mar 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was initially not thrilled about another env var, but the fact that it can replace all the uses of this global is nice. I'll go with that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this a little more. Using the env var in place of the global var here does make it a bit nicer. But I realized that we don't actually need that in order to keep the headers from being installed redundantly. we just need to omit the ports.get for the dependency ports under embuilder. It's nicer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this works; WDYT?

@@ -280,6 +281,9 @@ def main():
if auto_tasks:
print('Building targets: %s' % ' '.join(tasks))

if USE_NINJA:
os.environ['EMBUILDER_PORT_BUILD_DEFERRED'] = '1'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So its not currently possible use ninja in non-deferred mode? I guess thats OK?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ninja still works in non-deferred mode. If you do EMCC_USE_NINJA=1 test/runner.py core0 it will build the libraries on-demand in non-deferred mode with ninja. Deferred mode only makes sense when using embuilder though.

port.get(Ports, settings, shared)
# When using embuilder, don't build the dependencies
if not os.getenv('EMBUILDER_PORT_BUILD_DEFERRED'):
port.get(Ports, settings, shared)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this will work either since what if I do emcc -sUSE_SDL=2 -c foo.c.. in this case I would want at least the headers to be installed at this point/

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, in that case EMBUILDER_PORT_BUILD_DEFERRED would never be set...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. when not using embuilder, everything will build non-deferred on demand, one at a time, just like what currently happens.

@dschuff
Copy link
Member Author

dschuff commented Apr 4, 2025

Any more thoughts or questions on this one?

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants