Can an MLGraphBuilder be reused? #567
Side info: the discussion in #303 on the relationship between context, builder and graph may be relevant, notably several of the comments there. When getting familiar with this spec, I also ran a few circles around use cases vs. the relationships above, and some of my doubts were answered (e.g. in #302). While the current API shape does have its rationale, not all discussions have been closed, and we might want to record the relevant design decisions as well. I am following this up (on reusing a builder for multiple graphs).
🤔 Well, certainly most of the time you have one model and one unique set of weights related to that model, but there may be interesting preprocessing cases that could benefit from a …
That is a nice property to have.
I can foresee cases where graphs are generated programmatically, and requiring all inputs to always be consumed could be an annoyance. I've definitely seen unused weight inputs in models before (arguably they should have been cleaned out to save space). If an app tries to straightforwardly map such a model's contents to WebNN calls, it would be a bother to add a lot of extra uniqueness book-keeping just to avoid an error that doesn't give the app much actionable information anyway: the app isn't going to modify the model and email the model's author that there was unused content in it 😉, and after building all nodes for WebNN only to realize none of them included that input, the app can't rewind and un-input the input. Such diagnostic information would be more useful for people calling the API directly, but that audience is pretty slim. It's kind of like WebGPU: when displaying a 3D model you're not expected to call WebGPU APIs directly; rather, WebGPU is an implementation layer for displaying existing model assets, be they FBX, COLLADA DAE, Wavefront OBJ...
* Content: Define build() steps more rigorously

- Adds validation that passed outputs are neither inputs nor constants, matching the Chromium implementation.
- Traverses the graph, starting with outputs, to visit all connected operands. Previously the outputs were iterated over to process inputs and constants, which didn't make any sense. Inputs are hooked up to [[inputsDescriptors]]. Nothing is specifically mentioned about constants, but an issue about transferring is referenced. (#566)
- The impact of MLGraphBuilder re-use (#567) is called out, since it could allow for removing the graph traversal.
- Populates the graph's [[outputDescriptors]], which was previously just missing. (#448)
- Makes most of the validation behavior of build() happen synchronously rather than "in parallel". A promise is still returned, of course.
- Converting the graph into an implementation-defined format is called out; this remains in the "in parallel" steps.

Fixes #448, fixes #457, fixes #552.

* Update index.bs
* Build collection of input operands, and simplify steps
* Populate operand and operator sets too

Co-authored-by: Reilly Grant <reillyeon@users.noreply.github.com>
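The output-rooted traversal described in that commit message can be sketched independently of WebNN. Below, operands are plain objects rather than real MLOperands, and the object shape (`kind`, `inputs`) is illustrative, not the spec's internal slots; the point is only the two steps: reject outputs that are inputs/constants, then walk backwards from the outputs so unreferenced operands are never visited.

```javascript
// Sketch of the build() validation and traversal described above, using
// plain objects instead of real MLOperands. Object shape is illustrative.
function validateAndTraverse(namedOutputs) {
  const outputs = Object.values(namedOutputs);
  // Passed outputs must be neither inputs nor constants.
  for (const op of outputs) {
    if (op.kind === 'input' || op.kind === 'constant') {
      throw new TypeError('an output operand cannot be an input or a constant');
    }
  }
  // Depth-first traversal starting from the outputs, visiting every
  // connected operand; unreferenced operands are simply never reached.
  const visited = new Set();
  const stack = [...outputs];
  const inputs = [];
  const constants = [];
  while (stack.length > 0) {
    const op = stack.pop();
    if (visited.has(op)) continue;
    visited.add(op);
    if (op.kind === 'input') inputs.push(op);
    else if (op.kind === 'constant') constants.push(op);
    else stack.push(...(op.inputs ?? []));
  }
  return { inputs, constants };
}
```

A side effect of traversing from outputs, relevant to the unused-inputs discussion above, is that an input never reached from any output simply doesn't appear in the collected set.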
There is another one I mentioned in #614. Today, when the ONNX Runtime WebNN EP runs inference for transformer decoders like Whisper, two WebNN sub-graphs are built because WebNN doesn't support the If operator: one with the past key/value (KV) cache ("with_past") and one without it ("no_past"). The inference code runs the "no_past" sub-graph for the first iteration and the "with_past" sub-graph for the following iterations. The two sub-graphs actually share some common weights. It would be useful if the same weights could be built once and shared by both sub-graphs.
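If builder reuse is kept, that decoder case could look roughly like the sketch below: one builder, one constant() call for the shared weights, two build() calls. The helper name, shapes, and op choices are placeholders I made up for illustration, not actual Whisper structure or working inference code.

```javascript
// Sketch: building the "no_past" and "with_past" decoder sub-graphs from
// one MLGraphBuilder so the shared weights are created only once.
// Shapes and ops are placeholders, not real decoder structure.
async function buildDecoderGraphs(builder, weightData) {
  const desc = { dataType: 'float32', shape: [4, 4] };
  const weights = builder.constant(desc, weightData); // shared weights

  const x = builder.input('x', desc);
  // First iteration: no KV cache input.
  const noPast = await builder.build({ logits: builder.matmul(x, weights) });

  // Following iterations: the KV cache is an extra input on the same builder.
  const pastKV = builder.input('pastKV', desc);
  const withPast = await builder.build({
    logits: builder.matmul(builder.add(x, pastKV), weights),
  });
  return { noPast, withPast };
}
```

Whether the two resulting MLGraphs actually share one copy of the weights is exactly the implementation question discussed below.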
This use case makes sense, but it assumes that implementations are able to optimize the case where multiple graphs share operands (or at least constants). The Chromium implementation so far is not capable of this. I can see ways in which it could be, though it somewhat depends on the underlying platform framework as well. This will have to be an area of future design exploration, but given the potential it seems reasonable to at least continue to allow builder reuse even if implementations aren't yet taking advantage of the optimization opportunity.

One thing to consider, however, is whether an implementation could require all sub-graphs to be built at the same time rather than piecemeal. For implementations which have to convert to another format (e.g. a TFLite or Core ML model) it seems optimal to be able to do this once rather than needing to support building additional graphs after the fact.
An additional thought I had while discussing this with @a-sully: if an implementation did try to allow constants to be efficiently reused by multiple graphs, but didn't require those graphs to be built at the same time, then the builder would need to continue to own a copy of the constant data until it is freed, instead of being able to pass ownership of that memory to the underlying platform. This could cause the attempted memory optimization to backfire.
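To make that ownership problem concrete, here is a toy, non-WebNN model of a builder. Because graphs can be built piecemeal, the builder must keep its own copy of every constant's data alive (a future build() might still reference it); only some explicit end of building, modeled here as a close() method, would let it release or hand off those copies. The class and method names are invented for illustration, not proposed API.

```javascript
// Toy model of the constant-ownership problem: piecemeal builds force the
// builder to retain a copy of each constant until building is over.
// All names here are illustrative, not proposed WebNN API.
class ToyBuilder {
  constructor() {
    this.retainedConstants = new Map();
    this.closed = false;
  }
  constant(id, data) {
    if (this.closed) throw new Error('builder is closed');
    this.retainedConstants.set(id, data.slice()); // builder-owned copy
    return { kind: 'constant', id };
  }
  build(namedOutputs) {
    if (this.closed) throw new Error('builder is closed');
    // Platform conversion elided; the copies must stay retained because
    // another build() call may still reference the same constants.
    return { outputs: namedOutputs };
  }
  close() {
    // Only now can the copies be transferred to the platform or freed.
    this.retainedConstants.clear();
    this.closed = true;
  }
}
```

A require-all-graphs-at-once design, as floated above, essentially forces the close() step to coincide with the single batch build, sidestepping the long-lived copies.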
If a builder allowed binding an MLBuffer to a constant (the constant MLBuffer proposal), would that solve this issue?
There are still a number of open questions regarding passing MLBuffers as constants, but (depending on how we end up defining it!) yes, it could solve the issue of needing to perpetually keep a copy of the data around. Taking a step back, there are at least four distinct problems I see with …
I think these discussions of (2), (3), and (4) are best left for another issue, where we can discuss other ideas such as allowing fetched data to be streamed into …

I'll note that the Chromium implementation hid problem (1) for a long time by not actually copying …

Other than sharing weights across graphs (which, again, I think the current …
While it doesn't seem impossible for implementations to support building multiple MLGraph instances out of a single MLGraphBuilder instance (potentially reusing subsets of the graph by choosing different inputs and outputs), it seems like it adds complexity for implementations to support.

Are there any known use cases for MLGraphBuilder reuse?
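For reference, the reuse pattern in question could look like this sketch: one builder, two build() calls selecting different output subsets over the same operands. The builder is passed in so the snippet stays platform-neutral, and the op choices and names are arbitrary examples, not taken from any real model.

```javascript
// Sketch of MLGraphBuilder reuse: two MLGraph instances built from the
// same builder, exposing different subsets of the same operand graph.
// Ops and names are arbitrary illustrative examples.
async function buildTwoViews(builder) {
  const desc = { dataType: 'float32', shape: [2, 2] };
  const x = builder.input('x', desc);
  const hidden = builder.relu(x);
  const scores = builder.softmax(hidden);
  // Graph A exposes only the final scores...
  const graphA = await builder.build({ scores });
  // ...while graph B also exposes the intermediate activation,
  // reusing the operands already recorded on the builder.
  const graphB = await builder.build({ scores, hidden });
  return { graphA, graphB };
}
```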