Consider adding node labels for more diagnosable error messages for async errors.
#585
Comments
Could the |
@zolkis that's an interesting idea, can you elaborate more with some examples? |
Well, IIUC the example above has the use case to annotate ops with labels, to help developers figure out what went wrong when exceptions are thrown. If that is the main use case, boilerplate code is needed for manually adding a label at every single level. When the label is not specified, there is no information in the exception. Instead of manually setting explicit labels, annotations (implicit labels) could be added automatically by the build algorithm, which knows where in the compute graph it currently is. When an exception occurs, the internal label like "step1:matmul" (standard names to be defined) could be passed. The example is changed just in that the labels are not developer-injected, but internally generated (as will be specified by an eventual future algorithm).
const builder = new MLGraphBuilder(context);
const A = builder.input('0', operandType);
const B = builder.input('1', operandType);
const C = builder.matmul(A, B);
const D = builder.add(A, C);
// ... keep building a complex graph
...
const finalOperand = builder.add(E, F);
// Build the graph.
const graph = await builder.build({'output': finalOperand});
> Uncaught DOMException: Model graph build error: [Operand "step1:matmul"] input dimensions XXX exceed supported limit.
IOW, the labels could be owned and attached by the implementation. The advantage is that this covers all graphs at all levels automatically; the disadvantage is the lack of developer-given labels (and no possibility to piggy-back other instrumentation). However, the whole process is under the control of the implementation, so there are no worries about sanitizing/checking developer-injected labels. I am not even sure we must standardize the namespace for such labels, as this could be owned by the implementations (when it's only meant for human eyes) -- unless programmatic handling of that information is needed. On the other hand, if there is experience and positive developer feedback on the label feature in WebGPU (citations needed), I have no objections to using it also in WebNN. |
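To make the implicit-label idea concrete, here is a minimal sketch. It does not use the WebNN API: `ToyGraphBuilder`, its `build(unsupportedKinds)` signature, and the "step<N>:<kind>" naming scheme are all assumptions made for illustration, loosely following the "step1:matmul" example above.

```javascript
// Hypothetical toy builder (not the WebNN API): it assigns an implicit
// label to every operator, based on the order in which it was added.
class ToyGraphBuilder {
  constructor() {
    this.nodes = [];
  }

  // Records an operator and returns a node carrying its implicit label.
  addNode(kind, inputs) {
    const step = this.nodes.length + 1;
    const node = { kind, inputs, label: `step${step}:${kind}` };
    this.nodes.push(node);
    return node;
  }

  input(name) { return { kind: 'input', inputs: [], label: `input:${name}` }; }
  matmul(a, b) { return this.addNode('matmul', [a, b]); }
  add(a, b) { return this.addNode('add', [a, b]); }

  // Stand-in for platform-specific validation: rejects any operator kind
  // listed in `unsupportedKinds` and reports that operator's implicit label.
  build(unsupportedKinds = []) {
    for (const node of this.nodes) {
      if (unsupportedKinds.includes(node.kind)) {
        throw new Error(
          `Model graph build error: [Operand "${node.label}"] is not supported.`);
      }
    }
    return { nodeCount: this.nodes.length };
  }
}

const toyBuilder = new ToyGraphBuilder();
const toyA = toyBuilder.input('0');
const toyB = toyBuilder.input('1');
const toyC = toyBuilder.matmul(toyA, toyB);
const toyD = toyBuilder.add(toyA, toyC);

// Pretend the platform cannot handle matmul, to trigger a labeled error.
let toyBuildError = null;
try {
  toyBuilder.build(['matmul']);
} catch (e) {
  toyBuildError = e.message;
}
console.log(toyBuildError);
```

The developer writes no labels here; the builder derives them from its own position in the graph, which is the trade-off discussed above.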
Thanks for explaining - I like this idea. Basically in A few notes:
|
I support this, looks like the best way to go. |
Thanks @zolkis! Auto-instrumentation would indeed be less work for developers. My concern is whether the system can add meaningful enough labels. The more complex a model gets, the less useful something like a step number (as in your example) becomes. Imagine a transformer model; the developers would probably namespace the labels to something like: Allowing developers to specify labels, with a fallback to system labels, seems more plausible. As for WebGPU label usage, I don't have exact stats to point to, but I did consult our WebGPU team and they mentioned developers find the labels extremely useful. |
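The "developer label first, system label as fallback" behavior could be sketched roughly as follows. The helper `resolveLabel`, the slash-separated label, and the "step<N>:<kind>" fallback format are hypothetical; none of them come from the spec or from this thread's original examples.

```javascript
// Hypothetical helper: prefers the developer's label and falls back to an
// implementation-generated one of the form "step<N>:<kind>".
function resolveLabel(developerLabel, kind, step) {
  if (typeof developerLabel === 'string' && developerLabel.length > 0) {
    return developerLabel;
  }
  return `step${step}:${kind}`;
}

// A developer-chosen, namespaced label (illustrative only).
const explicitLabel = resolveLabel('decoder/layer3/attention/qk_matmul', 'matmul', 42);
// No label given, so the system-generated one is used.
const fallbackLabel = resolveLabel(undefined, 'matmul', 42);

console.log(explicitLabel);
console.log(fallbackLabel);
```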
Right, developers can add labels - but when they don't, does it mean they don't want any other information, i.e. should we ditch the auto-generation? Or set an option for that? |
For auto-generated information, we have a couple options:
option 2 will look like:
It seems that if the user agent can add useful annotations to the error message, it can just always add them. So I'd prefer option 2? |
Related, we're currently trying to diagnose some Yolo-V9 slice issues, and the lack of diagnostic info is impeding investigation ( |
I may have said this in a WG telecon, but this feels like something where a prototype implementation would help inform the spec. So if any of the Chromium contributors who are feeling the pain want to hack something in, don't wait on spec discussions and it doesn't need to be perfect! Let's iterate and learn. |
This CL 5492314: WebNN: inital implementation of Add label for mloperand | https://chromium-review.googlesource.com/c/chromium/src/+/5492314 attempts to add label for MLOperand to report more detailed
One question about the IDL definition. If a sequence of operands were returned when invoking |
Another proposal: the label also could be added into MLOperator.
Any thoughts? |
Agree! From our experience of debugging the graph translated from frameworks like onnxruntime, the operator/node name is very useful to match the operator in the .onnx model with the implemented operator in backends of WebNN. |
We can have both of them: operand name and operator name.
|
The downside with adding to the MLOperator is that it doesn't exist as a concept in the spec, so we will need to add the param to each of the builder methods. It also makes the param list longer. Alternative A - add to
|
Considering Joshua's comment, alternative A seems to fit best (it handles MLOperand's immutability vs. the labels' mutability). The parameter / options dict can be optional. We also need an algorithm for generating good enough default labels (implementation-specific, but as per the comment above, we should rather standardize the namespace before devs start to parse them and come up with various private namespaces). But if labels are used frequently, then from a developer coding perspective, I'd prefer the solution with setting the labels on separate lines, like in alternative B, as it allows separating code instrumentation from business logic. If we could find a means to do this correctly, I'd go with that. |
This 5528797: WebNN: initial implementation of adding name for MLOperator | https://chromium-review.googlesource.com/c/chromium/src/+/5528797 attempts to add label for MLOperator to report more detailed
|
If we could only pick one to have labels (nodes or edges), I also prefer node names (https://chromium-review.googlesource.com/c/chromium/src/+/5528797) over edge names (https://chromium-review.googlesource.com/c/chromium/src/+/5492314). However, I noticed some models have no node names, only edge names. So having the true edge names makes looking for a match in the original graph as easy as Ctrl+F in tools like Netron: Mingming commented in the CR that we could generate edge names from the node names, but if we think that generating edge names is useful, then being able to pass the actual edge names is even more useful. Though, if we think reporting the node name and WebNN parameter name suffices (e.g. “conv2d” operator and its “filter” parameter), then we need neither explicit edge labels nor implicitly generated edge labels. |
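For reference, generating edge names from node names could be as simple as the sketch below. The function `deriveEdgeLabels` and the "<node label>:output<index>" scheme are assumptions for illustration; the CL discussion does not specify a format.

```javascript
// Hypothetical naming scheme: each output edge of a node is named
// "<node label>:output<index>", so multi-output operators stay distinguishable.
function deriveEdgeLabels(nodeLabel, outputCount) {
  return Array.from(
    { length: outputCount },
    (_, index) => `${nodeLabel}:output${index}`);
}

// A node with three outputs (for example, a split-like operator).
const splitEdgeLabels = deriveEdgeLabels('split_heads', 3);
console.log(splitEdgeLabels.join(', '));
```

Derived names like these identify an edge within the WebNN graph, but they would not match the edge names in the original model, which is the limitation raised above.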
🤔 @lisa0314 It could be helpful in the POC to include at least one more operator that doesn't already have an options dictionary, like
partial interface MLGraphBuilder {
MLOperand add(MLOperand a, MLOperand b, optional MLLabelOptions options = {});
...
}; |
@fdwr Good point! I will add one more operator which doesn't have any options in the POC CL. Thanks! |
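From the caller's side, the optional trailing dictionary might look like the sketch below. `LabelAwareBuilder` is a hypothetical stand-in rather than MLGraphBuilder, and it assumes the dictionary carries a single `label` member; it only illustrates that existing two-argument calls keep working.

```javascript
// Hypothetical stand-in for MLGraphBuilder, showing only the call shape.
class LabelAwareBuilder {
  // Mirrors `optional MLLabelOptions options = {}`: omitting the third
  // argument behaves the same as passing an empty dictionary.
  add(lhs, rhs, options = {}) {
    const label = options.label ?? '';
    return { kind: 'add', inputs: [lhs, rhs], label };
  }
}

const labelBuilder = new LabelAwareBuilder();
// Existing call sites are unaffected.
const unlabeledAdd = labelBuilder.add('x', 'y');
// New call sites can opt in to a label.
const labeledAdd = labelBuilder.add('x', 'y', { label: 'residual_add' });

console.log(JSON.stringify(unlabeledAdd.label), JSON.stringify(labeledAdd.label));
```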
@fdwr thanks for thinking through how it works with onnx. |
@philloooo: It might help some, but then DML also supports names on nodes anyway:
struct DML_OPERATOR_GRAPH_NODE_DESC
{
IDMLOperator* Operator;
_Field_z_ _Maybenull_ const char* Name; <<<<<<<<
};
So I'm seeing these missing node names really with older ONNX models, whereas all of the more recent conversions/exports I've looked through have node names, and although they are technically still optional, in practice, they appear to be present.
Given the above, I'm content with node labels only (and if we found some more value to edge labels in the future, it would be a simple non-breaking addition). |
Agree! Thanks for the discussion! 👍 |
This CL attempts to add label to MLOperator to report more detailed error message for prelu and resample2d. And other operators will be supported in a following separated CL. The related spec issue is under discussion- webmachinelearning/webnn#585. Bug: 1273291 Change-Id: I4880e48eaa6c203bf5428b0672c73ca2beb8c76c Cq-Include-Trybots: luci.chromium.try:win11-blink-rel,mac14.arm64-blink-rel,mac14-blink-rel, linux-blink-rel Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5528797 Reviewed-by: Phillis Tang <phillis@chromium.org> Commit-Queue: Phillis Tang <phillis@chromium.org> Reviewed-by: ningxin hu <ningxin.hu@intel.com> Cr-Commit-Position: refs/heads/main@{#1322314}
This adds an internal 'label' property to the operators that are created as a graph is constructed, which MAY (in the RFC 2119 sense) be used by implementations in async error messages. Developers populate this via a 'label' member in the options dictionary for MLGraphBuilder methods. A new MLOperatorOptions dictionary is defined, and all existing options dictionaries now inherit from this, and all relevant methods now take an options dictionary. Fixes webmachinelearning#585
* Add optional operator labels for more diagnosable error messages
This adds an internal 'label' property to the operators that are created as a graph is constructed, which MAY (in the RFC 2119 sense) be used by implementations in async error messages. Developers populate this via a 'label' member in the options dictionary for MLGraphBuilder methods. A new MLOperatorOptions dictionary is defined, and all existing options dictionaries now inherit from this, and all relevant methods now take an options dictionary. Fixes #585
* Add note encouraging implementations
* Revise note to mention sync errors
* Update index.bs
Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>
* don't pass options twice
---------
Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>
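As a rough illustration of how an implementation might use the label when it chooses to (the spec only says MAY), consider the sketch below. The formatter `formatBuildError` and the message layout are assumptions, not actual Chromium or spec behavior; `padding` stands in for an existing operator-specific option that now sits alongside the inherited `label` member.

```javascript
// Hypothetical formatter: includes the operator label only when the
// developer provided one in the options dictionary.
function formatBuildError(kind, reason, options = {}) {
  const where = options.label ? `${kind} "${options.label}"` : kind;
  return `Model graph build error: [${where}] ${reason}`;
}

// An options dictionary carrying both an operator-specific member and the
// inherited 'label' member (values are illustrative).
const conv2dOptions = { padding: [1, 1, 1, 1], label: 'backbone/conv1' };

const labeledMessage = formatBuildError(
  'conv2d', 'input dimensions exceed supported limit.', conv2dOptions);
const unlabeledMessage = formatBuildError(
  'conv2d', 'input dimensions exceed supported limit.');

console.log(labeledMessage);
console.log(unlabeledMessage);
```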
As a follow-up to #572, we propose that platform-specific validations be done during the async build step.
This poses a challenge for developers: when they submit a complex graph and one step within it fails a platform-specific check, it's hard to trace the error back to the specific operand in the graph.
I propose to follow WebGPU’s practice and define an MLObjectBase with a label field for MLOperand to extend. The usage would be like:
The MLObjectBase could also be extended by:
MLBuffer, to help with debugging async buffer-related errors.
MLGraph, to help with debugging async errors from chained inference.