Improve reporting errors up through the stack #500
L0 already uses UR_BACKEND_SPECIFIC_ERROR consistently. pi_cuda and pi_hip mostly don't use it. Once the PI is ported to UR, it might be nice if UR did something like this in the cuda/hip cases: https://github.com/intel/llvm/pull/8303/files#diff-7525901710934f7bdb2ad36238c4b67163f112d3bd233db7af0b0078b5b01e80R5920
So a suggestion is to go through and find all places where, e.g., cu driver functions are used, and record what they return as
after all such calls, if that makes sense. Then we could clean up the PI by removing all other error reporting machinery like
The format for setPluginSpecificMessage is open for suggestions. We could make it a bit more verbose, like this: https://github.com/intel/llvm/blob/484cf252246a958b089a8e94e35b14bd791a213c/sycl/plugins/cuda/pi_cuda.cpp#L181
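The suggestion above (record what each driver call returns, then report an adapter-specific error) could be sketched roughly as follows. This is a minimal illustration rather than the actual adapter code: `driver_result_t`, `fakeDriverAlloc`, `checkDriver`, `setAdapterSpecificMessage`, and the thread-local `g_lastError` are hypothetical names, the enum is a stand-in for the real UR header, and the result name `UR_RESULT_ERROR_ADAPTER_SPECIFIC` is the one used later in this thread.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-ins for illustration; not the real UR or driver headers.
enum ur_result_t { UR_RESULT_SUCCESS = 0, UR_RESULT_ERROR_ADAPTER_SPECIFIC = 1 };
using driver_result_t = int32_t; // what a driver call (e.g. a cu* function) returns
constexpr driver_result_t DRIVER_SUCCESS = 0;

// Per-thread record of the most recent driver failure.
struct LastError {
  driver_result_t code = DRIVER_SUCCESS;
  std::string message;
};
thread_local LastError g_lastError;

// Store the raw driver code and a readable message (hypothetical helper,
// loosely modelled on the setPluginSpecificMessage idea discussed above).
inline void setAdapterSpecificMessage(driver_result_t code, const char *call) {
  g_lastError.code = code;
  g_lastError.message =
      std::string(call) + " failed with driver error " + std::to_string(code);
}

// Check a driver result: on failure, record it and return the single
// adapter-specific result instead of guessing a ur_result_t mapping.
inline ur_result_t checkDriver(driver_result_t code, const char *call) {
  if (code == DRIVER_SUCCESS)
    return UR_RESULT_SUCCESS;
  setAdapterSpecificMessage(code, call);
  return UR_RESULT_ERROR_ADAPTER_SPECIFIC;
}

// Fake driver entry point used only to exercise the sketch.
inline driver_result_t fakeDriverAlloc(bool fail) {
  return fail ? 2 : DRIVER_SUCCESS;
}
```

Wrapping every driver call in one helper like `checkDriver` would keep the raw driver code available to callers without each call site inventing its own mapping.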
Since I've not managed to get round to creating a full PR yet, here it is in shortform:
-ur_result_t urGetLastResult(
+ur_result_t urPlatformGetLastError(
hPlatform,
const char** ppMessage,
+ const int32_t *pError
);
As I remember, each memory provider has a
Yes, but get_last_error only returns a string describing the error, and in some cases (like for the L0 adapter, where we know that we are using an L0 memory provider) we might need to get the actual error status (as an int).
Remove the `urGetLastResult()` entry-point and replace it with `urPlatformGetLastError()`. The primary difference is the addition of the `pError` out parameter, which returns the error code emitted by a failed driver entry-point that caused a Unified Runtime entry-point to return `UR_RESULT_ERROR_ADAPTER_SPECIFIC`. Fixes oneapi-src#500.
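From the caller's side, the proposed entry-point might be used along the following lines. This is a hedged sketch against mock definitions, not the real Unified Runtime headers: the `ur_platform_handle_t` type, the handle's fields, `mockEnqueue`, and `describeFailure` are assumptions made for illustration, and `pError` is written as a non-const pointer here (also an assumption) so it can act as an out parameter.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-ins; not the real Unified Runtime headers.
enum ur_result_t { UR_RESULT_SUCCESS = 0, UR_RESULT_ERROR_ADAPTER_SPECIFIC = 1 };
struct ur_platform_handle_t_ {
  std::string lastMessage; // message recorded by the adapter
  int32_t lastError = 0;   // raw driver error code recorded by the adapter
};
using ur_platform_handle_t = ur_platform_handle_t_ *;

// Mock of the proposed entry-point: hands back the recorded message and
// the raw driver error code for the given platform.
ur_result_t urPlatformGetLastError(ur_platform_handle_t hPlatform,
                                   const char **ppMessage, int32_t *pError) {
  *ppMessage = hPlatform->lastMessage.c_str();
  *pError = hPlatform->lastError;
  return UR_RESULT_SUCCESS;
}

// Mock UR entry-point whose underlying "driver" call fails.
ur_result_t mockEnqueue(ur_platform_handle_t hPlatform) {
  hPlatform->lastMessage = "driver call failed";
  hPlatform->lastError = 700; // arbitrary made-up driver code
  return UR_RESULT_ERROR_ADAPTER_SPECIFIC;
}

// Caller-side handling: only query the platform for details when the
// result says the failure was adapter specific.
std::string describeFailure(ur_platform_handle_t hPlatform, ur_result_t result) {
  if (result != UR_RESULT_ERROR_ADAPTER_SPECIFIC)
    return "ur_result_t " + std::to_string(static_cast<int>(result));
  const char *message = nullptr;
  int32_t driverError = 0;
  urPlatformGetLastError(hPlatform, &message, &driverError);
  return std::string(message) + " (driver error " +
         std::to_string(driverError) + ")";
}
```

A language runtime on top of UR could forward both the string and the integer to the user, which is the detail the existing string-only query cannot provide.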
Currently, UR adapters make a best effort to map driver-specific errors to ur_result_t enumerations and return those to the parallel language runtime on top of UR. This approach is problematic because it hides from the user the details of how an adapter is using a driver, and those details are very often necessary to determine how to resolve a given error condition.
The purpose of this issue is to track proposed solutions and the decision-making process, and to determine the next steps to enable improved error reporting.
This is blocking progress on the design of #68 and is related to #471.