fix: CUDA error 710 bugfix #1424
Conversation
@bowang007 Make sure to review this.
From my perspective, I see nothing wrong with sampling between [0, 2).
- Issue arising when compiling BERT models with 3+ inputs
- Added a temporary fix by decreasing the range of allowed values for the random number generator used to create input tensors to [0, 2) instead of [0, 5)
- Used random float inputs in the range [0, 2) instead of ints, then cast to the desired type
- The ultimate effect of this change with regard to bug pytorch#1418 is that random floats are selected in the range [0, 2) and then cast to Int, effectively restricting the allowed ints to {0, 1}, as required by the model
- A more robust fix to follow
// Make the value range for input tensor a uniform (float) distribution
// over [LoValIncl, HiValExcl), then cast to the desired dtype
auto in = ((HiValExcl - LoValIncl) * at::rand(shape, {at::kCUDA}) + LoValIncl).to(type);
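For reference, a minimal standalone sketch of the same sample-then-cast approach (the bounds, shape, and target dtype below are illustrative rather than the values used in the patch, and it runs on CPU instead of CUDA so it can be tried without a GPU):

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  // Illustrative bounds: sample uniformly over [lo, hi) as floats,
  // then cast to the target dtype, mirroring the line above.
  const double lo = 0.0, hi = 2.0;
  const std::vector<int64_t> shape = {2, 4};

  auto in = ((hi - lo) * torch::rand(shape) + lo).to(torch::kLong);

  // After the cast, every element is 0 or 1, which is a valid value set
  // for an embedding-index input such as token_type_ids.
  std::cout << in << std::endl;
  return 0;
}
```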
Used float inputs in the range [0, 2) instead of ints, then cast to the desired type.
This seems a bit hard-coded for this model only, but it will be resolved once the input range is opened to users by RFC #1425.
LGTM
Description
Resolves a CUDA error 710 issue arising when compiling BERT models with 3+ inputs. The issue arises due to the role of the third tensor in inference computations. Specifically, as specified in the BERT model code linked here, the third argument, token_type_ids, is of type torch.LongTensor but can only take indices in $[0,1]$. This means that when values outside of this set are used, the input is invalid.

This becomes problematic when the inputs are, for example, indices in a dictionary or embedding, which seems to be the case here. Specifically, aten::embedding is used with Tensors which are the product of token_type_ids. The issue traces to one line in the shape_analysis code previewed below, which initializes a random tensor with values in the range $[0,4]$. This tensor is run through the forward function of the module to determine the shapes of outputs and causes the error at model compilation time, as featured here in the shape analysis code.

I have added a temporary fix by decreasing the range of allowed values given to the random number generator for creating input tensors to 0-1 instead of 0-4, and am working on a more robust fix.
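To make the failure mode concrete, here is a small sketch (the 2-row embedding table mirrors BERT's token-type vocabulary size; the shapes and sampling bounds are illustrative) of why indices outside {0, 1} break aten::embedding, while indices drawn from [0, 2) and cast to long cannot go out of range:

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  // A token-type embedding table with only 2 rows, as in BERT.
  auto weight = torch::randn({2, 8});

  // Indices sampled the old way, over [0, 5): values 2-4 are out of range
  // for a 2-row table. On the GPU during shape analysis this trips a
  // device-side assert, surfacing as CUDA error 710.
  auto bad_ids = (5.0 * torch::rand({1, 4})).to(torch::kLong);

  // Indices sampled over [0, 2) can only become 0 or 1 after the cast.
  auto good_ids = (2.0 * torch::rand({1, 4})).to(torch::kLong);

  std::cout << torch::embedding(weight, good_ids) << std::endl;   // fine

  try {
    std::cout << torch::embedding(weight, bad_ids) << std::endl;  // usually throws on CPU
  } catch (const c10::Error& e) {
    std::cout << "index out of range: " << e.what() << std::endl;
  }
  return 0;
}
```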
Fixes #1418
Type of change
Please delete options that are not relevant and/or add your own.
Checklist: