Description
Describe the bug
When attempting to use my Intel ARC GPU to train a custom transformer model built on PyTorch's TransformerEncoder with TransformerEncoderLayer, passing a src_key_padding_mask (which triggers the nested tensor path) throws the following error:
NotImplementedError: Could not run 'aten::_nested_tensor_from_mask_left_aligned' with arguments from the 'XPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_nested_tensor_from_mask_left_aligned' is only available for these backends: [CPU, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastXPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].
CPU: registered at /build/pytorch/build/aten/src/ATen/RegisterCPU.cpp:31188 [kernel]
BackendSelect: fallthrough registered at /build/pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at /build/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:153 [backend fallback]
FuncTorchDynamicLayerBackMode: registered at /build/pytorch/aten/src/ATen/functorch/DynamicLayer.cpp:498 [backend fallback]
Functionalize: registered at /build/pytorch/aten/src/ATen/FunctionalizeFallbackKernel.cpp:290 [backend fallback]
Named: registered at /build/pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at /build/pytorch/aten/src/ATen/ConjugateFallback.cpp:17 [backend fallback]
Negative: registered at /build/pytorch/aten/src/ATen/native/NegateFallback.cpp:19 [backend fallback]
ZeroTensor: registered at /build/pytorch/aten/src/ATen/ZeroTensorFallback.cpp:86 [backend fallback]
ADInplaceOrView: fallthrough registered at /build/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:86 [backend fallback]
AutogradOther: registered at /build/pytorch/torch/csrc/autograd/generated/VariableType_4.cpp:16976 [autograd kernel]
AutogradCPU: registered at /build/pytorch/torch/csrc/autograd/generated/VariableType_4.cpp:16976 [autograd kernel]
AutogradCUDA: registered at /build/pytorch/torch/csrc/autograd/generated/VariableType_4.cpp:16976 [autograd kernel]
AutogradHIP: registered at /build/pytorch/torch/csrc/autograd/generated/VariableType_4.cpp:16976 [autograd kernel]
AutogradXLA: registered at /build/pytorch/torch/csrc/autograd/generated/VariableType_4.cpp:16976 [autograd kernel]
AutogradMPS: registered at /build/pytorch/torch/csrc/autograd/generated/VariableType_4.cpp:16976 [autograd kernel]
AutogradIPU: registered at /build/pytorch/torch/csrc/autograd/generated/VariableType_4.cpp:16976 [autograd kernel]
AutogradXPU: registered at /build/pytorch/torch/csrc/autograd/generated/VariableType_4.cpp:16976 [autograd kernel]
AutogradHPU: registered at /build/pytorch/torch/csrc/autograd/generated/VariableType_4.cpp:16976 [autograd kernel]
AutogradVE: registered at /build/pytorch/torch/csrc/autograd/generated/VariableType_4.cpp:16976 [autograd kernel]
AutogradLazy: registered at /build/pytorch/torch/csrc/autograd/generated/VariableType_4.cpp:16976 [autograd kernel]
AutogradMTIA: registered at /build/pytorch/torch/csrc/autograd/generated/VariableType_4.cpp:16976 [autograd kernel]
AutogradPrivateUse1: registered at /build/pytorch/torch/csrc/autograd/generated/VariableType_4.cpp:16976 [autograd kernel]
AutogradPrivateUse2: registered at /build/pytorch/torch/csrc/autograd/generated/VariableType_4.cpp:16976 [autograd kernel]
AutogradPrivateUse3: registered at /build/pytorch/torch/csrc/autograd/generated/VariableType_4.cpp:16976 [autograd kernel]
AutogradMeta: registered at /build/pytorch/torch/csrc/autograd/generated/VariableType_4.cpp:16976 [autograd kernel]
AutogradNestedTensor: registered at /build/pytorch/torch/csrc/autograd/generated/VariableType_4.cpp:16976 [autograd kernel]
Tracer: registered at /build/pytorch/torch/csrc/autograd/generated/TraceType_4.cpp:13056 [kernel]
AutocastCPU: fallthrough registered at /build/pytorch/aten/src/ATen/autocast_mode.cpp:382 [backend fallback]
AutocastXPU: fallthrough registered at /build/intel-pytorch-extension/csrc/gpu/aten/amp/autocast_mode.cpp:45 [backend fallback]
AutocastCUDA: fallthrough registered at /build/pytorch/aten/src/ATen/autocast_mode.cpp:249 [backend fallback]
FuncTorchBatched: registered at /build/pytorch/aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:710 [backend fallback]
FuncTorchVmapMode: fallthrough registered at /build/pytorch/aten/src/ATen/functorch/VmapModeRegistrations.cpp:28 [backend fallback]
Batched: registered at /build/pytorch/aten/src/ATen/LegacyBatchingRegistrations.cpp:1075 [backend fallback]
VmapMode: fallthrough registered at /build/pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
FuncTorchGradWrapper: registered at /build/pytorch/aten/src/ATen/functorch/TensorWrapper.cpp:203 [backend fallback]
PythonTLSSnapshot: registered at /build/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:161 [backend fallback]
FuncTorchDynamicLayerFrontMode: registered at /build/pytorch/aten/src/ATen/functorch/DynamicLayer.cpp:494 [backend fallback]
PreDispatch: registered at /build/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:165 [backend fallback]
PythonDispatcher: registered at /build/pytorch/aten/src/ATen/core/PythonFallbackKernel.cpp:157 [backend fallback]
Relevant sub-section of the code:
import torch.nn as nn

class ModalityTransformer(nn.Module):
    """A transformer model for a single modality."""

    def __init__(self, nhead, d_model, num_layers, dim_feedforward, dropout):
        super().__init__()
        self.transformer = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward, dropout, batch_first=True),
            num_layers)

    def forward(self, x, src_key_padding_mask=None):
        # If a mask is provided, pass it through to the encoder layers
        x = self.transformer(x, src_key_padding_mask=src_key_padding_mask)
        return x
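For context, the module is driven roughly as in the sketch below (the shapes, hyperparameters, and the intel_extension_for_pytorch import are illustrative, not copied from my training script). The enable_nested_tensor=False note at the end is only a possible workaround I would expect to bypass the nested-tensor path, based on the documented TransformerEncoder arguments; I have not confirmed it on XPU.

import torch
import intel_extension_for_pytorch  # noqa: F401  # makes the "xpu" device available

model = ModalityTransformer(nhead=4, d_model=64, num_layers=2,
                            dim_feedforward=128, dropout=0.1).to("xpu")

x = torch.randn(8, 16, 64, device="xpu")                      # (batch, seq, d_model)
padding_mask = torch.zeros(8, 16, dtype=torch.bool, device="xpu")
padding_mask[:, 12:] = True                                   # True marks padded positions

out = model(x, src_key_padding_mask=padding_mask)             # raises the NotImplementedError above

# Possible workaround (untested on XPU): construct the encoder with
#   nn.TransformerEncoder(layer, num_layers, enable_nested_tensor=False)
# so the padding mask never reaches aten::_nested_tensor_from_mask_left_aligned.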
Versions
PyTorch 2.1.1.0+XPU on Linux (Ubuntu), Python 3.11