flshattF@v2.3.6 is not supported because it requires a device with capability > (8, 0) but your GPU has capability (6, 1) (too old) #79

Open
kamlesh0606 opened this issue Jul 10, 2024 · 1 comment

kamlesh0606 commented Jul 10, 2024

Python Version

Python 3.10.12 (main, Mar 22 2024, 16:50:05) [GCC 11.4.0]

Pip Freeze

absl-py==2.1.0
annotated-types==0.7.0
attrs==23.2.0
docstring_parser==0.16
filelock==3.15.4
fire==0.6.0
fsspec==2024.6.1
grpcio==1.64.1
Jinja2==3.1.4
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
Markdown==3.6
MarkupSafe==2.1.5
mistral_common==1.2.1
mpmath==1.3.0
networkx==3.3
numpy==1.25.2
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.5.82
nvidia-nvtx-cu12==12.1.105
protobuf==4.25.3
pydantic==2.6.1
pydantic_core==2.16.2
PyYAML==6.0.1
referencing==0.35.1
rpds-py==0.19.0
safetensors==0.4.3
sentencepiece==0.1.99
simple_parsing==0.1.5
six==1.16.0
sympy==1.13.0
tensorboard==2.17.0
tensorboard-data-server==0.7.2
termcolor==2.4.0
torch==2.2.0
tqdm==4.66.4
triton==2.2.0
typing_extensions==4.12.2
Werkzeug==3.0.3
xformers==0.0.24

Reproduction Steps

  1. torchrun --nproc-per-node 1 -m train example/7B.yaml

Running this produces the following error:

NotImplementedError: No operator found for memory_efficient_attention_forward with inputs:
     query     : shape=(1, 8192, 32, 128) (torch.bfloat16)
     key       : shape=(1, 8192, 32, 128) (torch.bfloat16)
     value     : shape=(1, 8192, 32, 128) (torch.bfloat16)
     attn_bias : <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalMask'>
     p         : 0.0
flshattF@v2.3.6 is not supported because:
    requires device with capability > (8, 0) but your GPU has capability (6, 1) (too old)
    bf16 is only supported on A100+ GPUs
tritonflashattF is not supported because:
    requires device with capability > (8, 0) but your GPU has capability (6, 1) (too old)
    attn_bias type is <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalMask'>
    bf16 is only supported on A100+ GPUs
    operator wasn't built - see python -m xformers.info for more info
    triton is not available
    requires GPU with sm80 minimum compute capacity, e.g., A100/H100/L4
cutlassF is not supported because:
    bf16 is only supported on A100+ GPUs
smallkF is not supported because:
    max(query.shape[-1] != value.shape[-1]) > 32
    dtype=torch.bfloat16 (supported: {torch.float32})
    attn_bias type is <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalMask'>
    bf16 is only supported on A100+ GPUs
    unsupported embed per head: 128
[2024-07-10 15:30:11,478] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 206239) of binary: /opt/mistral-finetune-main/my_venv/bin/python3.10
Traceback (most recent call last):
  File "/opt/mistral-finetune-main/my_venv/bin/torchrun", line 8, in <module>
    sys.exit(main())

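For reference, every rejection above traces back to the GPU's compute capability. A minimal sketch (a hypothetical standalone check, not part of mistral-finetune) that confirms the constraint on this machine:

    import torch

    # The GTX 1080 reports capability (6, 1); the flash-attention and bf16 paths
    # used here require sm80, i.e. capability >= (8, 0).
    cap = torch.cuda.get_device_capability(0)
    print("compute capability:", cap)                # (6, 1) on this machine
    print("meets sm80 requirement:", cap >= (8, 0))  # False, so the bf16 backends are rejected
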
Is there any other way to run mistral-finetune on a CUDA device with compute capability 6.1?

Expected Behavior

Some way to run mistral-finetune on a GPU with CUDA compute capability 6.1, or a clear statement of the minimum required hardware.

Additional Context

python -m xformers.info

xFormers 0.0.24
memory_efficient_attention.cutlassF: available
memory_efficient_attention.cutlassB: available
memory_efficient_attention.decoderF: available
memory_efficient_attention.flshattF@v2.3.6: available
memory_efficient_attention.flshattB@v2.3.6: available
memory_efficient_attention.smallkF: available
memory_efficient_attention.smallkB: available
memory_efficient_attention.tritonflashattF: unavailable
memory_efficient_attention.tritonflashattB: unavailable
memory_efficient_attention.triton_splitKF: unavailable
indexing.scaled_index_addF: unavailable
indexing.scaled_index_addB: unavailable
indexing.index_select: unavailable
sequence_parallel_fused.write_values: unavailable
sequence_parallel_fused.wait_values: unavailable
sequence_parallel_fused.cuda_memset_32b_async: unavailable
sp24.sparse24_sparsify_both_ways: available
sp24.sparse24_apply: available
sp24.sparse24_apply_dense_output: available
sp24._sparse24_gemm: available
sp24._cslt_sparse_mm@0.4.0: available
swiglu.dual_gemm_silu: available
swiglu.gemm_fused_operand_sum: available
swiglu.fused.p.cpp: available
is_triton_available: False
pytorch.version: 2.2.0+cu121
pytorch.cuda: available
gpu.compute_capability: 6.1
gpu.name: NVIDIA GeForce GTX 1080
dcgm_profiler: unavailable
build.info: available
build.cuda_version: 1201
build.python_version: 3.10.13
build.torch_version: 2.2.0+cu121
build.env.TORCH_CUDA_ARCH_LIST: 5.0+PTX 6.0 6.1 7.0 7.5 8.0+PTX 9.0
build.env.XFORMERS_BUILD_TYPE: Release
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS: None
build.env.NVCC_FLAGS: None
build.env.XFORMERS_PACKAGE_FROM: wheel-v0.0.24
build.nvcc_version: 12.1.66
source.privacy: open source
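
The xformers.info output matches the error: the CUTLASS and flash-attention kernels are built, but with gpu.compute_capability: 6.1 none of them accept bfloat16 inputs. A tiny standalone reproduction (a hypothetical script, with the sequence length scaled down from the failing call) should trigger the same NotImplementedError outside of train.py:

    import torch
    import xformers.ops as xops
    from xformers.ops.fmha.attn_bias import BlockDiagonalCausalMask

    # Same layout as the failing call: (batch, seq_len, heads, head_dim) tensors
    # in bfloat16 with a BlockDiagonalCausalMask attention bias.
    q = torch.randn(1, 16, 32, 128, device="cuda", dtype=torch.bfloat16)
    k = torch.randn(1, 16, 32, 128, device="cuda", dtype=torch.bfloat16)
    v = torch.randn(1, 16, 32, 128, device="cuda", dtype=torch.bfloat16)
    bias = BlockDiagonalCausalMask.from_seqlens([16])

    # On a capability (6, 1) GPU this raises the same NotImplementedError,
    # since no bf16-capable attention backend exists below sm80.
    out = xops.memory_efficient_attention(q, k, v, attn_bias=bias, p=0.0)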

Suggested Solutions

No response

kamlesh0606 added the bug label on Jul 10, 2024
@bpcanedo

Similar issue using Colab's T4.

[screenshot attached]
