# BFloat16.jl support in kernels #2441
Labels: cuda kernels (stuff about writing CUDA kernels)

Julia 1.11 introduces BFloat16 codegen support, so let's use this issue to track support for that. Right now, it looks like we support the type, but somehow still emit conversions.

In addition, the logic in BFloat16s.jl isn't great, as we determine support based on the host processor. It's not clear whether we can do better, though; this looks a lot like the literal `Int` issue (where we can't make GPU code use `Int32` when the host is `Int64`).
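One way to check whether those conversions are really emitted is to inspect the device-side IR. A minimal sketch, assuming CUDA.jl's `@device_code_llvm` reflection macro; the `store!` kernel is a hypothetical stand-in, not code from this issue:

```julia
using CUDA, BFloat16s

# Hypothetical minimal kernel: multiply two scalars and store the result.
function store!(C, a, b)
    @inbounds C[] = a * b
    return
end

# Dump the device-side LLVM IR. With native BFloat16 support, the multiply
# should appear as an fmul on bfloat; otherwise it will be wrapped in
# extension/truncation to float.
CUDA.@device_code_llvm @cuda launch=false store!(
    CuArray(BFloat16[0]), one(BFloat16), one(BFloat16))
```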
Update: looks like we hit a selection error now:

```julia
julia> using CUDA, BFloat16s

julia> function foobar(C::AbstractArray, a::Number, b::Number)
           @inbounds C[] = a*b
           return
       end
foobar (generic function with 1 method)

julia> @cuda foobar(CuArray(Float64[0]), one(BFloat16), one(Int32))
ERROR: LLVM error: Cannot select: 0x22563be0: f64 = fp_extend 0x22563b70, /home/tim/.julia/packages/BFloat16s/u3WQc/src/bfloat16.jl:210 @[ number.jl:7 @[ /home/tim/Julia/pkg/CUDA/src/device/array.jl:166 @[ /home/tim/Julia/pkg/CUDA/src/device/array.jl:178 @[ REPL[3]:2 ] ] ] ]
  0x22563b70: bf16 = fmul 0x22563b00, 0x22563320, /home/tim/.julia/packages/BFloat16s/u3WQc/src/bfloat16.jl:227 @[ promotion.jl:430 @[ REPL[3]:2 ] ]
    0x22563b00: bf16 = sint_to_fp 0x22563390, /home/tim/.julia/packages/BFloat16s/u3WQc/src/bfloat16.jl:188 @[ number.jl:7 @[ promotion.jl:375 @[ promotion.jl:400 @[ promotion.jl:430 @[ REPL[3]:2 ] ] ] ] ]
      0x22563390: i32,ch = load<(dereferenceable invariant load (s32) from `i32 addrspace(101)* null`, addrspace 101)> 0x20cf8dc0, TargetExternalSymbol:i64'_Z6foobar13CuDeviceArrayI7Float64Ll1ELl1EE8BFloat165Int32_param_3', undef:i64
        0x22563940: i64 = TargetExternalSymbol'_Z6foobar13CuDeviceArrayI7Float64Ll1ELl1EE8BFloat165Int32_param_3'
        0x22563010: i64 = undef
    0x22563320: bf16,ch = load<(dereferenceable invariant load (s16) from `bfloat addrspace(101)* null`, addrspace 101)> 0x20cf8dc0, TargetExternalSymbol:i64'_Z6foobar13CuDeviceArrayI7Float64Ll1ELl1EE8BFloat165Int32_param_2', undef:i64
      0x225637f0: i64 = TargetExternalSymbol'_Z6foobar13CuDeviceArrayI7Float64Ll1ELl1EE8BFloat165Int32_param_2'
      0x22563010: i64 = undef
In function: _Z6foobar13CuDeviceArrayI7Float64Ll1ELl1EE8BFloat165Int32
```
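Until instruction selection handles bf16 arithmetic, a conceivable stopgap (a sketch, not CUDA.jl's actual fix; `foobar_widened` and `widen32` are made-up names) is to widen any `BFloat16` operand to `Float32` before the multiply, so the backend never sees a bf16 fmul:

```julia
using CUDA, BFloat16s

# Hypothetical variant of the kernel above: promote BFloat16 inputs to
# Float32 up front and do the arithmetic in Float32. The BFloat16->Float32
# conversion is plain bit manipulation, so no bf16 arithmetic is generated.
function foobar_widened(C::AbstractArray, a::Number, b::Number)
    widen32(x) = x isa BFloat16 ? Float32(x) : x
    @inbounds C[] = widen32(a) * widen32(b)
    return
end

@cuda foobar_widened(CuArray(Float64[0]), one(BFloat16), one(Int32))
```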
I tried a few high-level operations; the current situation is this:

```julia
using BFloat16s, CUDA
CUDA.allowscalar(false)

A = rand(BFloat16, 3, 3) |> cu
x = rand(BFloat16, 3) |> cu

A * x     # ok
tanh.(x)  # ok
x .* x    # ERROR: LLVM error: Cannot select: 0x12b83210: bf16 = fmul 0x12b839f0, 0x18ddf5c0 ...
x.^2      # ERROR: LLVM error: Cannot select: 0x163e2e30: bf16 = fmul 0xf042050, 0xf042050 ...
```

My environment is the following: …
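Since the failing operations are exactly the elementwise ones that produce a bf16 fmul, a similar stopgap works at the broadcast level: do the arithmetic in `Float32` and narrow back. A sketch, assuming the `Float32`/`BFloat16` conversions from BFloat16s.jl compile cleanly for the GPU (they are bit-level operations):

```julia
using BFloat16s, CUDA
CUDA.allowscalar(false)

x = rand(BFloat16, 3) |> cu

# Widen to Float32 for the arithmetic, then narrow back to BFloat16;
# the conversions fuse into the same broadcast kernel.
y = BFloat16.(Float32.(x) .* Float32.(x))  # instead of x .* x
z = BFloat16.(Float32.(x) .^ 2)            # instead of x .^ 2
```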