0002: Enhanced llvmcall
This is a complement to Julep 0001: Enhanced static compilation and C interface.
The current implementation of `llvmcall` is very useful, but has a number of usability issues:
- Not statically compilable
- The `FunctionMover` pass is fundamentally invalid
- The parsing rules are based on awkward (and inaccurate) string interpolation
- Parser failures aren't reported until runtime (even though the content is required to be constant)
- LLVM Types don't transform correctly
- Significant duplication with ccall (functionality and implementation), but much less robust, less tested, and less flexible
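For reference, the existing string-based form looks roughly like the sketch below. It uses `Base.llvmcall` as it exists in current Julia releases; the function name `add_one` is purely illustrative and not part of this Julep. The IR is passed as an opaque string literal, which is the root of several of the issues listed above.

```julia
# Illustrative only: `add_one` is a hypothetical name, not part of the proposal.
# The IR body is a plain string; arguments are referred to positionally as %0, %1, ...
add_one(x::Int32) = Base.llvmcall("""
    %r = add i32 %0, 1
    ret i32 %r
    """, Int32, Tuple{Int32}, x)
```

Here the return type and argument types are given separately from the IR text, so nothing checks that the two agree until the method is compiled.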
The previous Julep introduced `llvmcall` as a way of emitting direct calls to LLVM intrinsics. The intent of this Julep is to expand that support to allow the user to call any arbitrary `llvm::Function&`, and declare other arbitrary global `llvm::GlobalValue&` and `llvm::Constant&` values. The intent is also to provide full serialization support for these.
See Julep 0001.
The basic idea is to introduce a new API for working with LLVM objects from the Julia level. Note that this structure hides the runtime-dependent pointers behind a stable index so that this value can be serialized and hashed easily (and it can be immutable), and looked up quickly. This also allows deleting the source bitcode. It depends on a Julia module being the unit of incremental compilation, but that is already true and I don't expect that to change.
baremodule LLVM
using Base
"""
LLVM.Module
Represents a handle to an LLVM Module.
"""
immutable Module
owner::Core.Module # each module holds its own bitcode-to-pointer table
index::UInt # pointer into the owner module bitcode table
hash::UInt128 # used for verification
let no_constructor end
end
"""
LLVM.eval(bitcode)::Module
Creates a handle to a *new* copy of an LLVM Module,
added to the bitcode table for `current_module()`,
and compiled into the current process.
All of the globals defined in this module are required to have normal (not weak) linkage.
Any name defined therein that conflicts with an existing name will be privately altered
to avoid conflicting definitions from overwriting each other in the compiler.
"""
function eval(bitcode::Vector{UInt8}) end
"""
LLVM.Module""::Module
Declares a new LLVM Module, and returns a handle to it.
"""
macro Module_str(asm::String)
return eval(asm)
end
"""
LLVM.Function(::LLVM.Module, name::String)
Represents a handle to a Function inside an LLVM Module.
It is retrieved from the module by using the `name` as the key.
"""
immutable Function
definition::Module
name::String
index::UInt
Function(m::Module, name::String) = new(m, name, get_index_of(m, name))
# note that this constructor is a pure function, so it'll constant fold naturally
end
end
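The indirection behind `index` can be pictured with the following minimal sketch. It is not the proposed implementation: `ModuleHandle`, `BitcodeTable`, `register!`, and `lookup` are hypothetical names, the pointer is a stand-in for whatever the compiler would produce at load time, and the code uses current `struct` syntax where the Julep writes `immutable`.

```julia
# Hypothetical sketch: none of these names are part of the proposed API.

# A handle stores only a stable index plus a content hash, so it is a plain
# immutable value that can be hashed, compared, and serialized.
struct ModuleHandle
    index::UInt   # position in the owner's table
    hash::UInt    # hash of the bitcode, used for verification
end

# Per-owner table: slot i holds the runtime-dependent pointer for handle index i.
struct BitcodeTable
    pointers::Vector{Ptr{Cvoid}}
    hashes::Vector{UInt}
end
BitcodeTable() = BitcodeTable(Ptr{Cvoid}[], UInt[])

# Record a pointer for `bitcode` and return a handle that refers to it by index.
function register!(t::BitcodeTable, bitcode::Vector{UInt8}, ptr::Ptr{Cvoid})
    h = hash(bitcode)
    push!(t.pointers, ptr)
    push!(t.hashes, h)
    return ModuleHandle(UInt(length(t.pointers)), h)
end

# Resolve a handle to the pointer valid in this process, verifying the hash first.
function lookup(t::BitcodeTable, h::ModuleHandle)
    t.hashes[h.index] == h.hash || error("bitcode table mismatch")
    return t.pointers[h.index]
end
```

Because the handle never contains the pointer itself, the table can be rebuilt in a new process (with different addresses) while previously serialized handles stay valid.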
Extern-declared functions would be looked up in the C global name space (ignoring any functions added by our JIT). Defining functions would not add anything to the global name space; instead, any name collision would result in the creation of a new, locally unique name.
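One way to picture the renaming step is the sketch below. The helper `unique_name` and the `.N` suffix scheme are assumptions made for illustration; the Julep does not specify how the locally unique name is chosen.

```julia
# Hypothetical helper, not part of the proposal: pick a name that does not
# collide with any name already in `taken`, by appending a numeric suffix.
function unique_name(name::String, taken::Set{String})
    candidate = name
    n = 0
    while candidate in taken
        n += 1
        candidate = string(name, ".", n)
    end
    push!(taken, candidate)   # reserve the chosen name for later calls
    return candidate
end
```

With such a scheme, two modules that both define `@main` can coexist: the second definition is stored under a private name, and `LLVM.Function(m, "main")` would still resolve it through the module handle rather than through the global name.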
Additionally, there is a need to provide support for `@generated` functions to generate bitcode.
The basic `LLVM.eval` may not be called from an `@generated` function, since it is not pure.
This means `@generated` functions require a more complicated primitive than the typical case described above.
That primitive is `LLVM.link`, which is a slightly altered form of `LLVM.eval`:
it requires that every global either specify weak linkage or the `unnamed_addr` attribute.
baremodule LLVM
"""
link(bitcode)::Module
Compiles `bitcode` and links it to the currently running C runtime.
This means it is effectively equivalent to the following pseudo-code:
run(bitcode | `llvm-as - -o - -fmt=obj` | `ld - -o bitcode -undefined dynamic_lookup`)
dlopen("bitcode")
Although, in this form, it also permits LTO in the current process.
Like `eval`, it will return a `Module` handle.
However, the handle will not necessarily be unique.
Nor will it be valid to embed the handle into the AST of a generated function.
Nor will all of the contents of the bitcode necessarily be turned into object code;
instead an existing symbol with the same name would be substituted in the place of a symbol declared with weak linkage
(even if it had different contents, per the normal rules for the system linker).
And an unnamed_addr global may also be merged with any other global with equivalent content
(even if it had a different name).
Where this is useful, however, is that it *is* valid to embed a call to this function into the AST of a function,
such as the code returned from a generated function,
and it is reasonable to expect that the result will be constant-folded during compilation and precompiled into the module.
For example, we could describe an atomic load of an arbitrary unknown type as:
@generated function atomic_load{T}(x::Ptr{T})
SIZE = sizeof(T)
bitcode = generate_load(SIZE)::Vector{UInt8}
# bitcode contains a function: "define i$SIZE @load(i$SIZE* nocapture) unnamed_addr"
return quote
M = LLVM.link($bitcode)
F = LLVM.Function(M, "load")
return ccall(F, :llvmcall, $T, (Ptr{$T},), x)
end
end
Or we could define it as:
@generated function atomic_load{T}(x::Ptr{T})
SIZE = sizeof(T)
bitcode = generate_load(SIZE)::Vector{UInt8}
fname = "load$SIZE"
# bitcode contains a function: "define linkonce_odr i$SIZE @$fname(i$SIZE* nocapture)"
LLVM.link(bitcode)
return quote
return ccall($fname, :llvmcall, $T, (Ptr{$T},), x)
end
end
The first formulation is generally preferred, as it does not require the function name to be globally, universally unique.
The second formulation may still be required in cases where you don't have strong control over the emission
of the LLVM module (such as integration with a foreign code generator which handles its own uniquing).
Of course, avoiding `@generated` LLVM code entirely (and using `LLVM.eval`, or even `ccall` or `Core.Intrinsics`)
is even more strongly preferred, as generated LLVM code can be tricky to get right, is less efficient, and is harder to debug.
"""
function link(bitcode::Vector{UInt8}) end
end
Here we put LLVM's "hello world" example from the Language Reference Manual into a function and show how we can call it:
function mycall()
HelloWorld = LLVM.Module"""
; Declare the string constant as a global constant.
@.str = private unnamed_addr constant [13 x i8] c"hello world\0A\00"
; External declaration of the puts function
declare i32 @puts(i8* nocapture) nounwind
; Definition of main function
define i32 @main() { ; i32()*
; Convert [13 x i8]* to i8 *...
%cast210 = getelementptr [13 x i8], [13 x i8]* @.str, i64 0, i64 0
; Call puts function to write out the string to stdout.
call i32 @puts(i8* %cast210)
ret i32 0
}
; Named metadata
!0 = !{i32 42, null, !"string"}
!foo = !{!0}
"""
fptr = LLVM.Function(HelloWorld, "main")
return ccall(fptr, :llvmcall, Int32, ()) == 0
end
10/20/16 vtjnash: replaced `LLVM.@Module_str(bitcode)` with `LLVM.eval(bitcode)`
10/21/16 vtjnash:
- added `LLVM.link(bitcode)`, along with a section on using it to handle the `@generated` function case
- removed API Option 2
- expanded documentation of the LLVM module functions
- added note of restrictions on linkage types to `LLVM.eval`