
0002: Enhanced llvmcall

Jameson Nash edited this page Oct 21, 2016 · 3 revisions

This is a complement to Julep 0001: Enhanced static compilation and C interface.

Primary issue

The current implementation of llvmcall is very useful, but has a number of usability issues:

  • Not statically compilable
  • The FunctionMover pass is fundamentally invalid
  • The parsing rules are based on awkward (and inaccurate) string interpolation
  • Parser failures aren't reported until runtime (even though the content is required to be constant)
  • LLVM Types don't transform correctly
  • Significant duplication with ccall (functionality and implementation), but much less robust, less tested, and less flexible

The previous Julep introduced llvmcall as a way of emitting direct calls to LLVM intrinsics. This Julep aims to extend that support so the user can call any arbitrary llvm::Function& and declare other arbitrary global llvm::GlobalValue& and llvm::Constant& values, with full serialization support for all of these.

Goals / non-goals

See Julep 0001.

Design

The basic idea is to introduce a new API for working with LLVM objects from the Julia level. Note that this structure hides the runtime-dependent pointers behind a stable index, so that a handle can be immutable, is easy to serialize and hash, and can be looked up quickly. This also allows deleting the source bitcode. It depends on a Julia module being the unit of incremental compilation, but that is already true and I don't expect that to change.

baremodule LLVM
using Base
"""
    LLVM.Module

Represents a handle to an LLVM Module.
"""
immutable Module
    owner::Core.Module # each module holds its own bitcode-to-pointer table
    index::UInt # pointer into the owner module bitcode table
    hash::UInt128 # used for verification
    # (no public constructor: handles are only created by LLVM.eval and LLVM.link)
end

"""
    LLVM.eval(bitcode)::Module

Creates a handle to a *new* copy of an LLVM Module,
added to the bitcode table for `current_module()`,
and compiled into the current process.

All of the globals defined in this module are required to have normal (not weak) linkage.
Any name defined therein that conflicts with an existing name will be privately altered
to avoid conflicting definitions from overwriting each other in the compiler.
"""
function eval(bitcode::Vector{UInt8}) end

"""
    LLVM.Module""::Module

Declares a new LLVM Module, and returns a handle to it.
"""
macro Module_str(asm::String)
    return eval(asm)
end

"""
    LLVM.Function(::LLVM.Module, name::String)

Represents a handle to a Function inside an LLVM Module.
It is retrieved from the module by using the `name` as the key.
"""
immutable Function
    definition::Module
    name::String
    index::UInt
    Function(m::Module, name::String) = new(m, name, get_index_of(m, name))
    # note that this constructor is a pure function, so it'll constant fold naturally
end
end
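The `Module` handle above only works if the index can be turned back into a live object. Below is a minimal sketch of such a per-module table in plain Julia 1.x. It rests on several assumptions. `BitcodeEntry`, `ModuleHandle`, `TABLES`, `register!` and `resolve` are hypothetical names. `Module` here is Julia's own module type. The table lives in a global `IdDict` rather than inside `Core.Module`. The runtime pointer is left as `C_NULL` where a real implementation would store the JIT-compiled `llvm::Module*`. The 128-bit hash is taken from a SHA-256 digest, although the proposal does not specify a hash function.

```julia
using SHA  # standard library; used only to derive a stable content hash

# One row of a per-module table (hypothetical layout).
mutable struct BitcodeEntry
    bitcode::Vector{UInt8}
    hash::UInt128
    ptr::Ptr{Cvoid}   # runtime-dependent; rebuilt after loading, never serialized
end

# Immutable handle holding only stable data, so it serializes and hashes cleanly.
struct ModuleHandle
    owner::Module     # Julia module whose table holds the bitcode
    index::UInt       # row in the owner's table
    hash::UInt128     # content hash, checked on every lookup
end

# Hypothetical stand-in for "each module holds its own bitcode-to-pointer table".
const TABLES = IdDict{Module,Vector{BitcodeEntry}}()

# First 16 bytes of the SHA-256 digest, packed into a UInt128.
function content_hash(bitcode::Vector{UInt8})
    digest = sha256(bitcode)
    h = UInt128(0)
    for i in 1:16
        h = (h << 8) | digest[i]
    end
    return h
end

# Append bitcode to `owner`'s table and return a stable handle to it.
function register!(owner::Module, bitcode::Vector{UInt8})
    table = get!(() -> BitcodeEntry[], TABLES, owner)
    h = content_hash(bitcode)
    push!(table, BitcodeEntry(copy(bitcode), h, C_NULL))  # a JIT would fill in ptr
    return ModuleHandle(owner, UInt(length(table)), h)
end

# Look a handle up again, rejecting stale or mismatched handles.
function resolve(handle::ModuleHandle)
    table = get(TABLES, handle.owner, BitcodeEntry[])
    1 <= handle.index <= length(table) || error("stale LLVM module handle")
    entry = table[handle.index]
    entry.hash == handle.hash || error("LLVM module handle hash mismatch")
    return entry
end
```

The handle stores no pointer, so in this model it could be written into precompiled code unchanged. Only the `ptr` fields of the owner's table would need to be recomputed when that module is loaded.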

Externally declared functions would be looked up in the C global namespace (ignoring any functions added by our JIT), and defining functions would not add anything to the global namespace. Instead, any name collision would result in the creation of a new, locally unique name.
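The naming rules for `LLVM.eval` can be modelled in a few lines: reject weak-linkage definitions, leave declarations to the C namespace, and rename any colliding definition. This is a hypothetical sketch, not part of the proposal. Globals are given as `(name, linkage, isdeclaration)` named tuples instead of being read from bitcode. The set of linkages counted as "weak" is an assumption. The `.1`, `.2`, ... suffix is an arbitrary choice of renaming scheme.

```julia
# Linkages treated as "weak" (assumed set; the docstring only says "weak").
const WEAK_LINKAGES = (:weak, :weak_odr, :linkonce, :linkonce_odr, :common)

# Return `name` if unused, else the first free `name.1`, `name.2`, ...;
# the chosen name is recorded in `taken`.
function uniquify!(taken::Set{String}, name::String)
    candidate = name
    n = 0
    while candidate in taken
        n += 1
        candidate = string(name, ".", n)
    end
    push!(taken, candidate)
    return candidate
end

# Map each defined global to a locally unique name; declarations are skipped
# because they refer to symbols in the C global namespace.
function eval_names!(taken::Set{String}, globals)
    for g in globals
        if !g.isdeclaration && g.linkage in WEAK_LINKAGES
            error("LLVM.eval: @$(g.name) must not have weak linkage")
        end
    end
    renamed = Dict{String,String}()
    for g in globals
        g.isdeclaration && continue
        renamed[g.name] = uniquify!(taken, g.name)
    end
    return renamed
end
```

With `"main"` already taken, a module that defines `@main` would have it privately renamed to `main.1`. Callers would still find it through `LLVM.Function(M, "main")`.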

Support for @generated llvmcall

Additionally, there is the need to provide support for @generated functions to generate bitcode. The basic LLVM.eval may not be called from an @generated function, since it is not pure. This means @generated functions require a more complicated primitive than the typical case described above.

That primitive is LLVM.link, a slightly altered form of LLVM.eval: it requires that every global either specify weak linkage or have the unnamed_addr attribute.
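The `link` docstring that follows spells out what this requirement buys. A toy model of the resolution rules makes them concrete. In this hypothetical sketch, `GlobalDef`, `SYMTAB` and `link_globals!` are illustrative names, not a real API, and each global's body is reduced to a string. The function first enforces the requirement above. It then resolves each definition against a process-wide symbol table. A weak definition defers to an existing symbol of the same name. An `unnamed_addr` definition may be merged with any existing global of equal content. Anything else is added, renamed if its name is taken. Real linkers also apply visibility, ODR and alignment rules that are ignored here.

```julia
# Hypothetical description of one global in a bitcode module.
struct GlobalDef
    name::String
    linkage::Symbol     # e.g. :external, :private, :weak, :linkonce_odr
    unnamed_addr::Bool
    content::String     # stand-in for the initializer or function body
end

# Linkages treated as "weak" (assumed set).
const WEAK_LINKAGES = (:weak, :weak_odr, :linkonce, :linkonce_odr, :common)

# Process-wide symbol table: name => content (assumed shape).
const SYMTAB = Dict{String,String}()

# Check LLVM.link's requirement, then map each definition's name to the
# symbol it resolves to after linking.
function link_globals!(defs::Vector{GlobalDef})
    for d in defs
        (d.linkage in WEAK_LINKAGES || d.unnamed_addr) ||
            error("global @$(d.name) needs weak linkage or unnamed_addr")
    end
    resolved = Dict{String,String}()
    for d in defs
        if d.linkage in WEAK_LINKAGES && haskey(SYMTAB, d.name)
            # The existing symbol wins, even if its content differs.
            resolved[d.name] = d.name
        elseif d.unnamed_addr && (same = findfirst(==(d.content), SYMTAB)) !== nothing
            # Merged with an equal global, possibly under another name.
            resolved[d.name] = same
        else
            # New definition; pick a fresh name if this one is taken.
            name, n = d.name, 0
            while haskey(SYMTAB, name)
                n += 1
                name = string(d.name, ".", n)
            end
            SYMTAB[name] = d.content
            resolved[d.name] = name
        end
    end
    return resolved
end
```

In this model, linking a second `linkonce_odr @load32` with a different body leaves the first body in place, and a private `unnamed_addr` string constant is folded into an identical one already present. This is why the handle returned by `link` is not necessarily unique.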

baremodule LLVM
"""
    link(bitcode)::Module

Compiles `bitcode` and links it into the currently running C runtime.
It is effectively equivalent to the following pseudo-code:
    run(bitcode | `llc -filetype=obj - -o -` | `ld - -o bitcode -undefined dynamic_lookup`)
    dlopen("bitcode")
except that, in this form, it also permits LTO within the current process.

Like `eval`, it returns a `Module` handle.
However, the handle is not necessarily unique,
and it is not valid to embed the handle into the AST of a generated function.
Nor will all of the contents of the bitcode necessarily be turned into object code:
an existing symbol with the same name is substituted in place of a symbol declared with weak linkage
(even if it has different contents, per the normal rules for the system linker),
and an unnamed_addr global may be merged with any other global with equivalent content
(even if it has a different name).

Where this is useful is that it *is* valid to embed a call to this function into the AST of a function,
such as the code returned from a generated function,
and it is reasonable to expect that the result will be constant-folded during compilation and precompiled into the module.

For example, we could describe an atomic load of an arbitrary unknown type as:

    @generated function atomic_load{T}(x::Ptr{T})
        NBITS = 8 * sizeof(T) # LLVM integer widths are in bits, not bytes
        bitcode = generate_load(NBITS)::Vector{UInt8}
        # bitcode contains a function: "define i$NBITS @load(i$NBITS* nocapture) unnamed_addr"
        return quote
            M = LLVM.link($bitcode)
            F = LLVM.Function(M, "load")
            return ccall(F, :llvmcall, $T, (Ptr{$T},), x)
        end
    end

Or we could define it as:

    @generated function atomic_load{T}(x::Ptr{T})
        NBITS = 8 * sizeof(T) # LLVM integer widths are in bits, not bytes
        bitcode = generate_load(NBITS)::Vector{UInt8}
        fname = "load$NBITS"
        # bitcode contains a function: "define linkonce_odr i$NBITS @$fname(i$NBITS* nocapture)"
        LLVM.link(bitcode)
        return quote
            return ccall($fname, :llvmcall, $T, (Ptr{$T},), x)
        end
    end

The first formulation is generally preferred, as it does not require the function name to be globally unique.
However, the second formulation may be required when you don't have strong control over the emission
of the LLVM module (such as integration with a foreign code generator that handles its own uniquing).

Of course, avoiding `@generated` LLVM code entirely (and using `LLVM.eval`, or even `ccall` or `Core.Intrinsics`)
is even more strongly preferred: generated LLVM code is tricky to get right, less efficient, and harder to debug.
"""
function link(bitcode::Vector{UInt8}) end
end
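The `atomic_load` examples leave `generate_load` abstract. The hypothetical `generate_load_ir` below shows what it might emit. It returns textual IR in the typed-pointer syntax of this page's era rather than bitcode; textual IR could be assembled with `llvm-as`. The `name` keyword is also an assumption. LLVM only permits atomic loads of integers whose width is a power of two of at least 8 bits, and it requires an explicit alignment. The sketch assumes natural alignment (`nbits ÷ 8` bytes), and the caller would pass `8 * sizeof(T)`.

```julia
# Hypothetical stand-in for `generate_load`: textual LLVM IR for a function
# that atomically loads an i<nbits> value through a pointer.
function generate_load_ir(nbits::Integer; name::AbstractString = "load")
    # Atomic loads need a power-of-two integer width of at least 8 bits.
    (nbits >= 8 && ispow2(nbits)) ||
        throw(ArgumentError("atomic load width must be a power of two >= 8, got $nbits"))
    ty = "i$nbits"
    align = nbits ÷ 8   # assume natural alignment
    return """
        define $ty @$name($ty* nocapture %p) unnamed_addr {
          %v = load atomic $ty, $ty* %p seq_cst, align $align
          ret $ty %v
        }
        """
end
```

For the second formulation, `generate_load_ir(64; name = "load64")` with `linkonce_odr` added after `define` would give the globally named variant.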

Example Usage:

Here we put LLVM's "hello world" example from the Language Reference Manual into a function and show how we can call it:

function mycall()
    HelloWorld = LLVM.Module"""
        ; Declare the string constant as a global constant.
        @.str = private unnamed_addr constant [13 x i8] c"hello world\0A\00"

        ; External declaration of the puts function
        declare i32 @puts(i8* nocapture) nounwind

        ; Definition of main function
        define i32 @main() {   ; i32()*
          ; Convert [13 x i8]* to i8*...
          %cast210 = getelementptr [13 x i8], [13 x i8]* @.str, i64 0, i64 0

          ; Call puts function to write out the string to stdout.
          call i32 @puts(i8* %cast210)
          ret i32 0
        }

        ; Named metadata
        !0 = !{i32 42, null, !"string"}
        !foo = !{!0}
        """
    fptr = LLVM.Function(HelloWorld, "main")
    return ccall(fptr, :llvmcall, Int32, ()) == 0
end

Edit history

10/20/16 vtjnash: replaced LLVM.@Module_str(bitcode) with LLVM.eval(bitcode)

10/21/16 vtjnash:

  • added LLVM.link(bitcode), along with a section on using it to handle the @generated function case
  • removed API Option 2
  • expanded documentation of the LLVM module functions
  • added note of restrictions on linkage types to LLVM.eval

Comments