Virtual machine design

Clasp uses a simple virtual machine for evaluation of code that doesn’t happen often, e.g. during the build process or for compile-time code. The virtual machine is designed to be reasonably fast to run while remaining easy to use as a compiler target. The compiler targeting the virtual machine, which is largely separate from the LLVM+Cleavir-based native code compiler, runs very quickly and only performs general optimizations.

The machine

The virtual machine is basically a stack machine. Each function call reserves a fixed number (determined by the compiler) of slots on the stack which are used as registers, but can use a variable amount of stack space, both to stack-allocate objects and for multiple-values purposes. Each thread of execution has its own independent virtual machine, i.e. its own VM stack space.

There are basically two intrinsic registers, beyond a function’s register file: the program counter and the multiple values vector. The multiple values vector, referred to as the mv register below, is the usual multiple values vector used also by native Clasp code.

The code is a bytecode, stored in an ordinary (unsigned-byte 8) Lisp vector. Each instruction is an optional prefix, followed by the opcode byte, followed by zero or more byte operands, depending on the particular instruction. Usually each operand is a single byte; if that is not sufficient, the instruction may be preceded by the special `long` prefix byte, which indicates that all of its operands are two bytes long instead. Two-byte operands are interpreted as little-endian sixteen bit integers. Currently, the only prefix is `long`, and it may only be used for instructions that have at least one operand.

The bytecode is organized into modules, each of which has one or more functions. Functions are in the same module if they are compiled together, which is usually because they are internal functions of one another (e.g. due to `flet` or `lambda`). Jumps are only valid within a module, as Lisp functions never `go` or `return-from` to exit points they weren’t compiled together with, by definition. Each module has zero or more constants, which are shared between all of its functions.

Each function conceptually consists of a module, an offset into the module where the function begins, and zero or more closed-over values.

“cells” are the objects Clasp uses to store mutable closure values. A cell is just a structure containing one value. They are used by both the VM and native compiled code. At present, they are cons cells, with the value in the car, and the cdr set to nil.

Instructions

The particular opcodes are listed below. FOO U U means that the instruction is called FOO, and has two operands, which are interpreted as unsigned integers. An Ln operand indicates a label; in this case it is a signed integer, and is relative to the start of the instruction (i.e. 0 would be the opcode, or the prefix if it exists). Labels may be either one, two, or three bytes long, always little-endianly encoded. Each label length has a separate opcode (e.g. there is JUMP-8 for one, JUMP-16 for two, JUMP-24 for three). The K operand type is unique to the PARSE-KEY-ARGS instruction and will be explained along with it. In the text, op1 means the first operand, and so on.

`REF U`

Push the op1-th register to the stack.

`CONST U`

Push the op1-th constant to the stack.

`CLOSURE U`

Push the op1-th closure value to the stack.

`CALL U`

Perform a function call. The most recent op1 values on the stack are the arguments for the call. The last argument is the most recent value pushed, the second to last is the second most recent, and so on. The next most recent value after the arguments is the function. Call the function with the arguments, set the mv register to be the values it returns, and then pop all 1+op1 values from the stack. (The arguments are kept on the stack during the call so that the called function can return to them without any copying taking place.)

`CALL-RECEIVE-ONE U`

As CALL, but instead of storing to mv, push the primary return value to the stack. (Note that the function being called may nonetheless write to mv.)

`CALL-RECEIVE-FIXED U U`

As CALL, but take the op2 first return values and push them to the stack, primary first.

`BIND U U`

Pop op1 values from the stack, and copy them into the register file starting at op2. The least recent stack value will go into register op2, the next least recent into 1+op2, and so on.

`SET U`

Pop a value from the stack, and set the op1-th register to that. set n is equivalent to bind n n.

`MAKE-CELL`

Pop a value from the stack. Allocate a new cell object with it as a value, and push that cell to the stack.

`CELL-REF`

Pop a value from the stack; it must be a cell. Push the cell’s value to the stack.

`CELL-SET`

Pop a value from the stack; it must be a cell. Pop another value, and set the cell to hold that value.

`MAKE-CLOSURE U`

Take the op1-th constant, which must be a prototype of a function. The prototype describes how many values the function closes over; pop this many values from the stack. Create a new closure based on the prototype, with the closed-over values being these values popped from the stack. The first closure value will be the least recently pushed value to the stack, and so on. Push the closure to the stack.

`MAKE-UNINITIALIZED-CLOSURE U`

Take the op1-th constant, which must be a prototype of a function. Create a new closure based on the prototype, with uninitialized closed over values, and push it to the stack. The closure will be initialized later by INITIALIZE-CLOSURE.

`INITIALIZE-CLOSURE U`

Take the value in the op1-th register, which must be an uninitialized closure. The closure describes how many values it closes over; pop this many values from the stack, and make them the closed over values. The first closure value will be the least recently pushed value, etc.

`RETURN`

Return from this call. The return values are those in the mv register.

`BIND-REQUIRED-ARGS U`

Copy the first op1 arguments into the first op1 registers.

`BIND-OPTIONAL-ARGS U U`

Copy the op2 arguments starting from the op1-th into the registers starting at op1. If op1+op2 is greater than the number of arguments, the registers for missing arguments are initialized to the unbound marker instead.

`LISIFY-REST-ARGS U`

Take all remaining arguments starting from the op1-th, put them into a Lisp list, and push this list to the stack.

`VASLISTIFY-REST-ARGS U`

Take all remaining arguments starting from the op1-th, and organize them into a vaslist object, which is pushed to the stack.

`PARSE-KEY-ARGS U K U U`

Parse keyword arguments. op2 is specially formatted: the most significant bit is set if the lambda list for this function has &allow-other-keys, and otherwise not. The remaining low bits are an unsigned count of how many different keywords the function accepts, the keycount. op3 is an index into the constants table; the constant must be a keyword, and the next keycount-1 constants are also keywords.

parse-key-args will iterate through the arguments starting at the op1-th, interpreting them as a plist, and load the corresponding argument values into the register file starting at the op4-th register. The registers are set to the arguments corresponding to the constant keywords, i.e. the first keyword’s value will be in the op4-th register and so on. If a keyword argument is not present, an unbound marker is placed in the corresponding register. If the length of the “plist” is odd, a program error is signaled. If an invalid keyword or non-keyword is present, and the other keys are not allowed (by &allow-other-keys or :allow-other-keys t), a program error is signaled.

`JUMP L`

Unconditional jump.

`JUMP-IF L`

Pop a value from the stack. If it is true, jump to the label, otherwise continue on.

`JUMP-IF-SUPPLIED U L`

Take the =op1=th register. If it’s the unbound marker, continue on, otherwise jump to the label.

`CHECK-ARG-COUNT-LE U`, `CHECK-ARG-COUNT-GE U`, `CHECK-ARG-COUNT-EQ U`

Signal a program error unless the argument count is less than or equal to, greater than or equal to, or equal to op1, respectively.

`PUSH-VALUES`

Push the values in the mv register to the stack. Then push a fixnum count of how many values were pushed. This instruction is used to start off gathering arguments for an MV-CALL, as well as to implement multiple-value-prog1.

`APPEND-VALUES`

Pop a fixnum count from the stack. Push the values in the mv register to the stack. The push a fixnum count of how many values were pushed plus the previously pushed count. This instruction is used to build up arguments for an MV-CALL.

`POP-VALUES`

Pop a fixnum count from the stack, then pop that many values from the stack and put them in the mv register. This is used to implement multiple-value-prog1 and for MV-CALL.

`MV-CALL`, `MV-CALL-RECEIVE-ONE`, `MV-CALL-RECEIVE-FIXED U`

Pop a value from the stack; it must be a function. Call this function using the mv register as the arguments. For MV-CALL, the return values are stored in the mv register. For MV-CALL-RECEIVE-ONE, the primary value is pushed to the stack. For MV-CALL-RECEIVE-FIXED, the op1 first values are pushed to the stack.

`ENTRY U`

Create a TagbodyDynEnv_O. Push it to the dynenv stack, and also store it in the op1-th register. The dynenv can now be used for nonlocal exits, until it is unwound to, or through, or closed by an ENTRY-CLOSE instruction.

`EXIT L`

Pop a value from the stack; it must be a TagbodyDynEnv_O. Unwind to this dynenv, i.e. unwind the dynenv and VM stacks to the state they were in just after the corresponding ENTRY instruction. Then jump to the label.

`ENTRY-CLOSE`

Close the most recent ENTRY dynenv, which must be the top of the dynenv stack.

`SPECIAL-BIND U`

The op1-th constant must be a symbol. Bind this symbol to the value popped from the stack. Push a dynenv for this binding to the dynenv stack.

`SYMBOL-VALUE U`

The op1-th constant must be a symbol. Push its value to the stack. If the symbol is unbound, signal an unbound-variable error.

`SYMBOL-VALUE-SET U`

The op1-th constant must be a symbol. Set its value to the value popped from the stack.

`UNBIND`

Close the most recent SPECIAL-BIND dynenv, which must be the top of the dynenv stack.

`FDEFINITION U`

The op1-th constant must be a function name. Look up its fdefinition and push that to the stack. If the name is unbound, signal an undefined-function error.

`NIL`

Push nil to the stack.

`PUSH`

Push the primary value from the mv register to the stack.

`POP`

Pop a value from the stack, and set the primary value of the mv register to it.

Organization of the runtime

Bytecoded functions, which run on the VM, are of class core:global-bytecode-simple-fun (GBSF). This class contains a core:bytecode-module. Note that closures will be of class core:closure as normal, and the underlying simple fun will be a GBSF. The GBSF also contains an index into the module for the start of the function’s code, and a local frame size, which indicates how large a VM register file needs to be allocated during a call - see “The machine” above for details. BGSF’s also serve as function prototypes when making closures.

The actual virtual machine structure is a core::VirtualMachine in the runtime. Each Clasp thread has a VirtualMachine object. This object keeps track of the virtual machine’s stack memory. Said stack memory is currently allocated on the C++ control stack, but we are planning on moving it to the heap in order to facilitate starting and stopping virtual machines externally.

When a GBSF is called, a trampoline function grabs the module and the start PC, and passes control to the function bytecode_call. This function, and the rest of the virtual machine implementation, live in src/core/bytecode.cc. bytecode_call allocates the register file for the function and then passes control to bytecode_vm, which does the actual interpretation of the bytecode. The values returned by bytecode_vm are returned by bytecode_call and then the function.

For the sake of nonlocal exits, bytecode_call pushes a special VMFrameDynEnv_O dynenv to the dynenv stack. This environment just records the state of the VM stack before the call. During a nonlocal exit, these dynenvs are unwound by unwinding the VM stack.

VM definition

The canonical definition of the VM opcodes is in src/lisp/kernel/cmp/bytecode-machines.lisp. During build, this file is used to generate a virtualMachine.h used by the runtime, as described in the build process page. The actual implementation of the virtual machine is in src/core/bytecode.cc as described above.

Virtual machine design

The machine

Instructions

REF U

CONST U

CLOSURE U

CALL U

CALL-RECEIVE-ONE U

CALL-RECEIVE-FIXED U U

BIND U U

SET U

MAKE-CELL

CELL-REF

CELL-SET

MAKE-CLOSURE U

MAKE-UNINITIALIZED-CLOSURE U

INITIALIZE-CLOSURE U

RETURN

BIND-REQUIRED-ARGS U

BIND-OPTIONAL-ARGS U U

LISIFY-REST-ARGS U

VASLISTIFY-REST-ARGS U

PARSE-KEY-ARGS U K U U

JUMP L

JUMP-IF L

JUMP-IF-SUPPLIED U L

CHECK-ARG-COUNT-LE U, CHECK-ARG-COUNT-GE U, CHECK-ARG-COUNT-EQ U

PUSH-VALUES

APPEND-VALUES

POP-VALUES

MV-CALL, MV-CALL-RECEIVE-ONE, MV-CALL-RECEIVE-FIXED U

ENTRY U

EXIT L

ENTRY-CLOSE

SPECIAL-BIND U

SYMBOL-VALUE U

SYMBOL-VALUE-SET U

UNBIND

FDEFINITION U

NIL

PUSH

POP