Skip to content

Virtual machine design notes

Alex Wood edited this page Oct 31, 2022 · 1 revision

These are our original notes for the virtual machine. The final design is described here.

We’ll take the CLISP virtual machine as a starting point. Some differences we’d like/need to make:

  • Flat closures are probably a better representation than the linked closures used in CLISP.
  • Multiple-values will need to be handled carefully.
  • The non-local exit instructions for each special form in the VM should probably be unified to fit with the existing clasp runtime model for non-local exits.
  • All the combined instructions should be done away with for the first iteration of the bytecode virtual machine. At the very least the C++ bytecode compiler should not deal with compressing instructions.
    • All the CALLN instructions can just be implemented as one CALL N function for now - the multiple entry points vector will get indexed to appropriately by the VM.

The VM

Questions: Do we use the C/C++ call stack, our own stack(s)? How does this interact with GC and foreign calls or even calls into the normal native code compiled functions?

Representation of functions

What else besides the code vector being bytecode instructions needs to differ from the normal llvm machine code representation of functions?

Stacks, frame layout

We want to have one stack containing stack frames of Lisp byte function invocations. To make machine code => byte code calls and vice versa work, we can probably just have small trampoline wrappers to translate the call conventions at the boundary.

Consider

(defun bar (x)
  (foo x)
  3)

(defun foo (x)
  (print x))

where foo and bar are both byte coded.

Then we’d have a stack layout right before calling FOO like:

<value of X>  <= from referencing X
--------------  <= end of local variable storage area for BAR
<value of X>  <= local variable storage area
-------------- BAR base_sp <= begin local variable storage area for BAR

when calling FOO we get

-------------- <= end of local variable storage area for FOO
<value of X>  
-------------- <= FOO base_sp <= begin local variable storage area for FOO
--------------  <= end of local variable storage area
<value of X>  <= local variable storage area
-------------- BAR base_sp

then when the call to PRINT happens we call a bytecoded trampoline which will call the native function PRINT with the right arguments/entry point. Perhaps a special bytecode for doing this trampolining could be used.

Representing large operands

By default, operands for the VM instructions are (unsigned-byte 8). However, for large functions, we may want operands up to (unsigned-byte 16) in size. For these cases, we use a prefix bytecode (with the last possible opcode available, because it will be used very infrequently, and should be last in a switch-case). So

(LONG-PREFIX)

will dispatch on the next bytecode operation, but making it so that the operation now accepts an (unsigned-byte 16) operand. For uncommon (or slow) wide offset control flow constructs such as CATCH and JUMP-IF-SUPPLIED, the prefix extends the pc-relative (signed-byte 8) operand to a (signed-byte 16) operand.

Control flow instructions do not use this scheme. They have distinct opcodes for different sized labels rather than prefixes, since control targets larger than a signed byte are common and need to be fast.

Rationale

CLISP uses a variable length encoding scheme for the operands, which are decompressed on the fly during VM interpretation of the bytecode. This scheme uses the MSB of a byte as the continuation bit and sign extends.

The scheme proposed above is much faster mostly because operands do not need to be decoded and unnaturally sign-extended from the 7th bit of each byte. It also is almost always more compact. This is because we can represent the full range of 0-255 for instructions like REF, CONST without a prefix using the prefix scheme, while CLISP can only represent 0-127 because it dedicates one bit as the continuation bit. Functions rarely have more than 255 locals, and if they do, code size is clearly not a problem. (It’s a strict win up until 383 locals, because we had saved a byte already for 128-255). Hence, we speed up and make more compact 99% of the cases.

The instructions themselves

REF

Local variables get stored at the base of the current stack frame at known offsets. These can be referenced with

(REF n)

where n is the offset from the stack frame base. This has the effect of pushing the value of that local variable onto the top of the stack.

CONST

Literals get stored in the function template’s literal table/vector as with native code. We reference this vector with

(CONST n)

where n is the offset in the literal vector. Function templates (a.k.a code parts of closures) are stored in these literal vectors. This pushes the value on the top of the stack.

CLOSURE

The closure is part of the closure function object, which will be made available to the virtual machine either via the stack or by special VM registers. We use flat closures, so the bytecode instruction

(CLOSURE n)

references the nth entry in the flat closure environment and pushes the value onto the top of the stack.

CALL

(CALL n)

(CALL-RECEIVE-ONE n)

(CALL-RECEIVE-FIXED n m)

calls the function *(stack - n) with arguments *(stack - n + 1), *(stack - n + 2), ... *(stack - 1). It also indirects to the right entry point using n. After the call returns, these stack arguments appear as if they have been consumed.

For CALL, the multiple values of the call are stored in the multiple values register.

For CALL-RECEIVE-ONE, the first return value of the call is pushed to the stack. Defaulting to NIL is done if no values are returned.

For CALL-RECEIVE-FIXED, the m return values of the call are each pushed on the stack. Defaulting to NIL is done for unsupplied values.

BIND

(BIND n m)

takes the n stack top values and shoves them into the local variable region starting at index m. The n values are then popped. Having a BIND instruction allows us to not have to worry about doing location analysis to try to reference values on the stack by dead-reckoning everywhere or maintaining stack discipline. This is how we will implement LET.

The stack pointer is also ensured to be above base-pointer + m + n, if it isn’t already.

SET

(SET m)

pops the stack top and sets the frame pointer (local variable region) slot at offset m to that value.

MAKE-CELL

(MAKE-CELL)

Creates a value cell, pops the stack top and initializes the cell with that value, and puts the cell on the stack. We can use cons for this, but it would be cool to have a more compact representation.

CELL-REF

(CELL-REF)

Pops the value cell on the stack top and pushes the value inside the cell onto the stack.

CELL-SET

(CELL-SET)

Pops the value cell on the stack top, then pops a value off the stack to put in that value cell.

MAKE-CLOSURE

(MAKE-CLOSURE n)

Takes the function template at index n in the literal table of the code object and creates a flat closure with the m top values of the stack, where m is a property of the template. This closure object is then pushed onto the top of the stack. The m values are then popped in the process.

MAKE-UNINITIALIZED-CLOSURE

(MAKE-UNINITIALIZED-CLOSURE n)

Takes the function template at index n in the literal table of the code object and creates a closure object with an uninitialized environment. This uninitialized closure object is then pushed onto the top of the stack. This, along with INITIALIZE-CLOSURE, is used for implementing mutually recursive functions (i.e. CL:LABELS) without additional indirection.

INITIALIZE-CLOSURE

(INITIALIZE-CLOSURE n)

Initialize the environment of the closure at frame slot n with the top m stack top values, where m is a property of the function template. The m values are then popped in the process.

POP

(POP)

Pops a single value off the stack top and puts it into the values register as a single value.

PUSH

(PUSH)

Pushes the first value of the values register (or NIL if no values) to the stack.

RETURN

(RETURN)

Kills the current stack frame and returns from the function. The values register is kept as is.

JUMP

(JUMP <then-label>)

(JUMP-8 <then-label>)
(JUMP-16 <then-label>)
(JUMP-24 <then-label>)

Jump to <then-label> (a signed pc-relative offset).

JUMP-IF

(JUMP-IF <then-label>)

(JUMP-IF-8 <then-label>)
(JUMP-IF-16 <then-label>)
(JUMP-IF-24 <then-label>)

Pops the top of the stack, checks if the value is non-NIL, then branches to <then-label> (a signed pc-relative offset) if so.

NLX instructions

(ENTRY <n>)

Creates a dynamic environment (which captures the state of the current stack and frame pointers) and saves it in frame slot n.

(EXIT <label>)

(EXIT-8 <then-label>)
(EXIT-16 <then-label>)
(EXIT-24 <then-label>)

Pop a dynamic environment off the stack top, unwind up to that dynamic environment’s frame, and do a relative PC-jump to <label> (which will be in the same module).

(ENTRY-CLOSE)

Pop a block dynamic environment from the dynamic environment stack (not the VM stack), marking it as invalid.

(CATCH <label>)

(CATCH-8 <then-label>)
(CATCH-16 <then-label>)

Pop a tag off the stack, creating a catch frame for it in the dynamic environment. When the tag is thrown to, control resumes at <label>. There is no 24-bit label variant of this instruction because CATCH points within the function, so we probably don’t need it.

(CATCH-CLOSE)

Pop a catch frame off the dynamic environment.

(THROW)

Pop a tag off the stack, and throw to it.

Rationale

The advantage of this proposal is that the bytecode becomes significantly smaller on two axes. There is no longer an indirect index needed in GO to be specified nor the additional length argument in TAGBODY-OPEN. We’ve now deduplicated the instructions for NLX so that there are more bytecodes available.

It should also be faster on the VM side - previously we’d need to grab the IP to go to from the dynamic environment, potentially doing an indirect index to get it. The offset would be also relative to the BLOCK-OPEN or TAGBODY-OPEN, so the target would need to be calculated off of that - now we just do a pc-relative jump (fast!) in the VM directly from the bytecode being interpreted.

Obsolete NLX instructions

(BLOCK-OPEN <label>)

Create a block dynamic environment. Push it to the stack, and also to the dynamic environment stack (which is in Clasp’s general runtime, not the VM). If returned to, execution proceeds from the PC specified in <label>, with the dynamic environment stack popped to where it was upon initial execution of the block-open. We heap allocate the block tag object so that it will always be valid in e.g. a closure, for simplicity. (Note here that CLISP stack allocates the block tag object in addition to putting it in the SP stack relying on special instructions to indirectly access the tag through closure nesting. We just use flat closures instead.)

(BLOCK-CLOSE)

Pop a block dynamic environment from the dynamic environment stack (not the VM stack), marking it as invalid.

(RETURN-FROM)

Pop a block dynamic environment off the stack top, unwind up to that dynamic environment’s frame, and jump to the bytecode return PC.

(TAGBODY-OPEN <n-labels> <label>*)

Create (on the heap) a tagbody dynamic environment. Push it to the stack, and also to the dynamic environment stack. If returned to (via GO, below), execution proceeds from the PC specified by the n th tagbody label, where n is specified in GO.

(TAGBODY-CLOSE)

Pop a block dynamic environment from the dynamic environment stack (not the VM stack), marking it as invalid.

(GO <n>)

Pop a tagbody dynamic environment off the stack top, unwind up to that dynamic environment’s frame, and jump to the <n>th label specified in the TAGBODY-OPEN.

We can punt catch, throw, (maybe unwind-protect)? as usual, but having specialized instructions for them could help, as in CLISP. We’d need to worry about crossing cross machine code and bytecode tags for these instructions though, unlike for the lexical ones.

Dynamic variable instructions

Dynamic variables are common in Lisp, so it’s worth having these instructions to deal with them:

(SPECIAL-BIND <n>)

Special binds the symbol at literal offset <n> to the value at the top of the stack. The value gets popped, and the resulting binding frame is put onto the dynamic environment.

(UNBIND)

Pops one dynamic binding frame off the dynamic environment.

(PROGV)

Pops a list of values, then a list of symbols off the stack, dynamically binding the symbols to those values as if by the special operator progv, pushing the resulting binding frame onto the dynamic environment.

(SYMBOL-VALUE <n>)

Gets the symbol value of the symbol at literal offset <n>.

(SYMBOL-VALUE-SET <n>)

Sets the symbol value of the symbol at literal offset <n> to the value at the top of the stack, popping it.

Lambda list processing instructions

(CHECK-ARG-COUNT= <n>)
(CHECK-ARG-COUNT<= <n>)
(CHECK-ARG-COUNT>= <n>)

Compare arg-count to <n> according to the comparison. Error with an invalid argument count error if the comparison fails.

(BIND-REQUIRED-ARGS required-count)

Copy required-count arguments from args into frame-pointer.

(BIND-OPTIONAL-ARGS required-count optional-count)

Copy optional-count arguments from args + required-count into frame-pointer + required-count. If there are less arguments than optional-count, then set the remaining frame slots to #<unbound>.

(LISTIFY-REST-ARGS more-start)

Collects the values args[more-start]...args[arg-count -1] stack values and puts them in a list on the stack at the frame slot rest-slot. This is used to implement &rest args.

(PARSE-KEY-ARGS more-start key-count key-literal-start key-frame-start)

The high bit of key-count is used as a flag for &ALLOW-OTHER-KEYS being present in the lambda list. This bounds the number of keywords to 127, but that’s more than enough given that we probably don’t have more than 250 call arguments.

Parse the passed arguments starting from the end of the args register (i.e. args + arg-count - 1 down to args + more-start. First, initialize all values frame_pointer[key-frame-start]...frame_pointer[key-count] with #<unbound>, so that defaulting code can check for this. For every second argument KEY scanned from the right (we complain if the number of passed arguments is not even), check to see if it matches any literal key value in literals[key-literal-start]...literals[key-literal-start+key-count-1]. Let k be the offset from key-literal-start where the match was found. Then bind the supplied argument corresponding to KEY to frame_pointer[key-frame-start + k]. We process keys in reverse order in this way to satisfy the CL constraint that the first appearance of a key/value pair is the one that supplies the value to the key binding.

(JUMP-IF-SUPPLIED <n> <label>)

(JUMP-IF-SUPPLIED-8 <n> <then-label>)
(JUMP-IF-SUPPLIED-16 <then-label>)

Jump to <label> if the nth frame slot is not the unbound marker. There is no 24-bit label variant because we probably don’t need it, as this instruction will point function locally.

Examples:

(defun f (x g &optional (y (let ((z (* x g)))
                             (+ z z)))
  (+ x y g))
;; Compiler has determined the following frame slot offsets for local variables: #(X G Y Z)
 
;; Literals for F
#(#<FDEFN *> #<FDEFN +>)

;; Bytecode for F
                                      ; The passed arguments are at vm.args.
(CHECK-ARG-COUNT<= 3)                 ; Die if the vm arg count register is less than max supplied.
(CHECK-ARG-COUNT>= 2)                 ; Die if the vm arg count register is less than required.

(BIND-REQUIRED-ARGS 2)                ; Copy X and G into frame slots 0 and 1.
(BIND-OPTIONAL-ARGS 2 1)              ; Copy Y into frame slot 2 if supplied, otherwise #<unbound>            
(JUMP-IF-SUPPLIED main)               ; Go to the main entry point where all arguments have been supplied.
                                      ; Compute Y

(REF 0)                               ; X
(REF 1)                               ; G
(CALL 2)
(SET 3)                               ; BIND Z
(CONST 1)                             ; #<FDEFN +>              
(REF 3)                               ; Z
(REF 3)                               ; Z
(CALL 2)
(SET 2)                               ; BIND Y
main:                                 ; Main entry point: X G Y are at stack slots 0 1 2 at this point 
(REF 0)                               ; X
(REF 1)                               ; G
(REF 2)                               ; Y
(CONST 1)                             ; #<FDEFN +>
(CALL 3)
(RETURN)

Multiple value instructions

(PUSH-VALUES)

Push all values in the values register, then push the number of values pushed onto the top of the stack.

(APPEND-VALUES)

Like PUSH-VALUES, but pops the value n from the stack first, then pushes all values in the values register. n is then added to the number of values just pushed on the stack.

This has the effect of appending the multiple values in values with the multiple values which have already been pushed onto the stack. This is useful for implementing multiple-value-call with more than one argument form.

(POP-VALUES)

Pop the value n at the top of the stack, then pop the next n values off the stack to initialize the values register.

(MV-CALL)

(MV-CALL-RECEIVING-SINGLE)

(MV-CALL-RECEIVING-FIXED)

Pop the function at stack top, and call it with the values register as the argument.

Note

An alternative design for the MV-CALL instructions goes like this:

The MV-CALL instruction directly pops off the unknown values off the top of the stack (i.e. the number of values and then that number of values) and goes. This avoids copying into a temporary args vector in the underlying VM but adds a PUSH-VALUES for the most common case of multiple-value-call, which is with one argument form.

karlosz> You’d have to put in a PUSH-VALUES for 1 arg forms but that’s compensated by ditching a POP-VALUES on >1 arg forms and you save 1 (VLA) copy for >1 arg forms.

Additional stack manipulation instructions

The following instructions are used to optimize local GO and RETURN-FROM (that is, exits in the same function)

(SAVE-SP m)

Saves the current stack pointer in frame slot m.

(RESTORE-SP m)

Restore the current stack pointer in frame slot m.

Additional instructions.

The following are additional instructions to shorten/speed up the bytecode, in descending order of worthwhileness.

(NIL)

Pushes NIL to the stack top. We definitely want this one, as NIL is everywhere in Lisp.

(JUMP-EQ <label>)

Pops the top two stack values and compares them for EQness. Jumps to <label> if they are indeed EQ. We probably want this because EQ is so ubiquitous. (Like in &KEY processing for example).

Not as important:

(ZERO)? (T)? (probably not as useful)

(CAR)? (CDR)?

Misc notes

One additional rationale for this design (especially for BIND) is that we avoid having to compute too much info about where values are on the stack ahead of time so straight line compilation can be done. CLISP likes to keep track of where is what on the stack to potentially reduce loads and stores, but that also makes stack balancing for conditionals necessary, which is sort of a pain to do for straight line compilation, so we try to avoid that.

Bytecode example

(defun f (x)
  (let ((y 5))
    (print y)
    (lambda ()
      (+ y x))))

compiles to

;; Literals vector for code object...
#(5 #<FDEFN PRINT> #<FUNCTION (LAMBDA () :IN F)> #<FDEFN +>)

;; Bytecode for F
(BIND-REQUIRED-ARGS 1) ; Copy X into the 0th frame slot.
                       ; The stack pointer is at frame-pointer + 1, since that's the frame size.
(CONST 0)          ; Pushes 5 to stack top
(BIND 1 1)         ; Pops 5 off stack top and binds it to Y (which the compiler has determined to be at stack slot 1).

(CONST 1)          ; Push #<FDEFN PRINT> to stack top
(REF 1)            ; Push Y to stack top
(CALL 1)           ; Call PRINT with 1 argument (Y), pops both values off the stack. MV_REG = 5 after the call.

(REF 0)            ; Push X to stack top
(REF 1)            ; Push Y to stack top
(MAKE-CLOSURE 2)   ; Takes the top 2 values on the stack,
                   ; creates a closure environment and packages it up with the function template.
                   ; Pushes this object to stack top.

(RETURN)           ; Takes  all remaining values on the stack, puts it in the MV register, and returns it.

;; Bytecode for (LAMBDA () :IN F)
(CONST 3)          ; Push + on the stack.
(CLOSURE 0)        ; Reference closure value 0 (i.e. X) and push it onto top of the stack
(CLOSURE 1)        ; Reference closure value 1 (i.e. Y) and push it onto top of the stack
(CALL 2)           ; Pop the top 3 stack values and call the top with the 2 arguments.
(RETURN)           ; Stack is empty, leave the MV register as is.

Nonlocal exits

In order to work with C++ unwinding, we’ll probably need to emulate its unwinding tables. Here’s what I’ve been imagining. Our VM interpreter function does `try { …main loop over bytecodes… } catch (Unwind& uw) { […] } catch (…) { […] }` or so. The `[…]` blocks consult a secondary vector, the unwind tables. Like C++ unwind tables, these are basically intervals in the bytecode corresponding to dynamic environments. For example, the bytecodes corresponding to a `cl:block` would have one interval. The code in the `Unwind&` handler would check the unwind table, find that the instruction pointer is currently in that block, check if the exception corresponds to the current frame, bla bla bla. `unwind-protect` should work similarly but would need a presence in the generic handler as well.

Our native unwinder should be easier to integrate with. We just use setjmp/longjmp and our dynamic environment objects similarly to how the interpreter or compiled code does. Clisp’s dynamic environments also seem to be in terms of SJLJ, so the Clisp instructions might be sufficient.

VM struct definitions

struct vm {
  // Values register for storing multiple values across calls.
  T_mv values;
  // The stack.
  T_O stack[STACK_SIZE];
  // The current top of the stack pointer.
  T_O* stack_top;
  // The frame pointer for the current bytecode function frame.
  T_O* frame_pointer;
  // The current bytecode function object.
  bytecode_closure_object *closure;
  // The current literals in the running module.
  T_O* literals;
  // The number of arguments being passed.
  short arg_count;
  // The arguments being passed.
  T_O* args; 
  // The current program counter (an interior pointer) into the bytevector in the currently running module.
  unsigned char *pc;
}

// XXX: Do we want to have function templates be a separate slot/get indexed separately?
// (This might help keep the literals vector smaller for indexing in the byte code.)
// What else might the module object have besides just the literals?
struct bytecode_module {
  // The bytecode bytes in this module.
  unsigned char *bytes; // This should probably be an actual Lisp object, like a (SIMPLE-ARRAY (UNSIGNED-BYTE 8))
  // The literals table/vector. Variable length array. Function templates also get put here, perhaps as the initial segment.
  T_O literals[];
}

struct bytecode_closure {
  bytecode_function *template;
  // The actual closure environment.
  Closure_O env[];
}

// The template for a bytecode closure
struct bytecode_function {
  // A self pointer. We have this slot at the same offset as bytecode_closure.template
  // so that the same indirection can be used to get the template for both without runtime
  // type dispatch.
  bytecode_function *template;
  // This function's containing module.
  bytecode_module *module;
  // The frame size this function needs for local variables.
  unsigned short locals_frame_size;
  // Number of closure values in the environment
  unsigned short environment_size;
  // Entry point into the bytes vector in the containing module.
  // This is an interior pointer, but we might want to make it an offset
  // in the future to make dumping/loading/GC easier.
  unsigned char *entry_pc;
}

Creating bytecode-closures

Here’s how one creates a bytecode-closure

(defparameter fdesc (core:function-description/make :function-name 'foo))  ; Describes the function for backtraces
(defparameter bmodule (core:bytecode-module/make))   ; stores the bytecode vector and the literals
(defparameter gbep (core:global-bytecode-entry-point/make fdesc ; function-description
                                                          bmodule ; bytecode-module
                                                          0  ; localsFrameSize
                                                          0  ; required
                                                          0  ; optional
                                                          0  ; restSlot
                                                          0  ; keyStart
                                                          nil ; keyConsts
                                                          0   ; flags
                                                          0   ; environmentSize
                                                          '(0 0 0 0 0 0 0)))    ; entryPoint bytecode indices into bm bytecode vector
                                                                                ; order is general, arg0, arg1, ... arg5
                                                                                ; You need a different index for each entry point.
(defparameter bc (core:bytecode-closure/make gbep ; global-bytecode-entry-point
                                             5))  ; closed-over-slots

(defparameter bv (core:bytecode-module/bytecode bmodule))
(vector-push-extend 0 bv)   ; add bytecode to the bytecode-module/bytecode vector
(funcall bc) ; call the bytecode closure arg0 entry point.

(defun core:bytecode-call (pc-pointer closure args)
   ... ; write bytecode interpreter
)
(defparameter ep (core:function/entry-point bc))

Methods for accessing entry-point info (like bytecode_function)
;; CORE:GLOBAL-BYTECODE-ENTRY-POINT/ENTRY-PCS  Function
;; CORE:GLOBAL-BYTECODE-ENTRY-POINT/ENVIRONMENT-SIZE  Function
;; CORE:GLOBAL-BYTECODE-ENTRY-POINT/FLAGS  Function
;; CORE:GLOBAL-BYTECODE-ENTRY-POINT/GLOBAL-ENTRY-POINT-CODE  Function
;; CORE:GLOBAL-BYTECODE-ENTRY-POINT/KEY-CONSTS  Function
;; CORE:GLOBAL-BYTECODE-ENTRY-POINT/KEY-START  Function
;; CORE:GLOBAL-BYTECODE-ENTRY-POINT/LOCALS-FRAME-SIZE  Function
;; CORE:GLOBAL-BYTECODE-ENTRY-POINT/MAKE  Function
;; CORE:GLOBAL-BYTECODE-ENTRY-POINT/OPTIONAL  Function
;; CORE:GLOBAL-BYTECODE-ENTRY-POINT/REQUIRED  Function
;; CORE:GLOBAL-BYTECODE-ENTRY-POINT/REST-SLOT  Function
  

CL_DEFMETHOD Fixnum Pointer_O::peekByte() const {
  unsigned char byte = *(unsigned char*)this->ptr();
  return byte;
};

CL_DEFMETHOD void Pointer_O::inPlaceIncrement(Fixnum offset) {
  this->m_raw_data = (void*)((unsigned char*)this->m_raw_data + offset);
};
Clone this wiki locally