Lisp Personal Computer
A literate program for a minimal lisp virtual machine
This document is a literate program for a simple virtual machine that runs a
very minimalistic version of a lisp.
This is the result of an investigation into the intersection between Uxn (by
100rabbits) and the Lisp Machines of yore.
The core idea is that the machine is a single contiguous array of cells. Cells
are cons cells, they are 32 bit and are split in two, 16 bits + 16 bits.
In the lisp tradition we'll call the first 16 bits the CAR of the cell and the
second half the CDR of the cell.
The CAR and the CDR are each a tagged value with two possible types:
PTR(first bit is 0), points to another cellINT(first bit is 1), contains a numerical value
With this scheme it's clear that we have 15 bits for addressing and thus we have 2^15 = 32768 cells in the machine.
#define CELLCOUNT 32768
That the second type is INT doesn't mean we cannot represent other types, for
example, it's easy to see that we could restrict ourselves to the ASCII range
and using the CONS cell structure we could build strings as lists of
ASCII-range INTs.
The execution model of the code is a stack machine.
1. The Cell Array
At the core of the LPC machine is the cell array (or memory). It holds all
memory for the LPC machine with a predefined structure. Each cell is a CONS
cell, and thus has by default two pieces as explained above.
This gives us two properties:
- We don't need to reserve contiguous pieces of memory, every structure that is connected is a linked list using the structure of the memory itself. Code is linked lists, frames are linked lists, the stack is a linked list. As long as a cell is free you can allocate. This is interesting in a machine with limited memory like this.
- Unified data model. Code truly is data (and data is code if you want as well!)
The full LPC machine is basically the MACHINE_T struct.
typedef uint16_t cval_t;
typedef struct cell { cval_t a, d; } cell_t;
typedef struct machine {
cell_t memory[CELLCOUNT];
} machine_t;
We could have this as a global array, but wrapping it in a struct allows us to spawn several of these machines if needed in the future. A design decision that was made here is that the execution can be paused at any time and serialized by just saving the whole memory, so all state necessary for execution lives inside the cell memory.
There are several cells in the system that have a special meaning.
| Cell | Meaning |
|---|---|
0x0 |
Pointer to the global frame (where top-level defs live) |
0x1 |
Head of the free cell list |
0x2 |
Instruction pointer |
0x3 |
Top of the data stack |
0x4 |
Top of the return stack |
On machine initialization (for a bare image), we set the memory to zero, and then populate the free list by chaining all cells.
static inline void initialize_machine(machine_t *m) {
memset(m->memory, 0, sizeof(m->memory));
for (cellptr_t i = 5; i < CELLCOUNT - 1; i++) {
m->memory[i] = (cell_t)MKPTR(i + 1);
}
m->memory[FREEC] = (cell_t)MKPTR(4) << 16;
}
At this point we have the machine in a basic known state:
| Component | State |
|---|---|
| Globals frame | Empty (points to nil) |
| Free list | Occupies all memory (except 5 reserved cells) |
instrp |
Points to nil |
| Both stacks | Empty (point to nil) |
At this point nothing can execute, since we are pointing to nil.
2. The Cell Operations
Now we are going to talk about the main cell operations that will be pervasive
throught the code.
We'll start by creating some simple macros to deal with CAR, CDR, TAG and
tagged values.
#define TAG(x) ((x) >> 15) #define VAL(x) ((x) & 0x7FFF) #define MKPTR(a) ((uint16_t)((a) & 0x7FFF)) #define MKINT(n) ((uint16_t)(((n) & 0x7FFF) | 0x8000)) #define IS_PTR(x) (TAG(x) == 0) #define IS_INT(x) (TAG(x) == 1)
Then we have the actual core cell functions that operate on the machine, let's
start with reads. Given a cell address, get the CAR or CDR.
cval_t car(machine_t *m, cval_t ptr) { return m->mem[ptr].a; }
cval_t cdr(machine_t *m, cval_t ptr) { return m->mem[ptr].d; }
We have to have a way to construct a new cell from a CAR and a CDR, for that
we'll use the traditional CONS function. We'll look at cell_alloc after this
since it's less semantically important and more of an implementation detail.
static inline cellptr_t cons(machine_t *m, uint16_t car, uint16_t cdr) {
cellptr_t new_cell = cell_alloc(m);
m->memory[new_cell] = (car << 16) | cdr;
return new_cell;
}
We also write some helpers to set the CAR and CDR of a cell address.
void scar(machine_t *m, cval_t ptr, cval_t v) { m->mem[ptr].a = v; }
void scdr(machine_t *m, cval_t ptr, cval_t v) { m->mem[ptr].d = v; }
Finally we take a look at cell_alloc.
static inline cellptr_t cell_alloc(machine_t *m) {
cellptr_t head = VAL(car(m, FREEC));
scar(m, FREEC, cdr(m, head));
m->memory[head] = 0;
return head;
}
3. Code Representation in the LPC
After understanding cells we can now explain how code is represented. Code is
another linked list, each CAR of the list is an opcode (or opcode params) and
the CDR points to the next instruction (or param).
Now I'm going to introduce the assembly language for lpc, named MPL for now.
MPL is s-expr based and it looks a lot like a lispy Forth. It is very close to
the Instruction Set, but with some niceties.
This will also help us understand how execution works in the LPC. A simple example:
(#3 #4 + HALT)
If you are familiar with Forth this should be very easy to understand, add immediate 3 and 4 to the stack add them up and then halt execution. Internally this would be represented as a series of conses:
(LIT . (4 . (LIT . (4 . (ADD . (HALT . NIL))))))
Now let's examine an if expression in MPL
(#3 #4 < BRNCH (#100) (#200) #5 + HALT)
We push 3 and 4 to the stack and compare them, this gives us a 1 (truth value), then the if checks this value and depending on if it's a truthy value or not it will change the instruction pointer to either the list of code at the then or the else position. This will basically create four lists of code:
;; At address A (LIT . (100 . @D)) ;; At address B (LIT . (200 . @D)) ;; At address C (LIT 3 LIT 4 < BRNCH @A @B) ;; At address D (LIT 5 ADD HALT)
@D is syntax to create a pointer literal.
As you can see we encode the execution graph in the cons cells themselves so no
JMP instruction is needed. The basic follow cons cells by following the
pointer in the CDR takes care of that. This is even more exemplified by the
loop structure a loop is basically an if where the then branch folds onto
itself.
(#0 &loop-start dup #10 < BRNCH (#1 + . @loop-start) (HALT))
Several things to unpack here:
&loop-start is MPL's synax for a label and can be paired with @ to then
point to that cons cell. The . syntax allows you to set the CDR value of the
end of a list instead of the default. In this case we are setting the CDR of
the final cell of the then-branch to the loop-start. So just by following the
next cell in the chain you loop back to the start. The else-branch then
becomes what happens after the loop.
In this case we end up with three lists of code in the memory:
;; At address A, the loop body (LIT . (1 . (ADD . @(C + 2))) ;; At address B, after the loop (HALT . NIL) ;; At address C, before the loop and loop comparison (LIT . (0 . (@dup . (CALL . (LIT . (10 . (LTH . (BRNCH . (@A . (@B . NIL))))))))))
I'm going to gloss over the @dup and CALL stuff, because it's not fully
specified for now, but basically dup is in the bootstrap code, it's not an
opcode because the stack is operable using the regular cell opcodes. So what we
are doing here is calling dup. Also take note of @(C + 2), this is not real MPL
syntax, but since I'm using abstract addresses and the loop start address is
after the LIT opcode and its argument the address we are looping into is C+2.
The important takeaway from this is how control is built from two primitives, a
branch instruction that takes one of two paths, and the natural CDR following
of code execution.