Enter the void *

A lens-based ST20 emulator

by Etienne Millon on August 20, 2015

Tagged as: haskell, emulator, lenses.

Every year, as part of the SSTIC conference, there is a forensics/reverse engineering challenge. I participated in the 2015 edition. Though I did not manage to complete it, I made an emulator for the exotic ST20 architecture, which is probably worth describing here.

Some programs will loop. It’s OK.
Some programs will loop. It’s OK.

Note that this emulator is not really optimized for pure speed. In the actual challenge I actually had to rewrite it as pure Haskell (i.e., removing the emulation part) so that it was faster. Instead, the goal of this article is to show a few techniques to write powerful emulators in Haskell.

The evaluation monad

This program uses Template Haskell to define lenses, so unfortunately we need to start with a few type definitions.

The ST20’s memory goes from 0x80000000 to 0x7fffffff:

We’ll represent the memory using a map. The performance is surprisingly close to that of an array. It is possible to get significantly speeds up memory access by using an IOUArray but it turns loads and stores become monadic operations and makes it impossible to use lenses.

As we’ll see, transputers (hardware threads) can communicate together. We’ll be able to connect it either between them, or to a tty.

All evaluations take place in a Eval Monad which is a monad transformer stack with the following capabilities:

The above $(...) is a Template Haskell splice. It creates lenses based on the record declaration of EvalState. Lenses are a very powerful tool that makes it possible to compose record reads and updates in a functional way. Here, it defines a lens for each record field; for example, the splice expands to a top-level declaration iptr :: Lens' EvalState Address. But we will define our own lenses too, and everything will remain composable.


This is naturally adapted to byte access:

See? We composed the mem lens (between an evaluation state and a memory state) with at addr, which is a lens between a memory state and the value at address addr. Well, not exactly: at actually returns a Maybe Word8. We will assume that all memory accesses will succeed, so we want a lens that returns a plain Word8. To achieve this, we can compose with a lens that treats Maybe a as a container of a:

Sometimes we will also need to access memory word by word. To achieve that, we first define conversion functions.

Then, we can define a lens focusing on a 32-bit value.

The instruction set reference defines a handy operator to shift an address by a word offset:

It will be also handy to access the memory in list chunks:

Instruction decoding

Instructions are usually encoded on a single byte: the opcode is in the first nibble, and a parameter is in the second one. For example this is how a LDC (load constant) is encoded:

   .--- 0x40 LDC
  |.---          0x5
0x45         LDC 0x5

This only works for 4-bytes constants. To load bigger constants, there is a “prefix” operation that will shift the current operand:

   .-------- 0x20 PFX
  |.--------          0x2
  ||    .--- 0x40 LDC
  ||   |.---          0x5
  ||   ||
0x22 0x45    LDC 0x25

Those are chainable; for example 0x21 0x22 0x45 encodes LDC 0x125.

Another prefix shifts and complements the current operand value:

   .-------- 0x60 NFX
  |.--------          0x2
  ||    .--- 0x40 LDC
  ||   |.---          0x5
  ||   ||
0x62 0x45    LDC (~0x25)

The ST20 architecture actually provides two type of instructions:

We know enough to draft an instruction decoder.

Instruction decoding will need to move within the instruction stream, so it is part of the evaluation monad.

decodeInstr :: Eval Instr
decodeInstr = decodeInstr_ 0

decodeInstr_ :: Int32 -> Eval Instr
decodeInstr_ acc = do
    b <- peekAndIncr
    let acc' = acc + fromIntegral (b .&. 0xf)
    case () of
        _ | b <= 0x0f -> return $ Pri J acc'
        _ | b <= 0x1f -> return $ Pri LDLP acc'
        _ | b <= 0x2f -> decodeInstr_ $ acc' `shiftL` 4
        _ | b <= 0x3f -> return $ Pri LDNL acc'
        _ | b <= 0x4f -> return $ Pri LDC acc'
        _ | b <= 0x5f -> return $ Pri LDNLP acc'
        _ | b <= 0x6f -> decodeInstr_ $ complement acc' `shiftL` 4
        _ | b <= 0x7f -> return $ Pri LDL acc'
        _ | b <= 0x8f -> return $ Pri ADC acc'
        _ | b <= 0x9f -> return $ Pri CALL acc'
        _ | b <= 0xaf -> return $ Pri CJ acc'
        _ | b <= 0xbf -> return $ Pri AJW acc'
        _ | b <= 0xcf -> return $ Pri EQC acc'
        _ | b <= 0xdf -> return $ Pri STL acc'
        _ | b <= 0xef -> return $ Pri STNL acc'
        _             -> return $ Sec $ parseSecondary acc'

peekAndIncr :: Eval Word8
peekAndIncr = do
    addr <- use iptr
    b <- use (memByte addr)
    iptr += 1
    return b

parseSecondary :: Int32 -> SInstr
parseSecondary 0x01 = LB
parseSecondary 0x02 = BSUB
parseSecondary 0x06 = GCALL
parseSecondary 0x07 = IN
parseSecondary 0x08 = PROD
parseSecondary 0x09 = GTx
parseSecondary 0x0a = WSUB
parseSecondary 0x0b = OUT
parseSecondary 0x1b = LDPI
parseSecondary 0x1f = REM
parseSecondary 0x20 = RET
parseSecondary 0x33 = XOR
parseSecondary 0x3b = SB
parseSecondary 0x3c = GAJW
parseSecondary 0x40 = SHR
parseSecondary 0x41 = SHL
parseSecondary 0x42 = MINT
parseSecondary 0x46 = AND
parseSecondary 0x5a = DUP
parseSecondary 0xc1 = SSUB
parseSecondary b = error $ "Unknown secondary 0x" ++ showHex b ""

The two stacks

Data is manipulated using two different mechanisms: the integer stack and the workspace.

The integer stack is a set of three registers: A, B, and C, which can be used as a stack using these operations. Actually, it can only be manipulated through push and pop operations, so we represent this using a list.

The instruction set reference says that an undefined value will be popped if the stack is empty; here we consider that this will not happen, and allow a partial pattern matching.

Only the head (A) can be directly accessed, so we first define a lens between a list and its head, and compose it with intStack.

The workspace is a place in memory (pointed to by a register wptr) where local variables can be stored and loaded, a bit like a stack pointer. We first define push and pop operations.

Then we define a lens to focus on a variable.

Input and output

The main particularity of the ST20 architecture is that it has hardware support of message channels. They map fairly naturally to Control.Concurrent.Chan channels. Each ST20 thread will have a map from channel numbers to input or output channels:

And these channels can be either a Chan Word8 or a plain Handle, to connect a thread to the process’ standard input and output.

A few combinators

We first define a few combinators that will help us define the interpret function.

Pop two operands, and push the result:

Exchange two registers:

Convert a boolean to an integer:

The interpret function

The core of the interpreter is the following function. It takes an instruction and transforms it into a monadic action in Eval.

Some cases are very simple.

For some others, we can lift them into the host language and use Haskell operations.

Others need a few operations to prepare the operands and access memory.

Call and return instructions use the workspace to pass arguments.

To perform I/O, the calling transputer needs to supply three things in the int stack:

The channel itself is abstracted in the transputer’s channel maps. Most reads succeed; however the first transputer’s channel 0 will read directly from a file, so it will reach end of file at some time. We can detect that when an empty list is read, and exit the process.

The core of the interpreter is then very simple:

Several things are missing: the memory map, and how the system boots.

It turns out that the ST20 has a very simple boot protocol:

There’s some flexibility on memStart, but this value works:

Pin numbers, however, are mapped to fixed address:

We decide to initialize the memory with zeroes:

Booting a transputer is then just a matter of reading from the correct channel and doing the rest of the evaluation loop.

Multithreading boilerplate

If you fork threads and don’t wait for them, nothing will happen since the main thread will just exit. The solution is to create a “control” MVar that will be signalled to by each thread:

And to wait for all of them:

Connecting the lines

For this problem we have 13 transputers.

We devise a way to connect them together. The communication between two transputers is bidirectional, so we need two channels. Each of them is converted to an OChannel on one side and an IChannel on the other one.

Booting them is a matter of creating the correct communication channels (this pinout list comes from a schematic that was present in the challenge files).

Bonus: static analysis tools

The above transputer function is controlled by the following configuration:

It means that for each transputer, we can choose to print a graph or a disassembly of the code that will be executed. To do that, we will first compute the set of all edges in the control flow graph.

This analysis relies on a nextInstr function that statically computes the set of next instructions. These can be reached either because it’s the next one in the instruction flow (DSeq), because of jump (DJmp), or an unknown destination, for example after a RET (DDyn).

We can wrap this function in a monadic one that can turn these relative addresses into absolute ones (since it can know the addresses of functions).

Then, the algorithm consists in computing the fixpoint of the following iterating function:

The fixpoint itself is computed using the following function, which takes a predicate on two EdgeSets to stop the iteration.

We’ll stop when their size is equal.

Finally, here is how to convert the EdgeSets in a human-readable form.

For example here is an extract of the beginning of the first transputer. You can notice instructions with several destinations (conditional jumps) are displayed twice.

80000100 AJW -76
80000102 LDC 0
80000103 STL 1
80000104 LDC 0
80000105 STL 3
80000106 MINT
80000108 LDNLP 1024
8000010b GAJW
8000010d AJW -76
8000010f LDC 201
80000111 LDPI
80000113 MINT
80000115 LDC 8
80000116 OUT
80000117 LDLP 73
80000119 MINT
8000011b LDNLP 4
8000011c LDC 12
8000011d IN
8000011e LDL 73
80000120 CJ 21
80000120 CJ 21  [* 80000137]
80000122 LDC 205
80000124 LDPI

For the graph output, I assume that you have already seen graphviz output:

T03 with dot driver
T03 with dot driver

The introduction image was done using the same output but an alternative layout engines.

Hope you enjoyed this article!