Algebraic Data Types, or ADTs for short, are a core feature of functional languages such as OCaml or Haskell. They are a handy model of closed disjoint unions and unfortunately, outside of the functional realm, they are only seldom used.

In this article, I will explain what ADTs are, how they are used in OCaml and what trimmed-down versions of them exist in other languages. I will use OCaml, but the big picture is about the same in Haskell.

Functional languages offer a myriad of types to the programmer:

- some *base* types, such as `int`, `char` or `bool`.
- functions, ie *arrow* types. A function with domain `a` and codomain `b` has type `a -> b`.
- tuples, ie *product* types. A tuple is a heterogeneous, fixed-width container type (its set-theoretic counterpart is the cartesian product). For example, `(2, true, 'x')` has type `int * bool * char`. *Record* types are a (mostly) syntactic extension to give names to their fields.
- some *parametric* types. For example, if `t` is a type, `t list` is the type of homogeneous linked lists of elements having type `t`.
- what we are talking about today, *algebraic* types (or *sum* types, or *variant* types).
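As a rough illustration, here is one value for each of the first four kinds of types, with its type written out explicitly. The names (`answer`, `increment`, `point` and so on) are made up for this sketch.

```ocaml
(* Base types *)
let answer : int = 42
let initial : char = 'x'
let flag : bool = true

(* Arrow type: a function from int to int *)
let increment : int -> int = fun n -> n + 1

(* Product type: a fixed-width, heterogeneous tuple *)
let triple : int * bool * char = (2, true, 'x')

(* Record type: a product whose fields have names *)
type point = { x : int; y : int }
let origin : point = { x = 0; y = 0 }

(* Parametric type: a homogeneous list of ints *)
let primes : int list = [2; 3; 5]
```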

If product types represent the cartesian product, algebraic types represent the disjoint union. In other words, they are well suited to case analysis.

We will take the example of integer ranges. One can say that an integer range is either :

- the empty range
- of the form `]-∞;a]`
- of the form `[a;+∞[`
- an interval of the form `[a;b]` (where a ≤ b)
- the whole range (ie, ℤ)

With the following properties :

- Disjunction : no range can be of two forms at a time.
- Injectivity : if `[a;b]` = `[c;d]`, then `a` = `c` and `b` = `d` (and similarly for other forms).
- Exhaustiveness : it cannot be of another form.

This can be encoded as an ADT :
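The definition can be reconstructed from the constructors used in the rest of the article. A sketch, assuming the type is named `t` (later snippets also refer to it as `range`):

```ocaml
(* One constructor per form of integer range *)
type t =
  | Empty                (* the empty range *)
  | HalfLeft of int      (* ]-∞;a] *)
  | HalfRight of int     (* [a;+∞[ *)
  | Range of int * int   (* [a;b] *)
  | FullRange            (* ℤ *)
```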

`Empty`, `HalfLeft`, `HalfRight`, `Range` and `FullRange` are `t`’s *constructors*. They are the only way to build a value of type `t`. For example, `Empty`, `HalfLeft 3` and `Range (2, 5)` are all values of type `t`^{1}. They each have a specific *arity* (the number of arguments they take).
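The properties listed earlier can be observed with OCaml's structural equality. The sketch below also includes a hypothetical smart constructor, `make_range`, which is not part of the article. It is one possible way to enforce a ≤ b, and mapping an inverted interval to `Empty` is a design choice made for this example.

```ocaml
type t =
  | Empty
  | HalfLeft of int
  | HalfRight of int
  | Range of int * int
  | FullRange

(* Hypothetical smart constructor: an interval with a > b contains no
   integer, so it is mapped to Empty instead of building Range (a, b). *)
let make_range a b = if a <= b then Range (a, b) else Empty

let () =
  (* Disjunction: values built with different constructors are never equal. *)
  assert (HalfLeft 3 <> HalfRight 3);
  (* Injectivity: two Range values are equal only if their arguments are. *)
  assert (Range (2, 5) = Range (2, 5));
  assert (Range (2, 5) <> Range (2, 6));
  (* The smart constructor keeps valid intervals and rejects inverted ones. *)
  assert (make_range 2 5 = Range (2, 5));
  assert (make_range 10 2 = Empty)
```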

To *deconstruct* a value of type `t`, we have to use a powerful construct, *pattern matching*, which is about matching a value against a sequence of patterns (yes, that’s about it).

To illustrate this, we will write a function that computes the minimum value of such a range. Of course, this can be ±∞ too, so we have to define a type to represent the return value.
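A sketch of such a return type, reconstructed from the constructors used in the code that follows. The type name `ext_int` is an assumption.

```ocaml
(* Hypothetical name: an integer extended with both infinities. *)
type ext_int =
  | MinusInfinity
  | Finite of int
  | PlusInfinity
```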

In a math textbook, we would write the case analysis as :

- min ∅ = +∞
- min ]-∞;a] = -∞
- min [a;+∞[ = a
- min [a;b] = a
- min ℤ = -∞

That translates to the following (executable !) OCaml code :

```
let range_min x =
  match x with
  | Empty -> PlusInfinity
  | HalfLeft a -> MinusInfinity
  | HalfRight a -> Finite a
  | Range (a, b) -> Finite a
  | FullRange -> MinusInfinity
```

In the pattern `HalfLeft a`, `a` is a variable name, so it gets bound to the argument’s value. In other words, `match (HalfLeft 2) with HalfLeft x -> e` binds `x` to 2 in `e`.

Pattern matching seems magical at first, but it is only a syntactic trick. Indeed, the definition of the above type is equivalent to the following definition :

```
type range

(* The following is not syntactically correct *)
val Empty : range
val HalfLeft : int -> range
val HalfRight : int -> range
val Range : int * int -> range
val FullRange : range

(* Moreover, we know that they are injective and mutually disjoint *)
val deconstruct_range :
  (unit -> 'a) ->
  (int -> 'a) ->
  (int -> 'a) ->
  (int * int -> 'a) ->
  (unit -> 'a) ->
  range ->
  'a
```

`deconstruct_range` is what replaces pattern matching. It also embodies the notion of exhaustiveness, because given any value of type `range`, we can build a deconstructed value out of it.

Its type looks scary at first, but if we look closer, its arguments are a sequence of case-specific deconstructors^{2} and the value to get “matched” against.

To show the equivalence, we can implement `deconstruct_range` using pattern matching and `range_min` using `deconstruct_range`^{3} :

```
let deconstruct_range
    f_empty
    f_halfleft
    f_halfright
    f_range
    f_fullrange
    x
  =
  match x with
  | Empty -> f_empty ()
  | HalfLeft a -> f_halfleft a
  | HalfRight a -> f_halfright a
  | Range (a, b) -> f_range (a, b)
  | FullRange -> f_fullrange ()
```

```
let range_min' x =
  deconstruct_range
    (fun () -> PlusInfinity)
    (fun a -> MinusInfinity)
    (fun a -> Finite a)
    (fun (a, b) -> Finite a)
    (fun () -> MinusInfinity)
    x
```

After this trip in denotational-land, let’s get back to operational-land : how is this implemented ?

In OCaml, no type information exists at runtime. Everything has a uniform representation and is either an integer or a pointer to a block. Each block starts with a header holding a tag and a size, followed by that many fields.

With the `Obj` module (kids, don’t try this at home), it is possible to inspect blocks at runtime. Let’s write a dumper for `range` values and look at its output :

```
(* Range of integers between a and b *)
let rec rng a b =
  if a > b then
    []
  else
    a :: rng (a+1) b

let view_block o =
  if (Obj.is_block o) then
    begin
      let tag = Obj.tag o in
      let sz = Obj.size o in
      let f n =
        let f = Obj.field o n in
        assert (Obj.is_int f);
        Obj.obj f
      in
      tag :: List.map f (rng 0 (sz-1))
    end
  else if Obj.is_int o then
    [Obj.obj o]
  else
    assert false

let examples () =
  let p_list l =
    String.concat ";" (List.map string_of_int l)
  in
  let explore_range r =
    print_endline (p_list (view_block (Obj.repr r)))
  in
  List.iter explore_range
    [ Empty
    ; HalfLeft 8
    ; HalfRight 13
    ; Range (2, 5)
    ; FullRange
    ]
```

When we run `examples ()`, it outputs :

```
0
0;8
1;13
2;2;5
1
```

We can see the following distinction :

- 0-ary constructors (`Empty` and `FullRange`) are encoded as simple integers.
- other ones are encoded as blocks with a constructor number as tag (0 for `HalfLeft`, 1 for `HalfRight` and 2 for `Range`) and their argument list afterwards.

Thanks to this uniform representation, pattern-matching is straightforward : the runtime system will only look at the tag number to decide which constructor has been used, and if there are arguments to be bound, they are just after in the same block.
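To make this concrete, here is a hand-written sketch of the dispatch that a `match` on this type boils down to, based only on the dump above. The function `constructor_name` is made up for the illustration, and the `Obj` module is unsafe and exposes implementation details, so this should not be taken as how the compiler actually emits code.

```ocaml
type t =
  | Empty
  | HalfLeft of int
  | HalfRight of int
  | Range of int * int
  | FullRange

(* Hypothetical helper: recover the constructor from the runtime value alone. *)
let constructor_name (r : t) =
  let o = Obj.repr r in
  if Obj.is_int o then
    (* 0-ary constructors are immediate integers, numbered in declaration order *)
    match (Obj.obj o : int) with
    | 0 -> "Empty"
    | 1 -> "FullRange"
    | _ -> assert false
  else
    (* Other constructors are blocks; the tag says which one was used *)
    match Obj.tag o with
    | 0 -> "HalfLeft"
    | 1 -> "HalfRight"
    | 2 -> "Range"
    | _ -> assert false
```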

Algebraic Data Types are a simple model of disjoint unions, for which case analysis is the most natural operation. In more mainstream languages, some alternatives exist, but they are more limited when modelling the same problem.

For example, in object-oriented languages, the Visitor pattern is the natural way to do it. But class trees are inherently “open”, thus breaking the exhaustiveness property.

The closest implementation is tagged unions in C, but they require you to roll your own solution using `enum`s, `struct`s and `union`s. This also means that all your hand-allocated blocks will have the same size.

Oh, and I would love to know how this problem is solved with other paradigms !

1. Unfortunately, so is `Range (10, 2)`. The invariant that a ≤ b has to be enforced by the programmer when using this constructor.
2. For 0-ary constructors, the type has to be `unit -> 'a` instead of `'a` to allow side effects to happen during pattern matching.
3. More precisely, we would have to show that any function written with pattern matching can be adapted to use the deconstructor instead. I hope that this example is general enough to get the idea.

into this one (the top-level type `'a list -> int` is usually what is interesting but the compiler has to infer the type of every subexpression):

```
let rec length : 'a list -> int = function
  | [] -> (0 : int)
  | (x:'a)::(xs : 'a list) -> (1 : int)
      + ((length : 'a list -> int) (xs : 'a list) : int)
```

Because the compiler does so much work, it is reasonable to wonder whether it is efficient. The theoretical answer to this question is that type inference is EXP-complete, but given reasonable constraints on the program, it can be done in quasi-linear time (*n* log *n* where *n* is the size of the program).

Still, one may wonder what kind of pathological cases show this exponential effect. Here is one such example:

```
let p x y = fun z -> z x y ;;
let r () =
let x1 = fun x -> p x x in
let x2 = fun z -> x1 (x1 z) in
let x3 = fun z -> x2 (x2 z) in
x3 (fun z -> z);;
```

The type signature of `r` is already daunting:

```
% ocamlc -i types.ml
val p : 'a -> 'b -> ('a -> 'b -> 'c) -> 'c
val r :
unit ->
(((((((('a -> 'a) -> ('a -> 'a) -> 'b) -> 'b) ->
((('a -> 'a) -> ('a -> 'a) -> 'b) -> 'b) -> 'c) ->
'c) ->
((((('a -> 'a) -> ('a -> 'a) -> 'b) -> 'b) ->
((('a -> 'a) -> ('a -> 'a) -> 'b) -> 'b) -> 'c) ->
'c) ->
'd) ->
'd) ->
((((((('a -> 'a) -> ('a -> 'a) -> 'b) -> 'b) ->
((('a -> 'a) -> ('a -> 'a) -> 'b) -> 'b) -> 'c) ->
'c) ->
((((('a -> 'a) -> ('a -> 'a) -> 'b) -> 'b) ->
((('a -> 'a) -> ('a -> 'a) -> 'b) -> 'b) -> 'c) ->
'c) ->
'd) ->
'd) ->
'e) ->
'e
```

But what’s interesting about this program is that we can add (or remove) lines to study how input size can alter the processing time and output type size. It explodes:
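Adding or removing lines by hand gets tedious, so the program can also be generated. The sketch below is not from the article: `pathological` is a hypothetical helper, and it assumes that *n* counts the `x1` … `xn` definitions (the article does not spell out how *n* maps to the program). With `n = 3` it reproduces the program shown above.

```ocaml
(* Hypothetical generator: returns the source of the pathological program
   with definitions x1 .. xn, assuming n >= 1. *)
let pathological n =
  let buf = Buffer.create 256 in
  Buffer.add_string buf "let p x y = fun z -> z x y ;;\n";
  Buffer.add_string buf "let r () =\n";
  Buffer.add_string buf "  let x1 = fun x -> p x x in\n";
  (* Each new level applies the previous one twice. *)
  for i = 2 to n do
    Printf.bprintf buf "  let x%d = fun z -> x%d (x%d z) in\n" i (i - 1) (i - 1)
  done;
  Printf.bprintf buf "  x%d (fun z -> z);;\n" n;
  Buffer.contents buf

let () = print_string (pathological 3)
```

Writing the result to a file and running `ocamlc -i` on it for increasing values of `n` is one way to produce measurements like the ones in the table.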

| n | wc -c | time | leaves(n) |
|---|---|---|---|
| 1 | 98 | 15ms | 1 |
| 2 | 167 | 15ms | 2 |
| 3 | 610 | 15ms | 8 |
| 4 | 11630 | 38ms | 128 |
| 5 | 4276270 | 6.3s | 32768 |

Observing the number of `('a -> 'a)` leaves in the output type reveals that it is squared and doubled at each step, leading to exponential growth.
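The recurrence read off the table, leaves(n+1) = 2 · leaves(n)², can be checked with a few lines. This is only a sketch reproducing the last column. The name `leaves` comes from the table header, and the function says nothing about how the compiler computes the type.

```ocaml
(* leaves 1 = 1, and each step squares the previous count and doubles it. *)
let rec leaves n =
  if n <= 1 then 1
  else
    let previous = leaves (n - 1) in
    2 * previous * previous

let () =
  List.iter
    (fun n -> Printf.printf "leaves(%d) = %d\n" n (leaves n))
    [1; 2; 3; 4; 5]
```

Solving the recurrence gives leaves(n) = 2^(2^(n-1) - 1), so the growth is in fact doubly exponential in *n*.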

In practice, this effect does not appear in day-to-day programs because programmers annotate the top-level declarations with their types. In that case, the size of the types would be merely proportional to the size of the program, because the gigantic type annotation would itself be part of the program.

Also, programmers tend to write functions that do something useful, which these do not seem to do ☺.

*Full disclosure:* it’s actually 2017, but I started writing this in 2016 so it’s OK. Also I’m not actually from the US, but I’ll relax the definition a bit and let’s pretend it means International Bot Making Year. Close enough!

Bots are all the rage - Twitter bots, IRC bots, Telegram bots… I decided to make a Slack bot to get more familiar with that API.

I wanted this to be a small project - write and forget, basically. I started by defining some specs and locking those down:

- that bot works on Slack
- it uses the “will it rain in the next hour” API from Météo France.
- the bot understands 3 commands:
  - tell you whether it will rain or not.
  - show you a graph of rain level over the next hour.
  - tell you when to go out to avoid the rain.

The next step was choosing the tech stack. For hosting, I was sold on using Heroku from previous projects (or another PaaS host, for what it’s worth).

As for the programming language itself, I hesitated between three choices:

- focus on the all-included experience: something that has libraries, tooling, but somehow boring;
- focus on the shipping experience: stuff that I use daily, but looking to get something online quickly;
- focus on learning something new.

The first one means something like Python or Ruby. I am familiar with the languages and am pretty sure that there are libraries that can take care of the Slack API without me having to ever worry about HTTP endpoints. That means also first-class deployment and hosting.

The second one is about OCaml: it’s a programming language I use daily at work, but the real goal would be to focus on shipping: create a project, write tests, write implementation, deploy, repeat for new features, forget.

The third one means a totally new programming language. I heard a lot of good things about Elixir for backend applications and figured that it would be a good intro project. Learning a new language is always an interesting experience, because it makes you a better programmer in all languages, and having clear specs would make this manageable.

The Python/Ruby solution seemed a bit boring. I probably would not learn a lot, except maybe add a couple of libraries to my toolbelt.

Elixir sounds great, but learning a new language and a new project at the same time is too hard and too time consuming. I would rather write in a new language something I previously wrote in another language. Though for something small and focused like this, that could have worked.

I first created the project structure: GitHub repo, OCaml project (topkg, opam, etc). I like to use TDD for this kind of project, so I added a small alcotest suite. I also created the 12factor separation: a `Procfile`, a small `bin/` shell that reads the application configuration from the environment and starts a bot from `lib/`.

I asked myself what to test: the cohttp library is nice, because servers and clients are built using normal functions that take a request and return a response. That makes it possible to test almost everything at the OCaml level without having to go to the HTTP level. This is especially important since there is no way to mock values and functions in OCaml. Everything has to be a real value.

However, even if it was possible to test everything, I decided to just focus on the domain logic without testing the HTTP part: for example, I would pass data structures directly to my bot object rather than building a cohttp request.

A part that is important for me, even for a small project like that, is to have some sort of CI: have Travis run my test suite, and make a binary ready to be deployed to Heroku. That way, it is impossible to forget how to make changes, test and deploy, since this is all in a script.

The other part that needed work is the actual Slack integration. The “slash” command API is pretty simple: it is possible to configure a Slack team such that typing `/rain` will hit a particular URL. Some options are passed as `POST` data and whatever is returned is displayed in Slack.

I set up the Slack integration, wrote a function to distinguish between `/rain` and `/rain list` (using the POST data), and by the end of the second iteration I had my second feature implemented, working, and deployed.
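That dispatch function could look like the sketch below. This is not the bot's actual code: the `command` type and `parse_command` are hypothetical names, and the sketch assumes that the words typed after `/rain` have already been extracted from the POST data (Slack sends them in a `text` field) as a plain string.

```ocaml
(* Hypothetical command type: the article only names `/rain` and `/rain list`. *)
type command =
  | Rain               (* plain `/rain` *)
  | Rain_list          (* `/rain list` *)
  | Unknown of string  (* anything else, kept for an error message *)

(* [text] is whatever the user typed after the slash command. *)
let parse_command text =
  match String.trim text with
  | "" -> Rain
  | "list" -> Rain_list
  | other -> Unknown other
```

Because `parse_command` is a plain function on strings, it can be tested directly without building an HTTP request.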

All in all, that was pretty great. The code or the bot itself are not particularly fantastic, but I learned some important lessons:

- When you do not want to spend a lot of time on a task, invest in planning and keep the list of features short. That is pretty obvious in the context of paid work, but this applies well to hobby programming too.
- Know what to test and what not to. Tests are useful to ensure that changes can be made without breaking everything, but testing that your HTTP library can parse POST data is a waste of time.
- In languages where it is not possible to mock or monkey patch functions, dependency injection is still possible. One may even argue that it leads to a better solution, since it removes the coupling between the different components.
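The dependency injection point can be sketched as follows. Everything here is hypothetical (the `forecast` type, `reply`, and the `fetch_forecast` parameter are not taken from the bot's source). The idea is only that the function which would normally call the weather API is passed in as an argument, so a test can supply a stub instead.

```ocaml
(* Hypothetical domain type for the answer of the weather service. *)
type forecast =
  | Dry
  | Rain_in of int  (* minutes until the rain starts *)

(* The data source is a parameter: production code would pass a function
   doing the HTTP call, tests pass a stub returning a fixed value. *)
let reply ~fetch_forecast =
  match fetch_forecast () with
  | Dry -> "No rain in the next hour."
  | Rain_in minutes -> Printf.sprintf "Rain expected in %d minutes." minutes

let () =
  assert (reply ~fetch_forecast:(fun () -> Dry) = "No rain in the next hour.");
  assert (reply ~fetch_forecast:(fun () -> Rain_in 20)
          = "Rain expected in 20 minutes.")
```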

You can find the source of this bot on GitHub. See you next year, #NaBoMaMo! And thanks to Tully Hansen for organizing this.
