Making type inference explode
by Etienne Millon on May 21, 2014
Hindley-Milner type systems are in a sweet spot in that they are both expressive and easy to infer. For example, type inference can turn this program:
let rec length = function
0
| [] -> 1 + length xs | x::xs ->
into this one (the top-level type 'a list -> int
is usually what is
interesting but the compiler has to infer the type of every subexpression):
let rec length : 'a list -> int = function
0 : int)
| [] -> (list) -> (1 : int)
| (x:'a)::(xs : 'a list -> int) (xs : 'a list) : int) + ((length : 'a
Because the compiler does so much work, it is reasonable to wonder whether it is efficient. The theoretical answer to this question is that type inference is EXP-complete, but given reasonable constraints on the program, it can be done in quasi-linear time (n log n where n is the size of the program).
Still, one may wonder what kind of pathological cases show this exponential effect. Here is one such example:
let p x y = fun z -> z x y ;;
let r () =
let x1 = fun x -> p x x in
let x2 = fun z -> x1 (x1 z) in
let x3 = fun z -> x2 (x2 z) in
fun z -> z);; x3 (
The type signature of r
is already daunting:
% ocamlc -i types.ml
val p : 'a -> 'b -> ('a -> 'b -> 'c) -> 'c
val r :
unit ->
(((((((('a -> 'a) -> ('a -> 'a) -> 'b) -> 'b) ->
((('a -> 'a) -> ('a -> 'a) -> 'b) -> 'b) -> 'c) ->
'c) ->
((((('a -> 'a) -> ('a -> 'a) -> 'b) -> 'b) ->
((('a -> 'a) -> ('a -> 'a) -> 'b) -> 'b) -> 'c) ->
'c) ->
'd) ->
'd) ->
((((((('a -> 'a) -> ('a -> 'a) -> 'b) -> 'b) ->
((('a -> 'a) -> ('a -> 'a) -> 'b) -> 'b) -> 'c) ->
'c) ->
((((('a -> 'a) -> ('a -> 'a) -> 'b) -> 'b) ->
((('a -> 'a) -> ('a -> 'a) -> 'b) -> 'b) -> 'c) ->
'c) ->
'd) ->
'd) ->
'e) ->
'e
But what’s interesting about this program is that we can add (or remove) lines to study how input size can alter the processing time and output type size. It explodes:
n | wc -c | time | leaves(n) |
---|---|---|---|
1 | 98 | 15ms | 1 |
2 | 167 | 15ms | 2 |
3 | 610 | 15ms | 8 |
4 | 11630 | 38ms | 128 |
5 | 4276270 | 6.3s | 32768 |
Observing the number of ('a -> 'a)
leaves in the output type reveals that it
is is squared and doubled at each step, leading to an exponential growth.
In practice, this effect does not appear in day-to-day programs because programmers annotate the top-level declarations with their types. In that case, the size of the types would be merely proportional to the size of the program, because the type annotation would be gigantic.
Also, programmers tend to write functions that do something useful, which these do not seem to do ☺.