Saturday, October 16, 2010

Hacks: Type error beautification

In this series of blog posts I'll name the ones that describe what I'm currently working on after "Hacks: whatever".

I'm currently working on type error beautification. The problem is that without it you get error messages that look like this:

./test/15-Typing/Signatures/LargerEffect/Test.ds:2:8
Inferred type:
add
:: forall %rDE111 %rDE112 %rDE113
. Int32 %rDE111 -> Int32 %rDE112 -(!eDE484 $cDE487)> Int32 %rDE113
:- !eDE484 = !{!Read %rDE111; !Read %rDE112}
, $cDE487 = Test.x : %rDE111

Is bigger than signature:
add
:: forall %rDE111 %rDE112 %rDE113
. Int32 %rDE111 -> Int32 %rDE112 -(!eDE484 $cDE487)> Int32 %rDE113
:- !eDE484 = !Read %rDE111
, $cDE487 = xDE485 : %rDE111

Because !{!Read %rDE111; !Read %rDE112} does not match !Read %rDE111

That's all fine, apart from the variable names like %rDE111 and $cDE485. Those are "dummy" variables that have been invented by the type elaborator, which fills in missing effect terms in type sigs provided by the user. For example, the source program that gave the above error message is actually:

add <: Int{read} -> Int -> Int
add x y = x + y

The type signature says that the function is permitted to read its first argument, where {read} is a short-hand way of giving the real region and effect term. The elaborator takes the short-hand sigs provided by the user, then fills in the missing bits using some simple defaulting rules, before passing them to the type checker proper. Of course, this function actually reads both of its arguments, hence the error message.

The problem is that when the elaborator needs a new variable it just takes the next available name in the namespace, so the exact names you get depend on what other signatures the elaborator has already processed. This means that if you add a new signature to an imported module, the dummy variables in all the successive signatures change. That's fine for type checking, but it means that error messages presented to the user aren't consistent. It's become a problem because our regression tests compare error messages to what they were before, so now if you modify any of the base libraries, you have to accept a bunch of wibbles in the test suite.

The solution is to rewrite variables in each set of error messages to use consistent names like t0, t1, %r1, %r2 etc before presenting them to the user. Complicating factors are that if a variable was a "real" variable that came from the source program we don't want to rewrite it, and that generated/elaborated names should not collide with real names from the source program.

This is one of those PITAs you discover when you sit down and try to write a real compiler. No sexy semantics, no fancy extensions to the type system. I need to format error messages properly so my regression tests don't break after every commit... but we press onward for the glory.

No comments:

Post a Comment