Saturday, August 6, 2011

How I learned to stop worrying and love de Bruijn indices

When I started learning how to formalise languages in Coq, one of my initial stumbling blocks was choosing an approach to name binding. There are at least three fancy sounding contenders: Locally Nameless, Nominal Reasoning and Parametric Higher Order Abstract Syntax. Of course, you could also use good old de Bruijn indices, but proponents of fancy approaches warn about the large volume of lemmas regarding lifting and substitution you'll need to prove before you can start the "real work".

Back in April I scratched around for about two weeks reading up on all the different approaches, trying to identify the "best" one (whatever that means). Then I read the following comment by Xavier Leroy on the POPLmark website. He had used the Locally Nameless approach, but stated: "If the purpose is just to solve the challenge and be done with it, I would rather go for pure de Bruijn". That was enough to convince me to stop worrying and just get into it. I've now spent about 250 hours doing my own proofs, and I can report that using de Bruijn isn't really that bad. Yes, you need to prove some lemmas about lifting and substitution, but my entire response to the naysayers is summed up thus: "The only computational operation that lambda calculus has is substitution, so don't pretend you can get away without proving something about it." Corollary: HOAS doesn't count.

The Lemmas

You need to know what the lemmas are though, and I advise cribbing them from someone else's proof. I learned them from Arthur Charguéraud's proof, but I don't know where they are from originally. You're more than welcome to copy them from mine.

Once you know what the lemmas are, and how they work in STLC, extending them to new languages is straightforward. For reference, the following are the definitions of the lifting and substitution operators for STLC. Note that if you just want to prove Progress and Preservation for STLC, you don't actually need the lemmas. They're needed in bigger languages such as SystemF2 to show that type substitution is commutative. Nevertheless, for illustrative purposes we'll just stick with STLC expressions.

Inductive exp : Type :=
 | XVar  : nat -> exp
 | XLam  : ty  -> exp -> exp
 | XApp  : exp -> exp -> exp.

Fixpoint 
 liftX  (d:  nat) (* current binding depth in expression *)
        (xx: exp) (* expression containing references to lift *)
        : exp
 := match xx with 
    |  XVar ix    
    => if le_gt_dec d ix
        (* Index is referencing the env, so lift it across the new elem *)
        then XVar (S ix)

        (* Index is locally bound in the expression itself, and 
           not the environment, so we don't need to change it. *)
        else xx

    (* Increase the current depth as we move across a lambda. *)
    |  XLam t1 x1
    => XLam t1 (liftX (S d) x1)

    |  XApp x1 x2
    => XApp   (liftX d x1) (liftX d x2)
    end.

Fixpoint
 substX (d:  nat) (* current binding depth in expression *)
        (u:  exp) (* new expression to substitute *)
        (xx: exp) (* expression to substitute into *)
        : exp 
 := match xx with
    | XVar ix 
    => match nat_compare ix d with
       (* Index matches the one we are substituting for. *)
       | Eq  => u
       
       (* Index was free in the original expression.
          As we've removed the outermost binder, also decrease this
          index by one. *)
       | Gt  => XVar (ix - 1)

       (* Index was bound in the original expression. *)
       | Lt  => XVar ix
       end

    (* Increase the depth as we move across a lambda. *)
    |  XLam t1 x2
    => XLam t1 (substX (S d) (liftX 0 u) x2)

    |  XApp x1 x2 
    => XApp (substX d u x1) (substX d u x2)
 end. 
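If you want to play with these operators without firing up Coq, here is a direct transliteration into Python (my own throwaway sketch, not part of the formal development). Terms are tuples `('var', ix)`, `('lam', body)` and `('app', f, a)`; the lambda's type annotation is dropped since it plays no role in lifting or substitution.

```python
def lift(d, t):
    """Insert a binder at depth d: bump free indices >= d, leave bound ones."""
    if t[0] == 'var':
        return ('var', t[1] + 1) if t[1] >= d else t
    if t[0] == 'lam':
        return ('lam', lift(d + 1, t[1]))
    return ('app', lift(d, t[1]), lift(d, t[2]))

def subst(d, u, t):
    """Substitute u for index d in t, lowering the free indices above d."""
    if t[0] == 'var':
        ix = t[1]
        if ix == d:
            return u
        return ('var', ix - 1) if ix > d else t
    if t[0] == 'lam':
        # Crossing a binder: substitute one level deeper, and lift u so its
        # free indices keep pointing at the same binders.
        return ('lam', subst(d + 1, lift(0, u), t[1]))
    return ('app', subst(d, u, t[1]), subst(d, u, t[2]))

# Beta-reduce the body of (\. 0 (\. 1)) applied to the identity (\. 0):
id_  = ('lam', ('var', 0))
body = ('app', ('var', 0), ('lam', ('var', 1)))
print(subst(0, id_, body))
# ('app', ('lam', ('var', 0)), ('lam', ('lam', ('var', 0))))
```

Note how the occurrence under the inner lambda receives a lifted copy of the identity; with named binders this is exactly the point where capture-avoiding substitution needs its renaming hack.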

Drawing the de Bruijn environment

Here is a nice lemma to get started.
Lemma lift_lift
 :  forall n m t
 ,  lift n              (lift (n + m) t) 
 =  lift (1 + (n + m)) (lift n t).
This is one of the lemmas you need for proving commutativity of substitution. Although it looks complicated at first glance, it will make significantly more sense when you know how to draw it.

First, consider the standard typing judgement form:
Env |- Exp :: Type
Assume there are de Bruijn indices in the expression. Some of these indices point to lambdas in the expression itself, corresponding to locally bound variables, while some point to lambdas in the environment, corresponding to free variables. Here is an example:



Suppose I apply (lift 2) to the above expression. Although it would be helpful for educational purposes to do this directly using the definition of 'lift' above (try it!), I'll tell you the shortcut instead. Firstly, if we assume that the definition of 'lift' is correct, none of the bound indices will be changed during the application. Ignoring bound indices, the only ones remaining are free indices. These are the indices that point into the environment.

Now, although applying 'lift' to an expression may increment these free indices, instead of thinking about indices being incremented, it's easier to think about environment elements being shifted. I'll explain what I mean by that in a moment, but for now let's just do it...

Here is the same expression as before, with the environment positions numbered in blue:


Shifting the elements after position two yields the following:



In the above diagram I've used '_' to represent a hole in the environment. This is a place where I could insert a new element, and be guaranteed that none of the indices in the expression point to this new element. This in turn means that if I apply (lift 2) to the expression, then insert a new element into the hole, then the resulting judgement is guaranteed to give me the same type as before (t5 in this case).
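This "hole" guarantee is easy to test directly. Here's a small Python sketch (again a throwaway transcription of 'lift', not the Coq code): `env_slots` computes which environment positions the free indices of a term point at, and after `lift d` the position `d` never appears.

```python
def lift(d, t):
    # Terms are ('var', ix), ('lam', body), ('app', f, a).
    if t[0] == 'var':
        return ('var', t[1] + 1) if t[1] >= d else t
    if t[0] == 'lam':
        return ('lam', lift(d + 1, t[1]))
    return ('app', lift(d, t[1]), lift(d, t[2]))

def env_slots(t, depth=0):
    """Environment positions referenced by the free indices of t."""
    if t[0] == 'var':
        return {t[1] - depth} if t[1] >= depth else set()
    if t[0] == 'lam':
        return env_slots(t[1], depth + 1)
    return env_slots(t[1], depth) | env_slots(t[2], depth)

# (\. 0 3) 1  -- the free indices point at env positions 1 and 2.
t = ('app', ('lam', ('app', ('var', 0), ('var', 3))), ('var', 1))
print(sorted(env_slots(t)))           # [1, 2]
print(sorted(env_slots(lift(2, t))))  # [1, 3] -- position 2 is now a hole
```

Position 1 sits below the lift depth and is untouched; position 2 is shifted past the hole, so inserting a new element at position 2 cannot capture anything.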

Of course, applying 'lift' to the expression doesn't actually change the environment. In this example I've moved 't1' and 't2' to the left for illustrative purposes, but this is an operation separate from applying 'lift' to the expression itself. However, applying 'lift' does change the indices, and these indices are pointers into the environment. If we like, we could think about moving the pointers (the arrow heads) instead of the actual environment elements. This is an important change in perspective. Instead of thinking about 'lift' as incrementing indices in the expression, we want to think about 'lift' as shifting environment pointers to the left.


Let's look at our lemma again:
Lemma lift_lift
 :  forall n m t
 ,  lift n              (lift (n + m) t) 
 =  lift (1 + (n + m)) (lift n t).

As 't' is quantified, we need to prove this for all possible terms 't'. Using our change in perspective, this also means that we want to prove the lemma for all possible sets of environment pointers. Given the environment pointers for 't', the expression '(lift (n + m) t)' transforms them into a new set (by adding a hole at position 'n + m'). Likewise, the expression 'lift (1 + (n + m)) (lift n t)' transforms the environment pointers of 't' by adding a hole at position 'n', and then adding another hole at position '1 + (n + m)'. Here is a picture of the full situation. I've drawn the contiguous sets of environment pointers as rectangles, with the holes marked in grey.


In the first line we have the set of environment pointers for 't'. Applying 'lift n' adds a hole at position 'n'. Alternatively, applying 'lift (n + m)' adds a hole at position '(n + m)'. Given 'lift n t', if we add another hole at position '(1 + n + m)' we get the last set shown. Likewise, if we take the set for 'lift (n + m) t' and add a hole at position 'n' then we also get the last line.

That's it. That complicated looking lemma above can be drawn with a few rectangles, and provided the definition of 'lift' does what it's supposed to, the lemma is obviously true. For a language like SystemF2 you need about 5 lemmas concerning lifting and substitution. They're all easy to draw, and the proofs are straightforward once you understand the general approach. In the next post I'll show how to draw substitution, which works in a similar way.
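It's not a proof, but a cheap confidence check is to transcribe 'lift' into Python and test the lemma exhaustively on a few thousand small terms (my own sketch, using the same tuple encoding of terms as above):

```python
def lift(d, t):
    # Terms are ('var', ix), ('lam', body), ('app', f, a).
    if t[0] == 'var':
        return ('var', t[1] + 1) if t[1] >= d else t
    if t[0] == 'lam':
        return ('lam', lift(d + 1, t[1]))
    return ('app', lift(d, t[1]), lift(d, t[2]))

def terms(depth, free):
    """Enumerate all terms up to 'depth' with free indices below 'free'."""
    for ix in range(free):
        yield ('var', ix)
    if depth > 0:
        for b in terms(depth - 1, free + 1):
            yield ('lam', b)
        for f in terms(depth - 1, free):
            for a in terms(depth - 1, free):
                yield ('app', f, a)

for n in range(3):
    for m in range(3):
        for t in terms(2, 4):
            assert lift(n, lift(n + m, t)) == lift(1 + n + m, lift(n, t))
print("lift_lift holds on all sampled terms")
```

Chasing the three cases of a variable index (below 'n', between 'n' and 'n + m', at or above 'n + m') through both sides is also a good way to convince yourself before starting the Coq proof.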

Wednesday, May 25, 2011

Proofs and mutual recursion

Quick update. I'm still working on mechanising the proofs for the DDC core language in Coq. So far I've made it through Progress, Preservation and the correspondence between Big and Small step semantics for Simply Typed Lambda Calculus (STLC), System-F, System-F2, and PCF. Extending the STLC proof with polymorphism and fixpoints etc was surprisingly straightforward. It only took about 4 hours each to create the other versions of the proof.

The next qualitative hurdle appears to be handling the mutual recursion that arises when we extend the language in other ways. To add case expressions, I need to represent case alternatives. Case alternatives contain expressions, and expressions contain (lists of) alternatives.

Unfortunately, with mutually recursive definitions, it appears that Coq can't automatically generate all the induction schemes I need. I could be wrong on this though, as I'm still a Coq newbie. I've figured out that the Coq command "Combined Scheme" will make some of the schemes I might want, but it doesn't cover all of the use-cases. The problem (I think) is that although expressions are defined in terms of lists of alternatives, the lists themselves are polymorphic and aren't directly defined in terms of alternatives. For this reason, I can't convince "Combined Scheme" to make me a strong enough induction scheme, so I must write it down and prove it myself.

At least, that's what I'm inferring from reading the proof of soundness for the STG machine described in a paper by Maciej Pirog and Dariusz Biernacki. I'm still working through it.

Tuesday, April 19, 2011

Falling down the naming well

Over the last month or so I've started to formalise the proofs of the DDC core language in Coq. The existing proofs were just getting too big to manage by hand, and typing them all up in LaTeX was seeming like less and less fun.

Being a Coq-newbie I started with Benjamin Pierce and co's Software Foundations course notes. I had done a couple of pages of Coq tutorial before, but never tried to use it for anything real. I've found the course notes to provide a fairly smooth ride, though I got antsy during the IMP formalisation and skipped straight to the lambda calculus one. After about two weeks of evenings I was able to redo the simply typed lambda calculus (STLC) one by myself, but hit a wall trying to scale up to System-F. To cut a longer story short, I've come to appreciate (along with many others) what a massive hack capture avoiding substitution is.

I'm fast moving towards the viewpoint that the deBruijn and/or locally nameless representation is actually the "real" lambda calculus, whereas named binders are just a layer of syntactic sugar that confuses everything (and gives you cavities). I've ended up just settling on vanilla deBruijn indices for further work, mainly because the examples I've seen so far using the locally nameless representation rely on a heavy amount of Coq Ltac magic that I don't understand yet. I'm also more than willing to trade time for cleverness. By that I mean I'm willing to commit more time for a less clever solution, provided I know that the approach I take will get the job done and I won't have to start all over again using a different formalisation of binders.

Underneath all of this is an itch in my mind saying that I was already writing a perfectly good paper about the DDC core language, and I don't want to start another one right now about "How to formalise DDC Core in Coq". Anyway, I've made it through Progress and Preservation for STLC using the deBruijn representation, including all the list lemmas, and am feeling fairly good about it so far. The proofs are in the main DDC tree at http://code.ouroborus.net/ddc/ddc-head/proof/.

Tuesday, March 29, 2011

Real Time Edge Detection in Haskell


Last week we submitted a paper to ICFP about how to implement efficient parallel stencil convolutions in Haskell. This goes along with a new release of Repa that runs on GHC 7.0.3. We've extended the old array representation to support the partitioning of arrays into several regions, and rewritten the back-end to recover sharing between the computation of adjacent array elements. Coupled with a careful management of aliasing issues, this lets us implement image processing algorithms that have comparable performance to OpenCV.

Absolute performance depends on what data format we're using. Using Word8s is about 4 times slower than OpenCV, because OpenCV uses SIMD intrinsics that we don't have access to from Haskell (yet!). On the other hand, using Float32s and 2 Haskell threads can run faster than single-threaded OpenCV, as the parallelism makes up for the lack of SIMD instructions.

Here is a video of me demonstrating a Canny edge detector applied to a stream captured from my iSight camera. The demo uses Objective C for the GUI, and Haskell for the processing. The source builds with XCode 3.2.1 and GHC 7.0.3, though you'll need to update and run the CONFIGURE.sh script in the root dir to point it to your GHC install. There are also prebuilt versions with the GHC "+RTS -N" option baked in for two, four and six threads. They all do the same thing apart from the number of threads. You can change the RTS options in the main.m module if you want to rebuild it. On my 4-Core i7 desktop it does about 20 frames per second with no problem -- the framerate from the iSight is set dynamically by OSX depending on system load. I've tested it on 3 separate machines, but let me know if it doesn't work for you. Non-Mac users can run the edge detector over .bmp files, using the repa-canny program from repa-examples.

Monday, January 3, 2011

Papers, Proofs, PrimOps, Allocation and Strengthening

Papers
General status update: I got the reviews back for the paper I submitted to JFP on the DDC core language. The reviews were very helpful, and a huge thank you to the reviewers if you're reading this. I will endeavour to make all my own reviews as helpful.

The bottom line was the paper looked promising, but needs more motivation for the language design, and more meta-theory. The motivation is easy enough to 'port across from my thesis, so this part shouldn't take too long. For background, the JFP paper is an extension of the APLAS 2009 paper I wrote just after I finished my thesis. The original version of that paper had more motivational material, but I ended up cutting it to fit the page limit. For the JFP paper I tried to stick to the details of the core language, but I suppose not discussing the source language and wider context makes it hard to see the point of it all. There were also suggestions and questions as to the details of the language and typing rules, which I'll either fold in or explain why.

Proofs
For the meta-theory, the paper needs proofs of observational equivalence for the example optimisations, showing how the witnesses are used during optimisation. I had a proof of soundness for the type system in the appendix, and worked examples of the optimisations, but not formal proofs that the optimisations are ok. Said optimisations are just standard ones like let floating and full-laziness, but I agree that if the point of having witnesses of purity and constancy in the core language is to perform optimisations in the presence of side-effects, then we want some formal link between the witnesses and those optimisations.

Anyway, I started on the proofs over the Christmas break, and have the structure of one showing that let-floating is sound (changing the order of let-bindings with non-interfering effects), though I still need to fill in some details. I'm estimating about two or three weeks of work (full-time equivalent) to finish the proofs and update the paper.

PrimOps
I went to a hackathon hosted by Erik last week where we worked on LLVM support for DDC. Erik's written all the code for the backend, though to push it all the way through I needed to change how primops are handled in the rest of the compiler (which is explained in Erik's blog post). I just finished that this afternoon, so we're gunning for full LLVM support in the near future. The primop handling is pretty clean now, so if we wanted to, say, add support for vector operations like MMX and SSE then this should be reasonably straightforward.

Allocation Effects
The paper reviews have also convinced me that we want explicit allocation effects in the source and core languages. One of my thesis reviewers also suggested this, though so far I've left them out because they weren't absolutely needed. Allocation effects have a slightly different flavour to world effects (like writing to the Console) and to read/writing mutable data. When two expressions have world or read/write effects then this means they can interfere in such a way that you cannot change the order of these two expressions. In contrast, you can always change the order of expressions that just allocate new objects.

However, if an expression allocates new mutable objects, then you cannot hoist it out of a lambda (apply the full-laziness transform). Hoisting an expression means it is potentially executed less often, so we're not just changing the order of two actions in time. If you want to do hoisting without information about allocation, then it's sufficient to check that the binding to be hoisted performs no world or read/write effects, and does not return mutable data. However, using allocation effects will make this reasoning clearer, and can enable more optimisations later, so I'll add them in. I started on a branch that has them, but now I've convinced myself they're easy enough to add I'm going to leave the hacking to later and focus on getting the paper finished first.
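To see concretely why allocating mutable objects blocks full-laziness, here is a Python analogue of the transform (nothing to do with DDC itself): "hoisting" the allocation of a mutable object out of a function changes it from happening once per call to happening once ever, and the two versions observably differ.

```python
def make_counter():
    count = [0]          # mutable object allocated on every call
    def bump():
        count[0] += 1
        return count[0]
    return bump

shared = [0]             # "hoisted": the allocation now happens only once
def make_counter_hoisted():
    def bump():
        shared[0] += 1
        return shared[0]
    return bump

a, b = make_counter(), make_counter()
assert (a(), b()) == (1, 1)   # each counter has its own state

c, d = make_counter_hoisted(), make_counter_hoisted()
assert (c(), d()) == (1, 2)   # state is shared: hoisting changed the meaning
```

Hoisting a pure, allocation-free binding can't be observed this way, which is exactly why recording allocation effects makes the side condition on the transform so much easier to state.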

Strengthening
In the type inferencer, it turns out that strengthening more-than constraints to equality constraints during inference breaks programs that use interesting higher order functions (like with parser combinators). This isn't a problem with standard 2nd order functions like map, fold, filter etc, but only appears with 3rd order functions and higher, or with data containing functions, which is why I hadn't noticed it before. By "break" I mean the inferencer returns a bad type, so we get a panic when we try to type check the corresponding core program. We still want to do the strengthening because it removes the need for a pile of redundant effect applications in the core programs, but it should be done while converting the desugared language to core, instead of during inference, which is fairly obvious in hindsight. I've started fixing this as well, but need to adjust the type utils (like type simplification and effect masking) so they'll handle the non-strengthened forms. Another job to be left until after the paper is finished...