← Master’s Diploma | Üetliberg →

# Generically adding position information to a datatype

Every now and then datatype fixpoints come up, especially in the context of generic programming:

newtype Fix f = In { out :: f (Fix f) }

Most explanations of this datatype I have read or heard start with this definition and then proceed to explain it, using various examples. In today’s post I will also introduce you to this datatype, but I want to take a different approach: I will show you a problem to which the Fix datatype is the natural solution, *deriving* its definition along the way.

The Haskell code in this post does not use very advanced features: there are no type functions or even type classes, only datatypes and parameters. If you are familiar with datatypes, type parameters and their syntax, it should not be hard to follow. If you have any questions, feel free to post them!

## The problem

Let’s take our trusty old friend the arithmetic expression datatype:

data BareExpr = Num Int | Add BareExpr BareExpr | Sub BareExpr BareExpr | Mul BareExpr BareExpr | Div BareExpr BareExpr

I’ve called it `BareExpr`

here for a reason: we are going to change it in such a way that we can also store position information in it, resulting in type `PosExpr`

, so that when a `PosExpr`

is produced by a parser, we can trace back where in the original source code the tree nodes were. This is useful in various applications. For example, compilers that output error messages generally provide position information about where the error occurred exactly. It is also useful in tools that need to understand text selections in the source code, such as editors that feature refactoring.

Adding position information to a single datatype is not very difficult. After we have done so for `BareExpr`

, we will look at the real problem: how to do this for *any* datatype.

## Expressions with positions

There are several ways to add position information to a datatype. In our case we will couple every *occurrence* of `BareExpr`

with a location. Let’s call the type of locations `SrcSpan`

and the annotated version of the expression datatype `PosExpr`

:

type PosExpr = (SrcSpan, PosExpr’) data PosExpr' = Num Intr’ | Add PosExpr PosExpr | Sub PosExpr PosExpr | Mul PosExpr PosExpr | Div PosExpr PosExpr

In a series of steps, we will reach our final solution.

## Step 1: Capture the similarities between BareExpr and PosExpr

`BareExpr`

and `PosExpr'`

are very similar: they both contain five constructors, and each constructor has the same number of fields. Can we capture this structure somehow? Yes, we can: the two types only differ in the types of their recursive positions, and in a very regular way. We can do here what we would do in any similar case: make the parts that differ arguments, and then express the original entities in terms of this new, general entity by providing specific arguments.

Haskell allows us to do that with datatypes: simply introduce a new type argument `r`

. We call the resulting type `ExprF`

:

data ExprF r = Num Int | Add r r | Sub r r | Mul r r | Div r r

The `F`

in `ExprF`

stands for functor, and such a datatype is usually called a *base functor*. Base functors determine the shape of the top level of a tree, but the shape of their children is determined by the type argument.

## Step 2: Express BareExpr in terms of ExprF

Now we need to recover `BareExpr`

and `PosExpr`

by expressing them in terms of `ExprF`

. For `BareExpr`

, we want the child positions of `ExprF`

also to be bare expressions. This leads to an infinite type:

BareExpr ~ ExprF (ExprF (ExprF ...))

This says that to get bare expressions back, we want to take `ExprF`

and have its children be `ExprF`

s again, and those children’s children to be `ExprF`

s again, and so on. In Haskell we can encode infinite types by introducing new datatypes (we reuse the name `BareExpr`

here):

newtype BareExpr = BareExpr { runBareExpr :: ExprF BareExpr }

If you repeatedly expand this definition, you will see that it results in the infinite type above.

## Step 3: Express PosExpr in terms of ExprF

For `PosExpr`

we can think of a similar infinite type:

PosExpr ~ (SrcSpan, ExprF (SrcSpan, ExprF ...))

Again, we write this down using a new datatype:

newtype PosExpr = PosExpr { runPosExpr :: (SrcSpan, ExprF PosExpr) }

## Step 4: Generalize BareExpr

Currently `BareExpr`

works only for the `ExprF`

shape. Let’s create such a ‘bare’ version for any shape instead of just `ExprF`

s. We can do this by making the base functor an argument:

newtype BareExpr f = BareExpr { runBareExpr :: f (BareExpr f) }

On the right-hand side, we have replaced `ExprF`

by the argument `f`

. In step 2 we supplied the type we were defining as argument to `ExprF`

; in this new version we do the same to `f`

, but since this new version has a type argument, we need to supply this argument in the recursive position as well.

But… this datatype is no longer specific for arithmetic expressions, so the name `BareExpr`

is not very appropriate. In fact, the type we have just defined is the famous `Fix`

disguised under a different name!

newtype Fix f = In { out :: f (Fix f) }

So now you know what `Fix`

does: it takes a base functor, such as `ExprF`

, and recursively applies it to itself, creating a tree that is of the same shape at every level.

Our new definition of `BareExpr`

doesn’t need to introduce any new datatypes but can now be a simple type synonym:

type BareExpr = Fix ExprF

## Step 5: Generalize PosExpr

For `PosExpr`

we can make two generalizations. The first is to not just store source locations, but allow any type of annotation:

newtype AnnExpr x = AnnExpr { runAnnExpr :: (x, ExprF (AnnExpr x)) }

The second is similar to the one we made to `BareExpr`

: have it work for any base functor instead of just `ExprF`

s:

newtype AnnFix x f = AnnFix { runAnnFix :: (x, f (AnnFix x f)) }

To recover `PosExpr`

, we give `AnnFix`

the two appropriate type arguments:

type PosExpr = AnnFix SrcSpan ExprF

## Step 6: Express AnnFix in terms of Fix

The `Fix`

type captured the idea of take a functor and applying it to itself recursively. `AnnFix`

does something similar. Can we perhaps express `AnnFix`

in terms of `Fix`

to make this explicit?

It turns out we can, if we introduce a helper datatype `Ann`

:

data Ann x f a = Ann x (f a) type AnnFix x f = Fix (Ann x f)

`Ann`

couples an annotation `x`

with a functor value. It’s kind of a tuple type, lifted to a higher order on the right side.

## Summing up

We have seen many (intermediate) definitions of datatypes, but in the end only two of them matter:

newtype Fix f = In { out :: f (Fix f) } data Ann x f a = Ann x (f a)

And of course, we have our expression example expressed in terms of these two building blocks:

type BareExpr = Fix ExprF type PosExpr = Fix (Ann SrcSpan ExprF)

With just these two building blocks, we can express generically annotated trees and unannotated trees. What is the point of generalizing this far? Well, by making these types not specific to a particular tree shape (such as `ExprF`

), you can build all sorts of tools that work on many kinds of trees. In my Masters thesis I explore this concept further, developing parser combinators that automatically insert the position information for you at the appropriate places, catamorphisms that automatically couple errors with position information, conversions between text selections and tree selections and a couple of other things.

## Read more

If you’re interested in datatype fixpoints and would like to know more, here is a collection of interesting tutorials, applications and papers:

- Mark Dominus explains data Mu f = In (f (Mu f))
- Indirect composite on the Haskell wiki
- A more formal description by Edward Kmett
- The classic 1990 Recursive types for free! by Philip Wadler
- Another classic: the 1991 Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire by Erik Meijer, Maarten Fokkinga and Ross Paterson

## Comments

AnnFix is the f-branching stream/cofree comonad (with the arguments flipped around).

I tend to like to express it with:

data f :> a = a :< f (f :> a)

it is available in Control.Comonad.Cofree in category-extras

or in Numeric.AD.Types from ‘ad’ as Stream.

You may find the following bits and pieces enlightening:

http://comonad.com/reader/2009/incremental-folds/ puts the (:>) type above to use, and calculates those annotations using f-algebras using smart constructors.

http://splonderzoek.blogspot.com/2010/03/final-push-ups-push-downs-and-passing.html is Leather, Loeh, and Jeuring’s take on the same thing. They, like you, and my code in category-extras, chose to separate the fixed point from Ann. They provide a similar incremental fold to the code in my blog, but with less type safety.

Very nice blog. I was reading Sebastiaan Visser’s msc thesis which also uses FixF for annotating recursive types. Thank you.

Thanks for the blog post (and the WGP paper is also nice). Looks like that’s exactly what I need. Are you still supporting the Annotations library?

## Leave a comment!

Martijn loves to receive comments! Add yours by filling out the fields below.