“What, another JSON library? Don’t we have enough already?”
It’s true that there are already a few JSON libraries out there. These libraries, however, require you to write fromJson and toJson separately.

“Uhm, yes… is that bad?”

Yes. It violates the DRY principle. If I show you an implementation of fromJson for a certain type, you can write a corresponding toJson without requiring any further information. Similarly, if I show you an implementation of toJson, you can write the accompanying fromJson. Writing down the same thing twice is tedious and opens up the possibility of making mistakes.

“But most of these libraries offer Template Haskell support that does this work for you!”

This is true, but they also make all the choices for you about how your datatypes should map to JSON. Usually they assume the names of your record fields map directly to JSON property names, and that the shapes of your family of datatypes correspond to how the objects in JSON are nested. These libraries give you a choice: either you write out fromJson and toJson by hand and have full control over the mapping, or you give up this control and let Template Haskell do all the work for you.

JsonGrammar gives you the best of both worlds: full control over what the mapping should be, with an API that lets you define fromJson and toJson at the same time. It achieves this by separating the construction and destruction of datatype constructors and their fields from the description of the JSON values. The former is derived by Template Haskell; the latter is provided by the programmer.
Suppose we have these two datatypes describing people and their current location:
data Person = Person
{ name :: String
, gender :: Gender
, age :: Int
, lat :: Float
, lng :: Float
}
data Gender = Male | Female
Sadly, the JSON source we are communicating with is using JSON with Dutch property names and values, so we cannot use Template Haskell to derive the JSON mapping for us, like we would do with other JSON libraries. Neither do we want to use Dutch names for our record selectors; nobody would be able to understand our code anymore! Fortunately this isn’t a problem with JsonGrammar.
The first step is to have Template Haskell derive the constructor-destructor pairs:
person = $(deriveIsos ''Person)
(male, female) = $(deriveIsos ''Gender)
For the latter to work, you need to enable -XNoMonoPatBinds.

Then we write instances of the Json type class to define the mapping from and to JSON. The order in which the properties are listed matches that of the fields in the datatype:

instance Json Person where
  grammar = person . object
    ( prop "naam"
    . prop "geslacht"
    . prop "leeftijd"
    . prop "lat"
    . prop "lng"
    )

instance Json Gender where
  grammar = male . litJson "man"
         <> female . litJson "vrouw"

The . operator is from Control.Category. The <> is just another name for mappend from Data.Monoid and denotes choice.

That’s all! We have just defined both fromJson and toJson in one simple definition. Here’s how you can use these grammars:
ghci> let anna = Person "Anna" Female 36 53.0163038 5.1993053
ghci> let Just annaJson = toJson anna
ghci> annaJson
Object (fromList [("geslacht",String "vrouw"),("lat",Number
53.01630401611328),("leeftijd",Number 36),("lng",Number
5.199305534362793),("naam",String "Anna")])
ghci> fromJson annaJson :: Maybe Person
Just (Person {name = "Anna", gender = Female, age = 36, lat = 53.016304,
lng = 5.1993055})
The library is based on partial isomorphisms:
data Iso a b = Iso (a -> Maybe b) (b -> Maybe a)
instance Category Iso
instance Monoid (Iso a b)
A value of type Iso a b gives you a function that converts an a into a Maybe b, and a function that converts a b into a Maybe a. This composes beautifully as a Category. The Monoid instance denotes choice: first try the left-hand conversion function, and if it fails, try the right-hand one.

A JSON grammar for some type a is nothing more than a value of type Iso Value a, where Value is the type of a JSON AST from the aeson package. That is, it’s a pair of conversion functions between JSON trees and your own datatype. Building JSON grammars like the one above is about composing isomorphisms that translate between intermediate types.

The isomorphisms person, male and female translate between constructors and their individual fields. For example:

person :: Iso (String, Gender, Int, Float, Float) Person

Converting from a constructor to its fields might fail, because the value that is passed to the conversion function might be a different constructor of the same datatype. This is why the Monoid instance is so useful: we can give multiple grammars, usually one for each constructor, and they will be tried in sequence. They are effectively composable pattern matches.
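For intuition, here is one way these instances could be implemented. This is a sketch, not the real code from the partial-isomorphisms package; the helper names apply, unapply and the example boolInt are mine:

```haskell
import Prelude hiding ((.), id)
import Control.Applicative ((<|>))
import Control.Category

data Iso a b = Iso (a -> Maybe b) (b -> Maybe a)

-- run the iso forwards or backwards (illustrative helpers)
apply :: Iso a b -> a -> Maybe b
apply (Iso f _) = f

unapply :: Iso a b -> b -> Maybe a
unapply (Iso _ g) = g

instance Category Iso where
  id = Iso Just Just
  -- compose forwards left-to-right, backwards right-to-left
  Iso f g . Iso f' g' = Iso (\a -> f' a >>= f) (\c -> g c >>= g')

instance Semigroup (Iso a b) where
  -- choice: try the left conversion first, fall back to the right
  Iso f g <> Iso f' g' = Iso (\a -> f a <|> f' a) (\b -> g b <|> g' b)

instance Monoid (Iso a b) where
  mempty = Iso (const Nothing) (const Nothing)

-- a tiny example iso; note the backward direction is partial
boolInt :: Iso Bool Int
boolInt = Iso (\b -> Just (if b then 1 else 0))
              (\n -> case n of { 0 -> Just False; 1 -> Just True; _ -> Nothing })

main :: IO ()
main = do
  print (apply boolInt True)   -- Just 1
  print (unapply boolInt 2)    -- Nothing
```

The partiality of the backward direction of boolInt is exactly what makes the Monoid instance useful: a failing conversion lets the next alternative have a go.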
There is a problem with encoding the fields of such a constructor as an n-tuple: if we want to compose it with other isomorphisms that handle the individual fields, we have to use complicated tuple projections to select the fields that we’re interested in. Basically we have unwrapped the fields from one constructor only to wrap them in another one!
The solution is to use heterogeneous stacks of values. They are reminiscent of continuation-passing style, because in the way we use them they usually have a polymorphic tail:
person :: Iso (String :- Gender :- Int :- Float :- Float :- t) (Person :- t)
Read :- as ‘cons’, but for types instead of values. Its definition is simple:
data h :- t = h :- t
The polymorphic tail says that person doesn’t care what’s on the stack below the two Floats; it will simply pass that part of the stack on to the right-hand side, and vice versa if we’re working with the isomorphism in the opposite direction.

Have you thought about what the types of male and female would be in the non-stack versions of the isomorphisms? They don’t have any fields; we would have to leave the first type parameter of Iso empty somehow, for example by choosing (). Stack isomorphisms have no such problem: we simply make the first type argument the polymorphic tail on its own, without any values on top:
male :: Iso t (Gender :- t)
female :: Iso t (Gender :- t)
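To make this concrete, male and female could be written by hand roughly as follows. This is a sketch of what deriveIsos produces; the generated code may differ in its details:

```haskell
data Iso a b = Iso (a -> Maybe b) (b -> Maybe a)

-- type-level cons for heterogeneous stacks
data h :- t = h :- t
infixr 5 :-

data Gender = Male | Female

-- push a Male onto the stack; pop it again only if it really is a Male
male :: Iso t (Gender :- t)
male = Iso (\t -> Just (Male :- t))
           (\(g :- t) -> case g of
               Male   -> Just t
               Female -> Nothing)

-- and symmetrically for Female
female :: Iso t (Gender :- t)
female = Iso (\t -> Just (Female :- t))
             (\(g :- t) -> case g of
                 Female -> Just t
                 Male   -> Nothing)
```

Each constructor gets its own partial isomorphism; the backward direction fails on the other constructor, so combining them with the Monoid’s choice gives a total match over the whole datatype.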
Stack isomorphisms compose beautifully using ., often without needing any special projection functions. To get a feeling for it, try compiling the example JSON grammars and looking at the types of the individual components.

I lied when I wrote that grammars have type Iso Value a; they actually use stacks themselves, too. Here is the true definition of the Json type class:
class Json a where
grammar :: Iso (Value :- t) (a :- t)
Let’s take our Person example and make a small modification. We decide that because (lat, lng)-pairs are so common together, we’d like to put them together in their own datatype:
data Coords = Coords { lat :: Float, lng :: Float }
deriving (Eq, Show)
data Person = Person
{ name :: String
, gender :: Gender
, age :: Int
, location :: Coords
} deriving (Eq, Show)
However, in this example we have no control over the JSON format and cannot change it to match our new structure. With JsonGrammar we can express mappings where the nesting is not one-to-one:
instance Json Person where
grammar = person . object
( prop "naam"
. prop "geslacht"
. prop "leeftijd"
. coordsProps
)
coordsProps :: Iso (Object :- t) (Object :- Coords :- t)
coordsProps = duck coords . prop "lat" . prop "lng"
Here duck coords wraps (or unwraps, depending on the direction) the two matched Float properties in their own Coords constructor before continuing to match the other properties in the object. The function duck is a combinator that makes a grammar (coords in this case) work one element down the stack. Here it makes sure the top values remain Objects, which prop needs in order to build or destruct JSON objects one property at a time.
What is important to note here is that not only can we express mappings with different nestings, we can also capture this behaviour in its own grammar for reuse. JsonGrammar allows this level of modularity in everything it does.
The ideas behind JsonGrammar go back a bit. They are based on Zwaluw, a library that Sjoerd Visscher and I worked on. The library aids in writing bidirectional parsers/pretty-printers for type-safe URLs, also in a DRY manner. Zwaluw, too, uses stacks to achieve a high level of modularity. In turn, Zwaluw was inspired by HoleyMonoid, which shows that the CPS-like manner of using polymorphic stack tails allows combinators to build up a list of expected arguments for use in printf-like functionality.
The Iso datatype comes from the partial-isomorphisms package and is described in more detail in Invertible syntax descriptions: Unifying parsing and pretty printing by Tillmann Rendel and Klaus Ostermann. They also use stacks (in the form of nested binary tuples), but they do not use the trick with the polymorphic tail (yet?).
Although JsonGrammar is usable, there is still work to be done:
Maybe return values indicate whenever a conversion has failed, but never how it failed. The aeson package gives nice error messages when, for example, an expected property is not found. Such error reporting still has to be added to JsonGrammar.

If you have any questions, comments, ideas or bug reports, feel free to leave a comment or open a ticket on GitHub.
The subject of this post is the module Control.Replicate. The source code is available on GitHub. In this post I will explain what it does and how to use it.
Module Control.Applicative not only defines the Applicative and Alternative type classes, it also offers some useful combinators to express how often an action should be run: many takes an action and runs it zero or more times, collecting the results in a list. Function some does the same but performs the action at least once. Finally, there is optional, which performs its argument action zero or one times, returning a Maybe value.

Module Control.Replicate separates such replication schemes from the actual action. It, too, defines many, some and opt, but to actually run an action x that many times, we give the scheme and the action in question to the run operator *! (read: times), like so: many *! x, some *! x, opt *! x.
Why is this useful? Well, it turns out that these replication schemes are highly composable, in standard ways. They themselves are instances of Applicative, Category and Alternative. With these combinators we can sum them, multiply them, and indicate choice, respectively. Let’s look at some examples.

The primitive, atomic building blocks are zero and one. If we pass these to *!, the action is not run at all, or run exactly once. Their types are:

zero :: b -> Replicate a b
one  :: Replicate a a
(*!) :: Alternative f => Replicate a b -> f a -> f b
The schemes are represented by the type constructor Replicate. If we look at the type of *!, we can see that Replicate’s first type parameter indicates the result type of the action, while its second type parameter indicates the type of the result after running that action so many times. In the case of one, the two arguments are identical. In the case of zero, the scheme will not run the action at all, but it still needs to produce a b. This is why zero takes an argument of type b.

Schemes are Applicative. We can create two and three, the schemes that run an action exactly two and three times, using one as a building block:

two :: Replicate a (a, a)
two = (,) <$> one <*> one

three :: Replicate a (a, a, a)
three = (,,) <$> one <*> one <*> one

Look at their result types: the tuples indicate precisely how many times the action is run. You can read <*> as plus: 2 = 1 + 1, and 3 = 1 + 1 + 1.
Of course, pure is also defined for schemes. The identity element of addition is 0, and this is exactly what pure means: it is a synonym for the zero we saw earlier.

Schemes also form a Category. We can use it to multiply them. Here are two examples:

twiceThree :: Replicate a ((a, a, a), (a, a, a))
twiceThree = two . three

thriceTwo :: Replicate a ((a, a), (a, a), (a, a))
thriceTwo = three . two
In both cases an action is run six times, but their results are nested differently. We will see another multiplication example in a moment.
The identity element for multiplication is 1. Scheme one exactly matches the type of id in the Category type class.

Until now we have only seen schemes that run an action exactly so many times. But schemes are Alternative and can encode multiple frequencies. This is how opt, the scheme that runs an action zero or one times, is defined:

opt :: Replicate a (Maybe a)
opt = zero Nothing <|> Just <$> one

Schemes many and some also use choice:

many :: Replicate a [a]
many = zero [] <|> some

some :: Replicate a [a]
some = (:) <$> one <*> many

We now have many ways to combine replication schemes, and if we use choice together with sums or products, it is not always immediately clear what the resulting scheme means. That’s why the module also exposes a function sizes, which lists the frequencies a scheme allows:

> sizes one
[1]
> sizes two
[2]
> sizes opt
[0,1]
> take 10 (sizes many)
[0,1,2,3,4,5,6,7,8,9]
> take 10 (sizes some)
[1,2,3,4,5,6,7,8,9,10]
In this sense, the schemes encode sets of Peano numerals, and <|> computes the union of two sets.

Now it is also clear what the empty scheme means: it is the scheme that doesn’t allow an action to occur with any frequency, not even zero times.

> sizes empty
[]
As promised, another example that uses multiplication:
even :: Replicate a [(a, a)]
even = many . two

> take 10 (sizes even)
[0,2,4,6,8,10,12,14,16,18]

This scheme allows all even numbers of occurrences of an action, and its type reflects that exactly: there is no way to capture an odd number of as in [(a, a)].
Another combinator available in the module is between :: Int -> Int -> Replicate a [a], which limits the frequency of an action to a lower and an upper bound:

> sizes (between 5 10)
[5,6,7,8,9,10]

What frequencies does (,) <$> between 3 5 <*> two allow? Let’s check:

> sizes ((,) <$> between 3 5 <*> two)
[5,6,7]
This makes sense: if we run an action 3, 4 or 5 times and then another two times, we’ve run it 5, 6 or 7 times.
What does between 7 9 . three mean? And what about three . between 7 9?

> sizes (between 7 9 . three)
[21,24,27]
> sizes (three . between 7 9)
[21,22,23,24,25,26,27]

If the schemes become a bit more involved, it can be helpful to think about them as dice throws. Then between 7 9 . three means: throw a die with 7, 8 and 9 eyes on it, and use the outcome to decide how many times to throw a die with exactly 3 eyes. This has possible outcomes [21,24,27].
In the other case, we throw the between 7 9 die three times, ending up with the full range 21-27 of possible outcomes.
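This arithmetic of sizes can be modelled directly. The following toy model, of my own invention and not part of Control.Replicate, represents a scheme by its (finite) set of allowed sizes only, ignoring result values; plus, times and union mirror <*>, (.) and <|>, and times captures the asymmetry seen above:

```haskell
import Control.Monad (replicateM)
import Data.List (nub, sort)

type Sizes = [Int]

-- like <*>: run one scheme and then the other, so sizes add up
plus :: Sizes -> Sizes -> Sizes
plus xs ys = sort (nub [x + y | x <- xs, y <- ys])

-- like (.): the left scheme decides how many times to run the right one,
-- so each left size n contributes the sums of n draws from the right sizes
times :: Sizes -> Sizes -> Sizes
times xs ys = sort (nub [sum draws | n <- xs, draws <- replicateM n ys])

-- like <|>: the union of the two sets of sizes
union :: Sizes -> Sizes -> Sizes
union xs ys = sort (nub (xs ++ ys))

main :: IO ()
main = do
  print (times [7,8,9] [3])   -- [21,24,27]
  print (times [3] [7,8,9])   -- [21,22,23,24,25,26,27]
  print (plus [3,4,5] [2])    -- [5,6,7]
```

This model only works for finite size sets; schemes like many, whose size set is infinite, would need the lazier merging that the real library attempts (and where the sizes (many . opt) problem below lives).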
Currently, if you try to evaluate sizes (many . opt), the program hangs. This is true not just for opt but for any scheme that allows frequency zero. Is there a bug in the definitions of the combinators, or is it unreasonable to expect the library to produce output in this case?

Another problem is that sizes (id . r) takes longer than just sizes r, for no apparent good reason. (Try r = exactly 1000.) Perhaps some profiling will show where the problem lies.
In a two-year-old post Luke Palmer shows an implementation of the Fibonacci sequence using the reverse state monad: a state monad where the results flow forward but the state flows backward.
Similar results can be achieved using the ReverseT monad transformer, which reverses the effects of any monad for which the monadic fixpoint mfix :: MonadFix m => (a -> m a) -> m a is defined:

newtype ReverseT m a = ReverseT { runReverseT :: m a }

instance MonadFix m => Monad (ReverseT m) where
  return = ReverseT . return
  ReverseT m >>= f = ReverseT $ do
    rec b <- runReverseT (f a)
        a <- m
    return b

instance MonadTrans ReverseT where
  lift = ReverseT

With this transformer we can write Luke's computeFibs as follows:

cumulativeSums = scanl (+) 0

computeFibs = flip evalState [] . runReverseT $ do
  fibs <- lift get
  lift $ modify cumulativeSums
  lift $ put (1 : fibs)
  return fibs
Are there any other monads m for which ReverseT m is interesting?
Today Chris Eidhof, Sebastiaan Visser and I got our master’s diplomas, all on Haskell-related generic programming subjects. The diploma speeches, given by Andres Löh, Johan Jeuring, and José Pedro Magalhães, were very flattering. The picture above, taken by my sister Tamar, shows yours truly signing his diploma.
Next week I start working full-time at Q42 in The Hague. I hope to move a bit closer to work somewhere in the next few months. Right now I spend over 3 hours travelling each day to get to and from work; that has to change.
newtype Fix f = In { out :: f (Fix f) }
Most explanations of this datatype I have read or heard start with this definition and then proceed to explain it, using various examples. In today’s post I will also introduce you to this datatype, but I want to take a different approach: I will show you a problem to which the Fix datatype is the natural solution, deriving its definition along the way.
The Haskell code in this post does not use very advanced features: there are no type functions or even type classes, only datatypes and parameters. If you are familiar with datatypes, type parameters and their syntax, it should not be hard to follow. If you have any questions, feel free to post them!
Let’s take our trusty old friend the arithmetic expression datatype:
data BareExpr
  = Num Int
  | Add BareExpr BareExpr
  | Sub BareExpr BareExpr
  | Mul BareExpr BareExpr
  | Div BareExpr BareExpr

I’ve called it BareExpr here for a reason: we are going to change it in such a way that we can also store position information in it, resulting in a type PosExpr, so that when a PosExpr is produced by a parser, we can trace back where in the original source code the tree nodes were. This is useful in various applications. For example, compilers that output error messages generally provide position information about where exactly the error occurred. It is also useful in tools that need to understand text selections in the source code, such as editors that feature refactoring.

Adding position information to a single datatype is not very difficult. After we have done so for BareExpr, we will look at the real problem: how to do this for any datatype.

There are several ways to add position information to a datatype. In our case we will couple every occurrence of BareExpr with a location. Let’s call the type of locations SrcSpan and the annotated version of the expression datatype PosExpr:

type PosExpr = (SrcSpan, PosExpr')

data PosExpr'
  = Num Int
  | Add PosExpr PosExpr
  | Sub PosExpr PosExpr
  | Mul PosExpr PosExpr
  | Div PosExpr PosExpr
In a series of steps, we will reach our final solution.
BareExpr and PosExpr' are very similar: they both contain five constructors, and corresponding constructors have the same number of fields. Can we capture this structure somehow? Yes, we can: the two types differ only in the types of their recursive positions, and in a very regular way. We can do here what we would do in any similar case: make the parts that differ into arguments, and then express the original entities in terms of this new, general entity by providing specific arguments.

Haskell allows us to do that with datatypes: simply introduce a new type argument r. We call the resulting type ExprF:

data ExprF r
  = Num Int
  | Add r r
  | Sub r r
  | Mul r r
  | Div r r

The F in ExprF stands for functor, and such a datatype is usually called a base functor. A base functor determines the shape of the top level of a tree, while the shape of the children is determined by the type argument.
Now we need to recover BareExpr and PosExpr by expressing them in terms of ExprF. For BareExpr, we want the child positions of ExprF also to be bare expressions. This leads to an infinite type:
BareExpr ~ ExprF (ExprF (ExprF ...))
This says that to get bare expressions back, we take ExprF and have its children be ExprFs again, and those children’s children be ExprFs again, and so on. In Haskell we can encode such infinite types by introducing new datatypes (we reuse the name BareExpr here):
newtype BareExpr = BareExpr { runBareExpr :: ExprF BareExpr }
If you repeatedly expand this definition, you will see that it results in the infinite type above.
For PosExpr we can think of a similar infinite type:
PosExpr ~ (SrcSpan, ExprF (SrcSpan, ExprF ...))
Again, we write this down using a new datatype:
newtype PosExpr = PosExpr { runPosExpr :: (SrcSpan, ExprF PosExpr) }
Currently BareExpr works only for the ExprF shape. Let’s create such a ‘bare’ version for any shape instead of just ExprFs. We can do this by making the base functor an argument:
newtype BareExpr f = BareExpr { runBareExpr :: f (BareExpr f) }
On the right-hand side, we have replaced ExprF by the argument f. In step 2 we supplied the type we were defining as an argument to ExprF; in this new version we do the same with f, but since this new version has a type argument, we need to supply that argument in the recursive position as well.

But… this datatype is no longer specific to arithmetic expressions, so the name BareExpr is no longer appropriate. In fact, the type we have just defined is the famous Fix, disguised under a different name!
newtype Fix f = In { out :: f (Fix f) }
So now you know what Fix does: it takes a base functor, such as ExprF, and recursively applies it to itself, creating a tree that has the same shape at every level.

Our new definition of BareExpr doesn’t need to introduce any new datatypes; it can now be a simple type synonym:
type BareExpr = Fix ExprF
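To see Fix in action, here is how the expression 1 + 2 is built as a Fix ExprF value, together with a small evaluator that unwraps one In per level. The evaluator is my addition for illustration, not part of the post:

```haskell
newtype Fix f = In { out :: f (Fix f) }

data ExprF r = Num Int | Add r r | Sub r r | Mul r r | Div r r

type BareExpr = Fix ExprF

-- 1 + 2, with an In wrapper at every level of the tree
ex :: BareExpr
ex = In (Add (In (Num 1)) (In (Num 2)))

-- evaluate by unwrapping one layer and recursing into the children
eval :: BareExpr -> Int
eval e = case out e of
  Num n   -> n
  Add a b -> eval a + eval b
  Sub a b -> eval a - eval b
  Mul a b -> eval a * eval b
  Div a b -> eval a `div` eval b

main :: IO ()
main = print (eval ex)  -- 3
```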
For PosExpr we can make two generalizations. The first is to not just store source locations, but to allow any type of annotation:
newtype AnnExpr x = AnnExpr { runAnnExpr :: (x, ExprF (AnnExpr x)) }
The second is similar to the one we made to BareExpr: have it work for any base functor instead of just ExprFs:
newtype AnnFix x f = AnnFix { runAnnFix :: (x, f (AnnFix x f)) }
To recover PosExpr, we give AnnFix the two appropriate type arguments:
type PosExpr = AnnFix SrcSpan ExprF
The Fix type captured the idea of taking a functor and applying it to itself recursively. AnnFix does something similar. Can we perhaps express AnnFix in terms of Fix to make this explicit?

It turns out we can, if we introduce a helper datatype Ann:

data Ann x f a = Ann x (f a)

type AnnFix x f = Fix (Ann x f)

Ann couples an annotation x with a functor value. It is a kind of tuple type, lifted to a higher order on the right-hand side.
We have seen many (intermediate) definitions of datatypes, but in the end only two of them matter:
newtype Fix f = In { out :: f (Fix f) }
data Ann x f a = Ann x (f a)
And of course, we have our expression example expressed in terms of these two building blocks:
type BareExpr = Fix ExprF
type PosExpr = Fix (Ann SrcSpan ExprF)
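As a quick illustration of the annotated variant: assuming a simple SrcSpan of character offsets (a hypothetical choice; the post never fixes a representation) and trimming ExprF to two constructors for brevity, the parse tree of "1+2" could look like this, and stripping the annotations recovers a plain Fix ExprF tree:

```haskell
{-# LANGUAGE DeriveFunctor #-}

newtype Fix f = In { out :: f (Fix f) }

data Ann x f a = Ann x (f a)

data ExprF r = Num Int | Add r r
  deriving Functor

-- hypothetical location type: (start, end) character offsets
type SrcSpan = (Int, Int)

type PosExpr = Fix (Ann SrcSpan ExprF)

-- the tree for "1+2", with a span at every node
ex :: PosExpr
ex = In (Ann (0, 3) (Add (In (Ann (0, 1) (Num 1)))
                         (In (Ann (2, 3) (Num 2)))))

-- forget the annotations, recovering a bare tree
strip :: PosExpr -> Fix ExprF
strip (In (Ann _ e)) = In (fmap strip e)
```

Note how strip only needs the Functor instance of the base functor; it never mentions ExprF's constructors, which is exactly the kind of shape-generic tool the generalization buys us.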
With just these two building blocks, we can express generically annotated trees and unannotated trees. What is the point of generalizing this far? Well, by making these types not specific to a particular tree shape (such as ExprF), you can build all sorts of tools that work on many kinds of trees. In my master’s thesis I explore this concept further, developing parser combinators that automatically insert position information for you at the appropriate places, catamorphisms that automatically couple errors with position information, conversions between text selections and tree selections, and a couple of other things.
If you’re interested in datatype fixpoints and would like to know more, here is a collection of interesting tutorials, applications and papers:
Right now we’re all in the Google HQ, enjoying hacking in a spacious room with a fridge full of drinks. Thanks, Google!
Create a script that looks like this:
#!/usr/bin/env bash
GREEN=`echo -e '\033[92m'`
RED=`echo -e '\033[91m'`
RESET=`echo -e '\033[0m'`
/usr/bin/ghci "${@}" | sed "s/^Failed, modules loaded:/${RED}&${RESET}/g;s/^Ok, modules loaded:/${GREEN}&${RESET}/g"
If your ghci is not located in /usr/bin, change the path in the script accordingly. If you want, you can name your script ghci so that it takes over the original one; just make sure its location appears in your PATH variable before the location of the true ghci.
If all goes well, you should now see colors whenever you load your modules:
% ghci Sirenial.Merge
...
Failed, modules loaded: Sirenial.Query.
And then when the bug has been fixed:
% ghci Sirenial.Merge
...
Ok, modules loaded: Sirenial.Merge, Sirenial.Query.
This has only been tested on Terminal.app in Snow Leopard. If it doesn’t work for your system, please leave a comment, with—if possible—a fix.
There is a small issue: sometimes sed delays the colored parts a bit, causing your prompt to be printed before the success or error message. Again, if you know a fix, please comment.
With the whole of the Netherlands covered in snow, the traffic is completely jammed. But that didn’t bother my flatmates and me, because we stayed home all day. In the afternoon we rolled the biggest snowball in the street; at the end it reached a meter in diameter and it took three of us to push it around. We guessed it weighed approximately 250 kg at that point.
While dinner was cooking I went outside again to take some pictures.