Introducing JsonGrammar

Posted on Sunday, 08 May 2011 at 22:11.

The first version of JsonGrammar has just been released on Hackage! JsonGrammar offers an API for converting between your own datatypes and JSON ASTs.

“What, another JSON library? Don’t we have enough already?”

It’s true that there are already a few JSON libraries out there. These libraries, however, require you to write fromJson and toJson separately.

“Uhm, yes… is that bad?”

Yes. It violates the DRY principle. If I show you an implementation of fromJson for a certain type, you can write a corresponding toJson without requiring any further information. Similarly, if I show you an implementation of toJson, you can write the accompanying fromJson. Writing down the same thing twice is tedious and opens up the possibility to make mistakes.

“But most of these libraries offer Template Haskell support that does this work for you!”

This is true, but they also make all the choices for you about how your datatypes should map to JSON. Usually they assume the names of your record fields map directly to JSON property names. The shapes of your family of datatypes need to correspond to how the objects in JSON are nested. These libraries give you the choice: either you write out fromJson and toJson by hand and have full control over the mapping, or you give up this control and let Template Haskell do all the work for you.

JsonGrammar gives you the best of both worlds: it gives you full control over what the mapping should be, with an API that lets you define fromJson and toJson at the same time. It achieves this by separating the constructing/destructing of datatype constructors and its fields from the description of the JSON values. The former is derived by Template Haskell, the latter is provided by the programmer.

An example

Suppose we have these two datatypes describing people and their current location:

data Person = Person
  { name   :: String
  , gender :: Gender
  , age    :: Int
  , lat    :: Float
  , lng    :: Float
  }

data Gender = Male | Female

Sadly, the JSON source we are communicating with is using JSON with Dutch property names and values, so we cannot use Template Haskell to derive the JSON mapping for us, like we would do with other JSON libraries. Neither do we want to use Dutch names for our record selectors; nobody would be able to understand our code anymore! Fortunately this isn’t a problem with JsonGrammar.

The first step is to have Template Haskell derive the constructor-destructor pairs:

person         = $(deriveIsos ''Person)
(male, female) = $(deriveIsos ''Gender)

For the latter to work, you need to enable -XNoMonoPatBinds.

Then we write instances of the Json type class to define the mapping from/to Json. The order in which the properties are listed matches that of the fields in the datatype:

instance Json Person where
  grammar = person . object
    ( prop "naam"
    . prop "geslacht"
    . prop "leeftijd"
    . prop "lat"
    . prop "lng"
    )

instance Json Gender where
  grammar =  male   . litJson "man"
          <> female . litJson "vrouw"

The . operator is from Control.Category. The <> is just another name for mappend from Data.Monoid and denotes choice.

That’s all! We have just defined both fromJson and toJson in one simple definition. Here’s how you can use these grammars:

ghci> let anna = Person "Anna" Female 36 53.0163038 5.1993053

ghci> let Just annaJson = toJson anna

ghci> annaJson
Object (fromList [("geslacht",String "vrouw"),("lat",Number
53.01630401611328),("leeftijd",Number 36),("lng",Number
5.199305534362793),("naam",String "Anna")])

ghci> fromJson annaJson :: Maybe Person
Just (Person {name = "Anna", gender = Female, age = 36, lat = 53.016304,
lng = 5.1993055})

Show me the types!

The library is based on partial isomorphisms:

data Iso a b = Iso (a -> Maybe b) (b -> Maybe a)

instance Category Iso
instance Monoid (Iso a b)

A value of type Iso a b gives you a function that converts an a into a Maybe b, and a function that converts a b into a Maybe a. This composes beautifully as a Category. The Monoid instance denotes choice: first try the left-hand conversion function, and if it fails, try the right-hand side.

A JSON grammar for some type a is nothing more than a value of type Iso Value a, where Value is the type of a JSON AST from the aeson package. That is, it’s a pair of conversion functions between JSON trees and your own datatype. Building JSON grammars like the one above is about composing isomorphisms that translate between intermediate types.

The isomorphisms person, male and female translate between constructors and their individual fields. For example:

person :: Iso (String, Gender, Int, Float, Float) Person

Converting from a constructor to its fields might fail, because the value that is passed to the conversion function might be a different constructor of the same datatype. This is why the Monoid instance is so useful: we can give multiple grammars, usually one for each constructor, and they will be tried in sequence. They are effectively composable pattern matches.

Stack isomorphisms

There is a problem with encoding the fields of such a constructor as an n-tuple: if we want to compose it with other isomorphisms that handle the individual fields, we have to use complicated tuple projections to select the fields that we’re interested in. Basically we have unwrapped the fields from one constructor only to wrap them in another one!

The solution is to use heterogenous stacks of values. They are reminiscent of continuation-passing style, because in the way we use them they usually have a polymorphic tail:

person :: Iso (String :- Gender :- Int :- Float :- Float :- t) (Person :- t)

Read :- as ‘cons’, but then for types instead of values. Its definition is simple:

data h :- t = h :- t

The polymorphic tail says that person doesn’t care what’s on the stack below the two Floats; it will simply pass that part of the stack on to the right-hand side. And vice versa, if we’re working with the isomorphism in the opposite direction.

Have you thought about what the types of male and female would be in the non-stack versions of the isomorphisms? They don’t have any fields; we would have to leave the first type parameter of Iso empty somehow, for example by choosing (). Stack isomorphisms have no such problem; we simply make the first type argument the polymorphic tail on its own, without any values on top:

male   :: Iso t (Gender :- t)
female :: Iso t (Gender :- t)

Stack isomorphisms compose beautifully using ., often without needing any special projection functions. To get a feeling for it, try compiling the example Json grammars and looking at the types of the individual components.

I lied when I wrote that grammars have type Iso Value a; they actually use stacks themselves, too. Here is the true definition of the Json type class:

class Json a where
  grammar :: Iso (Value :- t) (a :- t)

Different tree shapes

Let’s take our Person example and make a small modification. We decide that because (lat, lng)-pairs are so common together, we’d like to put them together in their own datatype:

data Coords = Coords { lat :: Float, lng :: Float }
  deriving (Eq, Show)

data Person = Person
  { name     :: String
  , gender   :: Gender
  , age      :: Int
  , location :: Coords
  } deriving (Eq, Show)

However, in this example we have no control over the JSON format and cannot change it to match our new structure. With JsonGrammar we can express mappings where the nesting is not one-to-one:

instance Json Person where
  grammar = person . object
    ( prop "naam"
    . prop "geslacht"
    . prop "leeftijd"
    . coordsProps
    )

coordsProps :: Iso (Object :- t) (Object :- Coords :- t)
coordsProps = duck coords . prop "lat" . prop "lng"

Here duck coords wraps (or unwraps, depending on the direction) the two matched Float properties in their own Coords constructor before continuing matching the other properties in an object. Function duck is a combinator that makes a grammar (coords in this case) work one element down the stack. Here it makes sure the top values can remain Objects, which is needed by prop to build/destruct JSON objects one property at a time.

What is important to note here is that not only can we express mappings with different nestings, we can also capture this behaviour in its own grammar for reuse. JsonGrammar allows this level of modularity in everything it does.

History and related work

The ideas behind JsonGrammar go back a bit. They are based on Zwaluw, a library that Sjoerd Visscher and I worked on. The library aids in writing bidirectional parsers/pretty-printers for type-safe URLs, also in a DRY manner. Zwaluw, too, uses stacks to achieve a high level of modularity. In turn, Zwaluw was inspired by HoleyMonoid, which shows that the CPS-like manner of using polymorphic stack tails allows combinators to build up a list of expected arguments for use in printf-like functionality.

The Iso datatype comes from partial-isomorphisms and is described in more detail in Invertible syntax descriptions: Unifying parsing and pretty printing by Tillmann Rendel and Klaus Ostermann. They also use stacks (in the form of nested binary tuples), but they are not using the trick with the polymorphic tail (yet?).

Future work

Although JsonGrammar is usable, there is still work to be done:

If you have any questions, comments, ideas or bug reports, feel to leave a comment or open a ticket on GitHub.

Comments

By Tillmann Rendel on Monday, 09 May 2011 at 10:25:

Cool stuff, especially the polymorphic tail. We haven’t thought of that. Any particular reason you are not depending on partial-isomorphisms?

BTW, about the following: “Have you thought about what the types of male and female would be in the non-stack versions of the isomorphisms? They don’t have any fields; we would have to leave the first type parameter of Iso empty somehow, for example by choosing ()”.

That is indeed what I (and my TH code in partial-isomorphisms) would do, but I wouldn’t see it as a _semantic_ problem. In my view, an (Iso a b) represents an isomorphisms between subsets of the extensions of a and b. In the case of male, we have a = () and b = Gender. The extension of (), that is, the set of all values of type (), is {(), undefined}. The extension of Gender is {Male, Female, undefined}. Now, the partial isomorphism male of type (Iso () Gender) relates () to Male and undefined to undefined. No special rules are necessary to handle singleton constructors.

The use of stacks for non-singleton constructors is a problem, though, because it doesn’t work our for bottoms: (a, b, c) and (a, (b, c)) are not isomorphic in Haskell. I am not sure whether or how that problem shows up with stack isomorphisms.

By Martijn on Monday, 09 May 2011 at 11:45:

Hi Tillman,

Thanks for your comment!

I’m currently not depending on partial-isomorphisms mainly because there was no Monoid instance (or Semigroup instance; I really like using <>) and I didn’t want to write an orphan instance. Also I expect I will generalise from Maybe to arbitrary monads some time in the future, to support better error messages.

As for undefined, I haven’t worried about that at all. I understand it’s important if you delve into the mathematical semantics of partial isomorphisms, but I don’t think users of JsonGrammar will run into any problems with undefined since they’re mostly worried about converting to and from JSON. I hope that I’m not abusing the term ‘partial isomorphism’ too much in this sense.

By Chris Done on Monday, 09 May 2011 at 13:24:

Nice library, I will find this useful. :-) (I have 40~ data types that I’m encoding/decoding to JSON in a project.)

How does this relate to the lenses/bidirectional approach done in fclabels[1], if at all? Just curious. Will comment more when I have time.

[1]: http://hackage.haskell.org/package/fclabels

By Martijn on Monday, 09 May 2011 at 14:17:

Good question, Chris. To be honest I haven’t given fclabels much thought (although I really should). Quick answer: I don’t think fclabels supports failure.

By solrize on Monday, 09 May 2011 at 17:33:

I thought there was a json package from Galois that used SYB to generate the conversion functions. I actually remember using it but have forgotten what package it was or exactly what it did.

By Martijn on Monday, 09 May 2011 at 20:19:

@solrize Yup, it’s called json.

By Chris Done on Tuesday, 10 May 2011 at 07:56:

solrize: Text.JSON.Generic generates based on Data and Typeable, but you don’t get any control over the mapping (which you really need most of the time).

By Yitz on Tuesday, 10 May 2011 at 14:20:

Very nice ideas, Martijn!

> The <> is just another name for mappend from Data.Monoid

Can you make it be from Edward Kmett’s semigroups package instead? Otherwise, your module will be less convenient to use with any program that uses a datatype which is a semigroup but not a monoid. Now that I noticed the semigroups package, I am finding more and more of those…

By Martijn on Tuesday, 10 May 2011 at 22:10:

I actually have already, Yitz! :-) I hope his package gains some more popularity. Thank you for your comment!

By Tillmann Rendel on Friday, 13 May 2011 at 00:50:

I looked at your Semigroup instance, and I am not sure I like it. Consider the following two partial isomorphisms:

iso1, iso2 :: Iso Char ()
iso1 = unstack (unit . inverse (lit ‘x’))
iso2 = unstack (unit . inverse (lit ‘y’))

In isolation, these partial isomorphisms are fine. But (iso1 <> iso2) has a problem: It is not really inverse to (inverse (iso1 <> iso2)).

> convert (iso1 <> iso2) ‘y’
Just ()

> convert (inverse (iso1 <> iso2)) ()
Just ‘x’.

Oups. We successfully converted from Char to () and back to Char, but we arrived at a different character. I am not sure how serios that problem is in your setting.

From a theoretical point of view, I believe this problem is fundamental with partial isomorphisms: They do not really support choice. That is why in our paper, we used partial isomorphisms only for semantic actions, and built the additional layer of syntax descriptions on top of it.

By Sjoerd Visscher on Friday, 13 May 2011 at 11:18:

There’s a kind of underlying assumption that <> is used on disjoint pattern matches. It would be nice to have a way to check that this is always the case, and also that there are no cases missing when pattern matching.

By Martijn on Friday, 13 May 2011 at 11:37:

Yes, this might be a problem. In this sense ‘inverse’ is a poor name. I wonder how hard Sjoerd’s suggestion would be to implement.

For JSON grammars it might actually be a feature: you can map two different JSON constructs to the same Haskell value, and on the way back consistenly map the Haskell value to the preferred of the two JSON constructs.

Btw, I fixed the missing <>s in your comment.

By g on Monday, 11 June 2012 at 03:23:

Hi, Quoridor doesn’t work on Lion.

Thx

Leave a comment!

Martijn loves to receive comments! Add yours by filling out the fields below.