Squeezing More Out of the F# Type System – Introducing Crates

Squeezing More Out of the F# Type System – Introducing Crates

At G-Research, we make extensive use of F#. It’s a great fit for us because we get the benefits of a typed, functional-first programming language, alongside good integration with the rest of .NET via the CLR, which is crucial because we also have a lot of key tech written in C# and C++.

Some of the most pivotal moments in my work have come when I have stepped back and seen a principled and elegant solution to a hard problem that our team has beautifully engineered. Some of my most prized F# tools in G-Research have come from us looking at other languages – Haskell, Rust and Idris come to mind for me – and feeling a bit of envy over their impressive features. Sure, we have a handy syntax for monads, neat ways to deconstruct arbitrary data and access to a huge array of .NET libraries, but wouldn’t it be nice if we could just reach for a slightly more advanced abstraction in order to clearly capture the essence of the problem at hand, and hence write a much cleaner solution?

One example is existentially quantified types. F# doesn’t natively support existentials, but it turns out that we can do a pretty good job of simulating them. In G-Research, we call the pattern for doing this crates* and actually make extensive use of them throughout our codebase.

N.B. The reason I call crates a pattern in that there’s no way in F# to create a single description of crates in the most general sense, so instead we stamp out a little bit of boilerplate particular to the structure that the crate exposes, so I mean “pattern” in much the same way that Martin Fowler or the Gang of Four do.

* Not to be confused with the meaning of crates used in the Rust world.

A World without Crates

Crates help us manage a range of common painful situations that arise when we try to build powerful APIs. In my experience, anywhere you see an obj, you’re probably able to replace it with a crate and see a corresponding increase in readability and type-safety. Hopefully at least one of the examples below will resonate with you as a problem that you have faced which you wish could have been captured more elegantly.

The Combinator Pattern

A common technique is to capture an abstract description of an operation or data structure in terms of some basic building primitives and ways of combining them (often referred to as the combinator pattern). If we hold this description of primitives and combinators as described, and don’t immediately do anything with them, we can later take the structure and interpret it in a variety of different ways.

A small example might be a toy command line argument parser that knows how to parse individual primitive values, as well as how to compose them into records, tuples or whatever (with no limit on deep the composition can go). Our intention is to capture the structure of the parser in terms of these two things, such that we can walk over the same abstract description to parse some command line arguments and print some help for the parser to the console.

N.B. In reality, an argument parser such as this would probably be an alternative applicative or similar, but I’ll avoid that for now so as to keep the emphasis here on the existentials.

We could imagine a way to define an argument parser that might look a little bit like this (with the appropriate boxing and unboxing where necessary):

type HeightAndWeightArgs = { Height : int ; Weight : float }
let parseHeight str = int str
let parseWeight str = float str
let parser =
    Compound                                         // This arg parser for HeightAndWeightArgs...
        (
            Primitive parseHeight,                   // uses `parseHeight` to parse an integral height from a string...
            Primitive parseWeight,                   // and `parseWeight` to parse the floating-point height from a string...
            (fun h w -> { Height = h ; Weight = w }) // and combines their result by placing them directly into the
                                                     // HeightAndWeightArgs record constructor
        )

Our naive first cut for the ArgParser type might look something like this:

type ArgParser =
    | Primitive of read:(string -> obj)
    | Compound of left:obj * right:obj * compose:(obj -> obj -> obj)

Or we could try this slightly better alternative definition for ArgParser:

type ArgParser<'a> =
    | Primitive of read:(string -> 'a)
    | Compound of left:obj ArgParser * right:obj ArgParser * compose:(obj -> obj -> 'a)

These would compile, and would allow us to describe the structure of our parser in terms of primitives and ways to compose them, but they would be painful to use because there’s no static guarantee that the obj ArgParser values given to the Compound constructor agree with the argument types expected to the compose function given to the same constructor. We’d be reduced to defensively checking types at runtime and casting all over the place when we were trying to run the parser. This would surely become a nightmare to maintain very quickly indeed.

We might try to sketch out a more strongly typed alternative, but find that the type parameters always being hoisted to the top of the ArgParser type makes things tricky and means that the below does not compile:

type ArgParser<'a> =
    | Primitive of (string -> 'a)
    | Compound of 'left ArgParser * 'right ArgParser * ('left -> 'right -> 'a) // 'left and 'right are not brought into scope on the first line so this is invalid F#

Similarly, putting all three type parameters on the type isn’t right either:

type ArgParser<'a,'left,'right> =
    | Primitive of (string -> 'a)
    | Compound of ArgParser<'left,_,_> * ArgParser<'right,_,_> * ('left -> 'right -> 'a) // What goes in place of ther underscores here?
                                                                                         // By not hiding these type parameters we are
                                                                                         // forced to allow them to grow as we compose,                                                                                          // which means making a series of ArgParser-like
                                                                                         // things of increasing type argument arity.

What we really want to do is hide'left and 'right in the ArgParser<'a> type, preserving only the knowledge that compose accepts them. Some hypothetical F# extension could make this available like so:

type ArgParser<'a> = // 'left and 'right don't need to appear here since they don't escape the Compound constructor
    | Primitive of (string -> 'a)
    | Compound<'left,'right> of 'left ArgParser * 'right ArgParser * ('left -> 'right -> 'a)

Crates allow us to do just this, albeit with a bit more syntactic noise.

Reflection

If you’ve ever wrangled with .NET’s reflection API, you’ll likely have found yourself in “obj hell” very quickly, where any notion of type-safety goes out the window and you’re left hoping that you’ve done your unboxing correctly – not something you want, especially if you’re using F# precisely for its safety!

To help make my point clear, imagine we had some code which reflectively deserialised a file and created a value of the appropriate type in F#. Suppose our contract was that the file would contain a list of values, but we wouldn’t know what those values would be statically, although perhaps some metadata would make that information available during deserialisation.

A common pattern I have seen is to the use type argument to a method or function as a way to assert “the serialised value is of the given type, so return of those to me (else throw an exception)”:

// Looks clean at a glance, *but* it assumed you've passed in the right 'a!
val deserialiseList<'a> : file:System.IO.FileInfo -> 'a list

Of course, this is no good if you don’t know what 'a should be at the call site of deserialiseList. Assuming we don’t necessarily know the type of the list elements, we’ll need a different API.

A naive approach to constructing the API might be to write a function which returns an object, leaving the onus on the consumer to treat it as a list of values of some unknown type, or to do some unboxing magic if they eventually find out what the type should be:

// Returns a list of values - I promise!
val deserialiseList : file:System.IO.FileInfo -> obj

Hmm, the opaque obj there makes it pretty easy to forget what’s supposed to be inside it and instead accidentally treat it as if it’s not even a list! A slightly better approach might be:

// The elements of the list are all of the same type - I promise!
val deserialiseList : file:System.IO.FileInfo -> obj list

Better, but it’s still possible to get a confused and treat this in the wrong way. For example, I could box an int and cons it on the front of the list, but that would only be valid if the list truely was a list of boxed ints at runtime. Similarly, the deserialiseList function could be wrong and return a list of values of all sorts of different types which had been boxed and I wouldn’t even know it until it was too late!

What we want is a way to say that there’s a list of values, all of the same type, and provide a way to safely act on them. Crates allow us to do this too.

The Road to Crates

Below, I will explore various simpler patterns, concepts and language features, eventually building up to simulating existential types in F#. I’ll mention a load of technical terms for this stuff, mainly because I hope it will make any confusion you have much more Googleable. There are plenty of supplementary sources that can help you make sense of the bits I leave unsaid, but none of it is a substitute for actually trying to solve a problem with existentials – that’s how you’ll build a strong intuition and ultimately grok the topic.

I hope that you’ll be able to bounce between this post, the linked sources and the F# REPL, exploring the ideas discussed here until you’re left feeling like you could recognise when a situation calls for a crate, at which point you could check back here for the precise details of the trick.

Generics

Generics are a mechanism for avoiding boilerplate whilst maintaining type-safety guarantees in our types and algorithms. Generics allow types to be given as parameters to other types, as well as methods and top-level functions on modules. For example, here is an idiomatic use of generics in F# at the top level of a module:

module Primitives =

    /// No need to write an implementation for every type,
    /// instead the caller passes the type 'a as a parameter.
    /// e.g.
    ///   id<int> : int -> int
    ///   id<string> : string -> string
    val id<'a> : 'a -> 'a

The type parameter is (either implicitly or explicitly) chosen and passed in by the caller, at which point the logical “stamping out” of a version of the function for the given type occurs.

Universal Quantification

For all types x, such that…

A gotcha with the definition used as an example of generics above is that as soon as Primitives.id is passed as an argument to a function, this notion of it being customisable to any 'a the caller requests goes away, and we are restricted to exactly one application of type parameters!*

Try as I might, all of my attempts to call customId on two values of potentially different types either don’t compile, or result in the domain and range of customId being inferred to be the same as the types of x and y.

This doesn’t work:

let f (customId : 'a -> 'a) x y =
    printf "%A" (customId x)
    printf "%A" (customId y)
val f : customId:('a -> 'a) -> x:'a -> y:'a -> unit

Nor does this:

let g (customId : 'a -> 'a) (x : 'b) (y : 'c) =
    printf "%A" (customId x)
    printf "%A" (customId y)
      printf "%A" (customId x)
  --------------------------^

stdin(7,27): warning FS0064: This construct causes code to be less generic than indicated by the type annotations. The type variable 'a has been constrained to be type ''b'.

      printf "%A" (customId y);;
  --------------------------^

stdin(8,27): warning FS0064: This construct causes code to be less generic than indicated by the type annotations. The type variable 'b has been constrained to be type ''c'.

val g : customId:('c -> 'c) -> x:'c -> y:'c -> unit

No luck here, either:

let h customId x y =
    printf "%A" (customId x)
    printf "%A" (customId y)
val h : customId:('a -> 'b) -> x:'a -> y:'a -> unit

The issue here is that we have reached a fundamental limitation of what is directly expressible in F#. It follows that the trick to simulating universal quantification is, therefore, to avoid ever passing the function around directly, instead hiding the type parameter away such that it cannot be fixed to one particular value by the caller, but how might one do that?

Recall that F# provides access to the .NET object system. What if we made our own class (in the object-oriented sense) and put a generic method on that? We could create instances of that which we could pass around, and hence carry our function with it (in the form of said method)?

// Encoding the function signature...
// val id<'a> : 'a -> 'a
// ...in terms of an interface with a single generic method
type UniversalId = abstract member Eval<'a> : 'a -> 'a

Now we can create an implementation which we can pass around without the type parameter being fixed:

// Here's the boilerplate I warned you about.
// We're implementing the "UniversalId" interface
// by providing the only reasonable implementation.
// Note how 'a isn't visible in the type of id -
// now it can't be locked down against our will!
let id : UniversalId =
    { new UniversalId with
        member __.Eval<'a> (x : 'a) : 'a = x
    }

Now we have a way to simulate a universally quantified function. We can pass id around as a value, and at any point we pick a type 'a to pass to it just as we would any value-level argument.

N.B. This is why the val deserialiseList<'a> : file:System.IO.FileInfo -> 'a list earlier was so wrong: The types indicate that it should work for any 'a, but the idiom says that there is some particular 'a that must be given, else the function will throw an exception.

* The polymorphism can be preserved if you stay within local scope due to the differences between let-bound and lambda-bound polymorphism. These restrictions might initially seem weird, but they are important for making type inference possible.

Continuation-Passing Style

Continuation-passing style (CPS) can be thought of, at least in the way that we will be using them, as an indirect way of representing values.

Take a simple example of an int. We can increment an int, add it to another int or any other of a myriad operations. What really makes the int type interesting is the set of functions which can be applied to values of that type.

We can allow same operations available on an int value by not providing the int directly, but instead taking any function a consumer might want and applying it on their behalf, like this: type IntCPS = (int -> 'ret) -> 'ret.

In other words, this alternative respresentation of an int is interchangable with the original:

int = (int -> 'ret) -> 'ret

Obviously, this is isn’t just true for int, but applies for any type 'a you might care to use:

'a = ('a -> 'ret) -> 'ret

You might find yourself asking why we’d want to use ('a -> 'ret) -> 'ret all over the place, paying the cost of the boilerplate and cognitive overhead, when we could just use 'a. Well, sometimes we find ourselves in a situation where the 'a we want to use isn’t directly expressible…

Existential Quantification

There exists a type x, such that…

An existential is a value whose type is unknown statically, either because we have intentionally hidden something that was known, or because the type really is chosen at runtime, e.g. due to reflection. At runtime we can, however, inspect the existential to find the type and value inside.

If we don’t know the concrete type inside an existentially quantified type, how can we safely perform operations on it? Well, we can apply any function which itself can handle a value of any type – i.e. we can apply a universally quantified function!

In other words, existentials can be described in terms of the universals which can be used to operate upon them.

Theory

We can capture the intuition above a bit more formally, as follows:

exists 'a . F<'a> = forall 'ret . (forall 'a . F<'a> -> 'ret) -> 'ret

where F<'a> is an ('a list), a (unit -> 'a), or whatever structure is to be preserved statically by our existential

Therefore, as an example, you can model an 'a list where the concrete type of 'a is unknown statically as two nested, universally quantified functions:

exists 'a . 'a list = forall 'ret . (forall 'a . 'a list -> 'ret) -> 'ret

You could read this as:

I am an existential 'a, so for any value of type 'ret that you may choose to compute from me, give me a function that can accept any 'a that might be hidden away to yield a 'ret, and I’ll apply it in the appropriate way to get the expected 'ret result.

N.B. Statically, we don’t know what 'a is, but we do know every element of the list is of type 'a, i.e. they are all of the same type. This is in contrast to, say, an obj list approach which I moaned about earlier, where the expectation is that we can downcast to some more specific type at runtime, and there’s nothing statically indicating that the cast is necessarily the same for all elements – we just take that on faith.

Finally, here’s the exists 'a . 'a list example labelled up with all of the components that we have discussed, for easy reference:

                                    Consumers of the existential must provide a
 We want to model a list of some    universally quantified function to be sure
 values of an unknown type 'a.      that it will be valid to apply to  whatever
               |                    the hidden type transpires to be.
               |                         |
               |                         |
            vvvvvvv                  vvvvvvvvv
exists 'a . 'a list = forall 'ret . (forall 'a . 'a list -> 'ret) -> 'ret
                      ^^^^^^^^^^^   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                           |                         |
                           |        We use continuation-passing style to
                           |        avoid ever exposing the 'a directly, but
                           |        still offering a way to operate on it.
                           |
           Consumers pick their own return type
           using a universally quantified function.

This doubly-universal construct is the one which we’ll try to capture in F# in order to simulate existentials.

Practice

As we have established, universals imply a bit of interface boilerplate. Moreover, an existential implies two nested universals, so crates have double the boilerplate! Don’t worry, though, once you’re comfortable with the boilerplate it becomes trivial to read.

The most boring crate is one which hides a type whilst revealing no additional information about its structure:

/// An ObjectCrate is something that, given an appropriately quantified
/// function, can produce a 'ret value.
type ObjectCrate =
    abstract member Apply : ObjectCrateEvaluator<'ret> -> 'ret
/// For this particular crate, we need to accept a value of any type at all.
/// We can't just return 'a, since then it'd appear in our type signature and
/// it'd no longer be hidden!
and ObjectCrateEvaluator<'ret> =
    abstract member Eval<'a> : 'a -> 'ret

Notice how the methodEval takes type parameter 'a and the interfaceObjectCrateEvaluator takes the type parameter 'ret. This is how we bring the two type parameters from different sources into scope at the same time to build our existential. An arrangement of two nested universals in this way is known as rank-2 polymorphism.

So an ObjectCrate is sort of interesting, but without some constraints on the type parameter, there’s not much we can do with an existential*. If we constrained the type to be an 'a list instead, there would be more that we could do without needing to know 'a statically:

/// A list crate is just a thing which you can "visit" with
/// an appopriately quantified function encoded with our fancy boilerplate...
type ListCrate =
    abstract member Apply : ListCrateEvaluator<'ret> -> 'ret
/// ...in this case, the appropriately quantified function is one which can
/// accept a list of any type parameter.
and ListCrateEvaluator<'ret> =
    abstract member Eval<'a> : 'a list -> 'ret

/// Since we know more about, and hence can do more with, the value inside a
/// `ListCrate` it's probably worth creating a module of
module ListCrateOps =

    /// Just a handy way to make a ListCrate without writing the boilerplate
    /// yourself
    let make (l : 'a list) : ListCrate =
        { new ListCrate with
            member __.Apply e = e.Eval l
        }

    // Apart from make, we define some universally-quantified list functions,
    // i.e. ones that don't depend on the particular type parameter of the list

    // I've made all the types explicit here to help you read the definition
    let length (listCrate : ListCrate) : int =
        listCrate.Apply { new ListCrateEvaluator<int> with
            member __.Eval<'a> (l : 'a list) : int =
                List.length<'a> l
        }

    // Once you're comfortable with crates, you can let type inference take
    // over and only give the bare essentials, as I do here
    let rev (listCrate : ListCrate)  =
        listCrate.Apply { new ListCrateEvaluator<_> with
            member __.Eval l =
                make <| List.rev l
        }

    // etc

We can do a similar thing for 'a array, 'a option, unit -> 'a, 'a * int, or whatever else you might want, including multiple type parameters at once. Just remember that you need to write very similar, but slightly different crate code for all of them. If this makes you sad, there is an ongoing discussion about improving the situation in fslang-suggestions.

Yet another way to look at the above is that it is sort of like the visitor pattern taken to the extreme. The visitor has a single operation which is powerful enough that it can be applied to any of the types that might be hidden away (this is the universal quantification doing its thing). Seen through this lens, crates might make more sense to the reader with a more object-oriented background.

* All types offer the .ToString() function, so we can at least make use of that.

Revisiting the Motivating Examples

Going back to our original examples of things that crates can help with, we can now translate what we wanted for 'a ArgParser:

// Hypothetical F# syntax!
type ArgParser<'a> =
    | Primitive of (string -> 'a)
    | Compound<'left, 'right> of 'left ArgParser * 'right ArgParser * ('left -> 'right -> 'a)

…into something we can really use:

// Real F# syntax!
type ArgParser<'a> =
    | Primitive of (string -> 'a)
    | Compound of CompoundArgParserCrate

and CompoundArgParserCrate =
    abstract member Apply : CompoundArgParserCrateEvaluator<'ret> -> 'ret
and CompoundArgParserCrateEvaluator<'ret> =
    abstract member Eval : 'left ArgParser -> 'right ArgParser -> ('left -> 'right -> 'a) -> 'ret

Similarly for our reflection API, deserialiseList can now look like this:

// ListCrate makes the precise set of valid operations clear
// and wraps up the generic context for later
val deserialiseList : file:System.IO.FileInfo -> ListCrate // This is the ListCrate type defined earlier

One particularly interesting way to think about crates is that they capture some runtime “generic context”, which can be passed to any generic functions used inside the crate’s evaluator.

Here, the compiler now conveniently allows us to do things which don’t depend on the particular 'a of the list because the “generic context” can be unpacked and used by generic methods and functions:

// Inside the evaluator, we unwrap the generic context and it
// looks as if the reflection never happened at all!
let length =
    { new ListCrateEvaluator<_> with
        member __.Eval<'a> (l : 'a list) = // Unpacking the evaluator brings 'a into scope
            List.length l // Type-inference working as normal for generic functions, 'a is inferred as as it would be outside a crate
    }

This is a nice example of why the crates approach is, beyond allowing us to think functionally, often preferable to something strongly object-oriented such as creating a straight-forward IExistentialList interface which just exposes those operations common to all closed list types: We don’t lose the power to invoke generic functions where we know it is valid to call them. It is also a way of looking at why we can consume types chosen dynamically by reflection, but quickly get to a place where we can work with them safely.

Importantly, the compiler still rightly stops us from doing things which do depend on the particular 'a of the list:

// DOESN'T COMPILE - this is a good thing because by creating a ListCrate,
// the deserialiseList function was declaring that the list could be of any type,
// so prepending a 5 isn't necessarily safe!
let prependFive =
    { new ListCrateEvaluator<_> with
        member __.Eval<'a> (l : 'a list) =
            5 :: l
    }

Tips, Caveats and Gotchas

Beyond the boilerplate, there are a few additional things to be aware of when trying to use crates in earnest, so I’ll try to capture the main points here to get you going:

  • Returning unit from a crate evaluator doesn’t work (presumably due to some internal mapping to the equivalent of C#’s void). Instead you need to return something else which you can immediately ignore, usually a custom type that is equivalent to the built-in unit, modulo this special casing in the compiler:
type Unit = Unit // Working around issues with the built-in unit type

let printLength listCrate =
    listCrate.Apply
        { new ListCrateEvaluator<_> with
            member __.Eval<'a> (l : 'a list) =
                printf "Length = %i" (List.length l)
                Unit
        } |> ignore
  • The compiler errors that you can get if you get the types wrong inside a crate can be pretty unhelpful. If you see a confusing error, add any explicit types which you have so far omitted and then the errors should be a bit more comprehensible.
  • If you find yourself nesting crates inside crates, for example when using reflection to walk over a recursive data structure, you’ll always have to keep the explicit type parameters on your function calls. This has been proven to be required in general, and is not just a limitation of F#!
  • As mentioned previously, we’ve naturally converged on the name “crates” for this pattern, which annoyingly collides with the meaning given in the the Rust world. This is usually not a problem, but can cause slight confusion when on-boarding new developers. Maybe we should pick another name? Or perhaps we should accept that any name will probably end up colliding with something? I am open to suggestions.

Can we go Further?

We can use crates in more complex scenarios, for example, reflectively generating nested discriminated unions of crates which represent F# data structures, and then walking over them in a type-safe way, similar to what has been suggested elsewhere.

We also have quite a few other tricks for squeezing more out of the F# type-system, such as a way of simulating generalised algebraic data types, which can be used to create typed domain specific languages. I hope that one of us will blog about some of these other patterns in the future, so keep an eye out if you want to go deeper! 🙂

Tom Davies,  Software Engineer

Read more of our blog posts here.

Want to work with us?

View vacancies

Want to be part of a leading quantitative research and technology company? Bring your skills and experience to G-Research by applying for one of our many roles.

View vacancies