In the first part of this post we talked about the general tooling we use to develop and deploy our OCaml applications. Judging by the feedback we have received from the community so far, this approach seems to be generally considered the way to go about OCaml in 2018.

This post discusses how we actually structure and write our code. No worries, it is not a style guide, and we’re not going to open a tabs-vs-spaces debate. This is a slightly higher-level discussion of dos and don’ts in OCaml.

The language

First released over 20 years ago, OCaml is not a new language. Because compatibility with older code is prioritized, there are many ways to write the same code, some of which exist only for historical reasons, and practice has shown that some ways are better than others. In this section we’ll outline how we try to write code to keep readability and maintainability high.

Avoid exceptions

While OCaml does support exceptions, we try to avoid them when writing code. Exceptions are easy to raise, but they make reasoning about code difficult, since the type system does not track which code can raise which exceptions. This encourages code that ignores exceptions (it compiles fine, right?) and then breaks at runtime because some Not_found exception is raised somewhere deep within the code.
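
As a minimal sketch of the problem, using only the compiler's standard library: the helper names port_of and port_of_opt and the config list are made up for illustration. Both functions compile without complaint, but only the second one tells the caller in its type that the lookup can fail.

```ocaml
(* Hypothetical helper. Its type, (string * int) list -> string -> int,
   says nothing about failure, yet List.assoc raises Not_found when the
   key is missing. *)
let port_of (config : (string * int) list) (key : string) : int =
  List.assoc key config

(* The option-returning variant makes the missing case part of the type,
   so the caller has to deal with it. *)
let port_of_opt (config : (string * int) list) (key : string) : int option =
  List.assoc_opt key config

(* Illustrative data only. *)
let config = [ ("http", 80); ("https", 443) ]
```

Calling port_of config "ftp" type-checks and then fails at runtime, whereas port_of_opt config "ftp" simply evaluates to None.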

Result types

If we don’t use exceptions, what do we use to signal potential failures? The most obvious answer is the result type, which allows for an Ok case with result data and an Error case with attached information about the error. What to put in the Ok case is clear, but how should the error case be structured?

Our first try here was to use Jane Street’s Or_error.t, which is a result type where the Error case contains an Error.t type, which is essentially a kind-of string. The main problem here is that while Or_error makes it clear a function can fail it still does not describe the failure cases. Also, catching a specific error case with Or_error is even more difficult than with exceptions.

We then decided to use polymorphic variants as errors, as described by Vladimir Keleshev in Composable Error Handling in OCaml. This allows us to see which errors a function can return and to match on specific errors to handle them differently. The main downside is that signatures can get a bit unwieldy, but to us this seems to be the best approach hands down.
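
The following sketch shows the idea with the standard library's result type rather than Core. The functions parse_age, check_adult, adult_age and describe, as well as the error tags, are invented for this example and are not taken from our code base.

```ocaml
(* Each step declares only the errors it can produce itself. *)
let parse_age (s : string) : (int, [> `Not_a_number of string ]) result =
  match int_of_string_opt s with
  | Some n -> Ok n
  | None -> Error (`Not_a_number s)

let check_adult (age : int) : (int, [> `Too_young of int ]) result =
  match age >= 18 with
  | true -> Ok age
  | false -> Error (`Too_young age)

(* Result.bind chains the steps; the inferred error type is the union
   of both sets of variants. *)
let adult_age (s : string) : (int, _) result =
  Result.bind (parse_age s) check_adult

(* The caller can match on each specific error, and the compiler checks
   that none is forgotten. *)
let describe (s : string) : string =
  match adult_age s with
  | Ok age -> Printf.sprintf "adult, aged %d" age
  | Error (`Not_a_number raw) -> Printf.sprintf "%S is not a number" raw
  | Error (`Too_young age) -> Printf.sprintf "too young: %d" age
```

Removing one of the Error branches in describe should make the compiler report a non-exhaustive match, which is exactly the kind of tracking that exceptions do not give us.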

Avoid opens

open in OCaml is a double-edged sword: it pulls all the names from another module into the current one, so it is not necessary to qualify them. This can be nice to avoid having to repeat the name of the module in many places. On the other hand it has the same problem as from module import * in Python: it pollutes the namespace with all kinds of names, and, especially when multiple modules are opened, it becomes difficult to tell where a given name comes from. In theory this could be solved via tooling, but we believe code should be readable without requiring support tools (e.g. in pull requests on GitHub), therefore we try to avoid using open.

Fortunately, there are a lot of alternatives to module-wide open: explicit qualification as mentioned above, scoped opens via the Module.(...)/Module.{...} syntax, and local opens via let open Module in. We tend to use scoped opens when constructing values and local opens only when pulling in the support functions for ppx_let (let open Monad.Let_syntax in). We also often use module aliases to avoid having to repeat long and descriptive module names.
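
To make the alternatives concrete, here is a small self-contained sketch. Geometry_vector is a made-up module standing in for any module with a long, descriptive name; nothing here depends on Core.

```ocaml
(* Hypothetical module with a long, descriptive name. *)
module Geometry_vector = struct
  type t = { x : float; y : float }

  let ( + ) a b = { x = a.x +. b.x; y = a.y +. b.y }
  let zero = { x = 0.; y = 0. }
  let length v = sqrt ((v.x *. v.x) +. (v.y *. v.y))
end

(* Module alias: a short name, nothing is imported into scope. *)
module V = Geometry_vector

(* Explicit qualification. *)
let a = V.length { V.x = 3.; y = 4. }

(* Scoped open: names from V are visible only inside the delimiters. *)
let b = V.({ x = 1.; y = 2. } + { x = 2.; y = 2. })
let c = V.{ x = 3.; y = 4. }

(* Local open: names from V are visible until the end of this expression. *)
let d =
  let open V in
  length (zero + c)
```

Outside the delimiters and outside d, the operator + still refers to ordinary integer addition, which is the main point of keeping opens small.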

The only exceptions to our open rule are open Core and open Async. The former replaces the standard library, since we don’t want to accidentally mix Core with the compiler’s standard library. The latter is designed to work with Core opened and is always used this way.

Write out signatures

This is more of a best practice we have picked up. When writing functions it is sometimes useful to add a type signature to them. This can be done either in an mli file or, as we often do, inline with the definition of the function.

OCaml has very good type inference, but depending on how the code is structured, it can report a type error in another, perfectly correct place, because inference works both inside a function and outwards from the function to its callers. Adding a signature to a function tells the compiler what we expect its type to be, so the compiler can alert us if the actual implementation differs.

The way signatures are attached in functions is usually

let f (a : int) (b : string) : float =
  

This has the advantage that it is easy to leave the types of certain arguments unconstrained, but the disadvantage is that this notation is very noisy and does not match mli files, which use a syntax more akin to Haskell. Fortunately OCaml also supports this:

let f : int -> string -> float = fun a b ->
  

It is slightly longer, but being able to see the signature directly (as the toplevel would print it and as it would be specified in an mli file) is worth the effort to us. It is also possible to let inference work on individual arguments by specifying _ as the type.

We often combine this selective inference with our error handling. We declare a function to return a (success, _) result type, where the underscore is inferred as the set of polymorphic variants the function might return.
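
A small sketch of the annotation styles discussed in this section, written against the compiler's standard library. All names (users, greeting_length, find_user and the error tag) are hypothetical and only serve as an illustration.

```ocaml
(* Illustrative data only. *)
let users = [ (1, "alice"); (2, "bob") ]

(* Argument-style annotations. *)
let greeting_length (times : int) (name : string) : float =
  float_of_int (times * String.length name)

(* Signature-style annotation, written as it would appear in an mli file. *)
let greeting_length' : int -> string -> float =
 fun times name -> float_of_int (times * String.length name)

(* Success type fixed by hand, error type left to inference with _. *)
let find_user : int -> (string, _) result =
 fun id ->
  match List.assoc_opt id users with
  | Some name -> Ok name
  | None -> Error (`User_not_found id)
```

For find_user the toplevel reports the error type as [> `User_not_found of int ], so the signature stays short in the source while the full set of errors is still visible to the compiler and to tooling.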

Don’t be too smart

Functors are useful tools, but functor-heavy code is difficult to read. OCaml’s module language is very extensive and the syntax for many constructs is rather obscure, so such code can be hard to understand. We recommend not going overboard.

A similar thing applies to generalized algebraic data types (GADTs). Unless there is a very important reason to use GADTs we tend to avoid them. Fortunately the OCaml type system is quite powerful already, so GADTs are not necessary very often.

The last concept we do not use is objects. While we think that OCaml’s object system is interesting in theory (basically records with subtyping), it is also a rarely used feature in the OCaml world at large. Our code base doesn’t use any objects.

Identifier types

We often deal with all kinds of identifiers, be it usernames, user IDs or email addresses. In many ways these can be represented as strings, but this opens the door to accidental misuse: using a user ID where a username was required, or similar issues.

For this reason we decided to wrap these “tiny” strings in their own types that we convert to and from at the boundaries of our programs. We often wrap these types in their own modules, so they tend to contain the private type, some common operations like monomorphic compare, a printer and conversion functions to and from strings and/or JSON.

This has proved to be very useful. The only downside is that it tends to be boilerplate heavy, since all these modules have slightly different semantics, depending on what the identifier is supposed to be used for.
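
A stripped-down sketch of such a module could look like this. User_id, its validation rule (non-empty) and the error tag are assumptions made for the example; a real module would carry whatever checks and JSON conversions the identifier needs.

```ocaml
module User_id : sig
  (* private: can be read as a string, but only built via of_string *)
  type t = private string

  val of_string : string -> (t, [> `Invalid_user_id of string ]) result
  val to_string : t -> string
  val equal : t -> t -> bool
  val compare : t -> t -> int
  val pp : Format.formatter -> t -> unit
end = struct
  type t = string

  (* Assumed rule for this sketch: an ID must not be empty. *)
  let of_string s =
    match String.length s > 0 with
    | true -> Ok s
    | false -> Error (`Invalid_user_id s)

  let to_string id = id

  (* Monomorphic comparisons, specialised to strings. *)
  let equal = String.equal
  let compare = String.compare
  let pp = Format.pp_print_string
end

(* A function that asks for a User_id.t cannot be handed a raw string
   or a value of some other identifier type. *)
let greet (id : User_id.t) : string = "hello, user " ^ User_id.to_string id
```

Passing a plain string such as "42" to greet is rejected at compile time; the value has to go through User_id.of_string at the boundary first.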

No for/while/if/try

As a multi-paradigm language, OCaml supports all kinds of imperative constructs, among them for and while loops. We usually write our code in a functional way, so there is little need for them. Most of our imperative bits use Async, which provides functions like Deferred.for_ and Deferred.repeat_until_finished that offer similar looping semantics within the Async concurrency monad.

We also usually translate if into match, since matching on booleans works just as well and looks nice and clean (in a manner similar to Erlang). The implied () returned from an if without an else branch is not a behaviour we require often.
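
For comparison, here is the same function written both ways. The names sign_if and sign_match are made up for this illustration.

```ocaml
(* Conventional if/else. *)
let sign_if (n : int) : string =
  if n < 0 then "negative" else "non-negative"

(* The same logic as a match on the boolean: both branches are spelled out
   and there is no way to leave one of them off. *)
let sign_match (n : int) : string =
  match n < 0 with
  | true -> "negative"
  | false -> "non-negative"
```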

Similarly, since OCaml 4.02 match can also match on exceptions, which feels more functional and reads more cleanly than try blocks. Especially in the presence of functions that potentially raise exceptions, it is useful to have dedicated match branches for exceptions.
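
A short sketch of an exception branch in a match, using List.assoc from the standard library, which raises Not_found for a missing key. The function name lookup and the messages are invented for the example.

```ocaml
let lookup (table : (string * int) list) (key : string) : string =
  match List.assoc key table with
  | value -> Printf.sprintf "%s = %d" key value
  (* Only exceptions raised by the matched expression itself end up here. *)
  | exception Not_found -> Printf.sprintf "%s is not set" key
```

Unlike a try block wrapped around the whole expression, the exception branch does not cover the code in the other branches, so an unrelated Not_found raised there would not be swallowed by accident.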

ppx_let

When using monads like option, result, Deferred.t (or Lwt.t) it is often useful to chain execution, which is what the >>= (bind) operator is used for. It is a binary operator, taking the monad on the left side and a callback function on the right side, like so:

let (>>=) = Option.(>>=) in
Some 42 >>= fun number -> 

This code is okay, but what it in fact does is bind number to the value x if the monad’s value is Some x. It only has to be written as a function because OCaml has no analogue to Haskell’s do-notation.

Therefore we started to use a macro, ppx_let:

let open Option.Let_syntax in
let%bind number = Some 42 in

This makes the code easier to read and keeps it from devolving into a difficult-to-understand set of nested >>= chains. Core and Async have Let_syntax modules for all the monads defined in them, and writing Let_syntax for other monads is easy (an implementation for Lwt is 8 lines of code). This has the advantage that the syntax is always the same no matter which monad is used; no special macros like ppx_lwt are required.

ppx_let also has some other nice features. Besides the regular %bind it supports %map, which works like the map operator (>>| or >|=). Both also work on match, so it is not necessary to bind a value to a name if it should be pattern matched upon directly.

This is one of the parts where it took us a while to come around, but we’re now pretty sure about using it. Even better, a variant of this approach has been merged into OCaml, so we are looking forward to using this feature in the future.
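
The variant that was merged shipped as binding operators in OCaml 4.08. The sketch below uses them with the standard library's Option module instead of ppx_let; Option_syntax, safe_div and the compute functions are hypothetical names chosen for this example.

```ocaml
(* Hypothetical stand-in for a Let_syntax module, built from binding operators. *)
module Option_syntax = struct
  let ( let* ) = Option.bind             (* plays the role of let%bind *)
  let ( let+ ) opt f = Option.map f opt  (* plays the role of let%map *)
end

let safe_div (a : int) (b : int) : int option =
  match b = 0 with
  | true -> None
  | false -> Some (a / b)

(* With explicit bind, every step adds another nested callback. *)
let compute_bind (a : int) (b : int) (c : int) : int option =
  Option.bind (safe_div a b) (fun x ->
      Option.bind (safe_div x c) (fun y -> Some (x + y)))

(* With binding operators the same steps read top to bottom. *)
let compute (a : int) (b : int) (c : int) : int option =
  let open Option_syntax in
  let* x = safe_div a b in
  let+ y = safe_div x c in
  x + y
```

Both versions return None as soon as one division fails; the second one just hides the plumbing.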

ppx_deriving

OCaml as a typed language attracts people who want to have as much of their program logic expressed in a typed way as possible. Therefore it is often necessary to translate between “outside” values (mostly strings) and internal, business-logic types. These conversions are mostly mechanical and tedious, which is why the ppx_deriving macro system is very useful for us.

ppx_deriving can generate code that depends on your internal types. For the most part we use it to generate monomorphic equality functions with ppx_deriving.eq and printers with ppx_deriving.show, but we also convert JSON into our internal types using ppx_deriving_yojson.

Generally this functionality is exceptionally useful. It is no surprise that Haskell and Rust have a system like this integrated into the language proper.

Avoid catch-all matches

Algebraic Data Types (ADTs) and pattern matching are a simple yet incredibly powerful combination for dispatching on values. When used properly, the compiler can warn the programmer that specific cases are not handled and that they need to look at the code in question. Maybe the existing cases need to be adjusted, maybe a new branch needs to be added to accommodate a new variant. In the best case the program has been refactored correctly by the time the compiler accepts it, with no need for extensive testing.

This of course only works if the cases are specified explicitly, but OCaml also allows a branch to be taken if none of the other cases match. In moderation this can be useful, but for important code we try to avoid it, since new variants or changes to existing variants might accidentally get matched by the catch-all branch, thus potentially causing issues.
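
A small sketch of how a catch-all can go wrong. The payment type and the fee values are invented; imagine Voucher was added some time after both functions were written.

```ocaml
type payment =
  | Card
  | Bank_transfer
  | Voucher (* hypothetical variant, added later *)

(* Written when only Card and Bank_transfer existed: the new Voucher case
   silently falls into the catch-all and is charged like a bank transfer. *)
let fee_catch_all (p : payment) : int =
  match p with
  | Card -> 30
  | _ -> 10

(* With explicit cases the compiler reported the missing Voucher branch
   and forced a decision about its fee. *)
let fee_explicit (p : payment) : int =
  match p with
  | Card -> 30
  | Bank_transfer -> 10
  | Voucher -> 0
```

The catch-all version kept compiling without any warning after the change, which is precisely why the mistake is easy to miss.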

Conclusions

OCaml code often works on the first try, especially when a little care is put into formalizing the application logic with a few types. We hope this post has shown how we write code that is maintainable, correct and a joy to work with.

At the same time we realize that this is not the last word. We will surely adapt our workflow as the tools and the language evolve.