Our current OCaml best practices, part 2
This time it is about the language
In the first part of this post we talked about the general tooling we use to develop and deploy our OCaml applications. Given the feedback we have gotten from the community so far, this approach seems to be generally considered the way to go about OCaml in 2018.
This post discusses the general structure of how we actually write code. No worries, it is not a style guide; we’re not going to open a tabs-vs-spaces debate. This is a slightly higher-level discussion of Do’s and Don’ts in OCaml.
First released over 20 years ago, OCaml is not a new language. Because compatibility with older code is prioritized, there are many ways to write code that exist for historical reasons, but practice has shown that some ways are better than others. In this section we’ll outline how we try to write code that keeps readability and maintainability high.
While OCaml does support exceptions we try to avoid them when writing code. The
reason is that while exceptions are easy to throw, they make reasoning about
code difficult, since the type system does not track which code can throw which
exceptions. This gives rise to writing code that ignores exceptions (it
compiles fine, right?) and then breaks at runtime because of some
Not_found exception that is thrown somewhere deep within the code.
If not using exceptions, what should be used to signal potential failures? The most obvious answer is the result type, which has an Ok case carrying the result data and an Error case with attached information about the error.
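Here is a small sketch of the difference. It uses only the OCaml standard library rather than Core, and it assumes OCaml 4.05 or later for List.find_opt. The function names and the error string are hypothetical. List.find raises Not_found when nothing matches, and its type does not say so. The wrapper that returns result makes the failure visible in the type.

```ocaml
(* List.find raises Not_found if no element matches; nothing in its type
   (('a -> bool) -> 'a list -> 'a) warns the caller about that. *)
let find_user_exn name users = List.find (fun u -> u = name) users

(* The same lookup, with the possible failure as part of the return type.
   List.find_opt (OCaml >= 4.05) returns an option instead of raising. *)
let find_user name users : (string, string) result =
  match List.find_opt (fun u -> u = name) users with
  | Some user -> Ok user
  | None -> Error ("no such user: " ^ name)

(* Callers are forced to consider both cases before using the value. *)
let () =
  match find_user "carol" [ "alice"; "bob" ] with
  | Ok user -> print_endline ("found " ^ user)
  | Error msg -> print_endline msg
```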
What to put in the Ok case is clear, but how should the error case be structured?
Our first try was to use Jane Street’s Or_error.t, which is a result type where the Error case contains an Error.t, which is essentially a kind of string. The main problem is that while Or_error makes it clear that a function can fail, it still does not describe the failure cases. Also, catching a specific error case with Or_error is even more difficult than with exceptions.
We then decided to use polymorphic variants as errors, as described by Vladimir Keleshev in Composable Error Handling in OCaml. This lets us see which errors a function can return and also match on specific errors to handle them differently. The main downside is that signatures can get a bit unwieldy, but to us this seems to be the best approach hands down.
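The following is a minimal sketch in the spirit of that approach, not code taken from the article. It uses only the standard library and assumes OCaml 4.08 or later for Result.bind. The functions parse_port, check_range, port_of_string and describe are hypothetical. Each function lists the errors it can return. When the functions are chained, the compiler infers the union of their error types, and a caller can then match on each error individually.

```ocaml
(* Each function states the errors it can produce as polymorphic variants. *)
let parse_port (s : string) : (int, [> `Not_a_number of string ]) result =
  match int_of_string_opt s with
  | Some n -> Ok n
  | None -> Error (`Not_a_number s)

let check_range (n : int) : (int, [> `Out_of_range of int ]) result =
  match n >= 1 && n <= 65535 with
  | true -> Ok n
  | false -> Error (`Out_of_range n)

(* Chaining both: the inferred error type is the union
   [> `Not_a_number of string | `Out_of_range of int ]. *)
let port_of_string s = Result.bind (parse_port s) check_range

(* Callers can match on each specific error and handle it differently. *)
let describe s =
  match port_of_string s with
  | Ok n -> Printf.sprintf "port %d" n
  | Error (`Not_a_number bad) -> Printf.sprintf "%S is not a number" bad
  | Error (`Out_of_range n) -> Printf.sprintf "%d is out of range" n
```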
open in OCaml is a double-edged sword: it pulls all the names from another module into the current one, so it is not necessary to qualify them. This can be nice to avoid repeating the module name in many places. On the other hand, it has the same problem as from module import * in Python: it pollutes the namespace with all kinds of names whose origin is difficult to determine, especially when multiple modules are opened. In theory this could be solved via tooling, but we believe code should be readable without requiring support tools (e.g. in pull requests on GitHub), therefore we try to avoid using open.
Fortunately, there are a lot of alternatives to module-wide open. Names can be fully qualified as described above. There is also the scoped open syntax Module.(…), as well as local opens with let open Module in. We tend to use scoped open when constructing values, and local open only when pulling in the support functions for ppx_let (e.g. let open Monad.Let_syntax in). We also often use module aliases to avoid having to repeat long and descriptive module names.
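Here is a short sketch of these alternatives side by side. It uses standard library modules rather than Core, and the names lengths, total, shout, String_keyed_map and Smap are hypothetical.

```ocaml
(* Fully qualified names: it is always clear where a function comes from. *)
let lengths = List.map String.length [ "a"; "bb"; "ccc" ]

(* Scoped open, Module.( ... ): List's names are visible only inside the
   parentheses, so fold_left needs no prefix there. *)
let total = List.(fold_left ( + ) 0 lengths)

(* Local open: String's names are visible until the end of this
   expression, not in the rest of the file. *)
let shout s =
  let open String in
  uppercase_ascii (trim s)

(* Module alias: a short local name for a long, descriptive module name. *)
module String_keyed_map = Map.Make (String)
module Smap = String_keyed_map

(* Scoped open again, this time for constructing a value. *)
let answers = Smap.(empty |> add "answer" 42)
```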
The only exceptions to our open rule are open Core and open Async. The former replaces the standard library, since we don’t want to accidentally mix Core with the compiler’s standard library. The latter is designed to be used with Core opened and is always used this way.
Write out signatures
This is more of a best practice we have picked up over time. When writing functions, it is sometimes useful to add a type signature. This can be done either in an .mli file or, as we often do, inline with the definition of the function.
OCaml has very good type inference, but depending on how the code is structured, a mistake can lead the compiler to report a type error in another, correct place. This happens because inference works not only within a function but also outward from the function to its callers. Adding a signature to a function tells the compiler what type we expect, so the compiler alerts us if the actual implementation differs.
The usual way to attach types to a function is:
let f (a : int) (b : string) : float = …
This has the advantage that it is easy to leave the types of certain arguments unconstrained. The disadvantage is that the notation is very noisy and does not match the syntax of .mli files, which is more akin to Haskell. Fortunately OCaml also supports this:
let f : int -> string -> float = fun a b -> …
It is slightly longer, but being able to see the signature directly (as the toplevel would print it and as it would be specified in the .mli) is worth the effort to us. It is also possible to let inference work out the type of an argument by using _ as its type.
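The three variants can be compared in one place. The function names below are hypothetical and only the standard library is used. The first two definitions have the same type; the third fixes only the return type and leaves the argument type to inference.

```ocaml
(* Style 1: annotate each argument and the return type individually. *)
let area_v1 (width : float) (height : float) : float = width *. height

(* Style 2: the whole signature up front, the way the toplevel prints it
   and the way it would be written in an .mli file. *)
let area_v2 : float -> float -> float = fun width height -> width *. height

(* _ leaves part of the type to inference: only the return type is fixed,
   and the argument type is inferred as int from the %d format. *)
let describe_count : _ -> string = fun n -> Printf.sprintf "%d items" n
```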
We often use this selective inference combined with our error handling. We declare a function to return a (success, _) result type, where the underscore will be inferred to be the set of polymorphic variants the function can return.
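Here is a sketch of that pattern, assuming a hypothetical user record and make_user constructor. The success type is pinned down in the signature, while the error type is written as _ and inferred from the polymorphic variants actually returned.

```ocaml
type user = { name : string; age : int }

(* The success type is fixed to [user]; the compiler infers the error part
   of the result as [> `Empty_name | `Invalid_age of string ]. *)
let make_user : string -> string -> (user, _) result =
 fun name age_str ->
  match name = "" with
  | true -> Error `Empty_name
  | false -> (
      match int_of_string_opt age_str with
      | Some age when age >= 0 -> Ok { name; age }
      | Some _ | None -> Error (`Invalid_age age_str))
```

Adding a new error case later only requires returning a new variant; the signature does not need to be edited, because the _ picks it up.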
Don’t be too smart
Functors are useful tools and they have their uses, but functor-heavy code is difficult to read. OCaml’s module language is very extensive and the syntax for many constructs is rather obscure. Because it can be hard to understand, we recommend not going overboard.
A similar thing applies to generalized algebraic data types (GADTs). Unless there is a very important reason to use GADTs, we tend to avoid them. Fortunately, the OCaml type system is already quite powerful, so GADTs are not necessary very often.
The last feature we avoid is objects. While we think that OCaml’s object system is interesting in theory (basically records with subtyping), it is rarely used in the OCaml world at large. Our code base doesn’t use any objects.
We often deal with all kinds of identifiers, be it usernames, user IDs or e-mail addresses. Many of these can be represented as strings, but this opens the door to accidental misuse, such as using a user ID where a username was required.
For this reason we decided to wrap these “tiny” strings in their own types, which we convert to and from at the boundaries of our programs. We usually put each of these types in its own module. Such a module typically contains the private type, common operations like a monomorphic compare, a printer, and conversion functions to and from strings and/or JSON.
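Here is a minimal sketch of such a module, using only the standard library instead of Core and leaving out JSON conversion. The module name Username and its validation rule are hypothetical. Because t is abstract outside the module, a plain string or another identifier type cannot be passed where a Username.t is expected.

```ocaml
module Username : sig
  type t (* abstract: only of_string can produce a Username.t *)

  val of_string : string -> (t, [> `Invalid_username of string ]) result
  val to_string : t -> string
  val compare : t -> t -> int (* monomorphic compare *)
  val equal : t -> t -> bool
  val pp : Format.formatter -> t -> unit
end = struct
  type t = string

  (* Hypothetical validation rule, for illustration only. *)
  let of_string s =
    match s <> "" && String.length s <= 32 with
    | true -> Ok s
    | false -> Error (`Invalid_username s)

  let to_string t = t
  let compare = String.compare
  let equal = String.equal
  let pp fmt t = Format.pp_print_string fmt t
end
```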
This has proved to be very useful. The only downside is that it tends to be boilerplate-heavy, since all these modules have slightly different semantics, depending on what the identifier is supposed to be used for.
Being a multi-paradigm language, OCaml supports all kinds of imperative constructs, among them while loops. We usually write our code in a functional way, so there is little need for while loops. Most of our imperative bits use Async, which provides functions like Deferred.repeat_until_finished that offer similar looping semantics within the Async concurrency monad.
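To illustrate the functional alternative without depending on Async, here is a plain standard-library sketch with hypothetical function names. The same loop is written once with while and references, and once as a tail-recursive helper that threads the loop state through its arguments. As we understand it, Deferred.repeat_until_finished follows a similar state-threading idea inside the Async monad; it is not shown here.

```ocaml
(* Imperative version: a while loop over mutable references. *)
let sum_digits_imperative n =
  let n = ref (abs n) in
  let acc = ref 0 in
  while !n > 0 do
    acc := !acc + (!n mod 10);
    n := !n / 10
  done;
  !acc

(* Functional version: the loop state is passed explicitly to a
   tail-recursive helper, so no mutable state is needed. *)
let sum_digits n =
  let rec go n acc =
    match n with
    | 0 -> acc
    | n -> go (n / 10) (acc + (n mod 10))
  in
  go (abs n) 0
```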
We also usually translate if expressions into match, since matching on booleans works just as well and looks nice and clean (in a manner similar to Erlang). The () returned from an if without an else branch is not a behaviour we need often.
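Here is a small comparison with hypothetical function names. Both functions compute the same result. The match version spells out the true and false cases explicitly, and the compiler checks that both are covered.

```ocaml
(* With if/else. *)
let parity_if n = if n mod 2 = 0 then "even" else "odd"

(* The same logic as a match on a boolean. *)
let parity_match n =
  match n mod 2 = 0 with
  | true -> "even"
  | false -> "odd"
```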
Similarly, since OCaml 4.02, match can also match on exceptions. This feels more functional and structures code more nicely than try blocks. Especially in the presence of functions that may throw exceptions, it is useful to have match branches for those exceptions.
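Here is a sketch with a hypothetical lookup_port function built on List.assoc, which raises Not_found for a missing key. It shows the exception branch of match next to the equivalent try block. One difference worth knowing: the exception branch only catches exceptions raised by the scrutinee (List.assoc here), not ones raised inside the other branches, whereas try covers everything in its body.

```ocaml
(* match with an exception branch: only List.assoc is guarded. *)
let lookup_port services name =
  match List.assoc name services with
  | port -> Printf.sprintf "%s runs on port %d" name port
  | exception Not_found -> Printf.sprintf "%s is unknown" name

(* The try equivalent: the handler covers the whole body. *)
let lookup_port_try services name =
  try
    let port = List.assoc name services in
    Printf.sprintf "%s runs on port %d" name port
  with Not_found -> Printf.sprintf "%s is unknown" name
```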
When using monads (such as Option.t or Lwt.t), it is often useful to chain execution, which is what the >>= (bind) operator is used for. It is a binary operator that takes the monadic value on the left side and a callback function on the right side, like so:
let (>>=) = Option.(>>=) in Some 42 >>= fun number -> …
This code is okay, but what it in fact does is bind number to the value x if the monad’s value is Some x. That this is done with a function is due to the fact that OCaml has no analogue to Haskell’s do notation. Therefore we started to use a macro, ppx_let:
let open Option.Let_syntax in let%bind number = Some 42 in …
This makes the code easier to read and does not devolve into a hard-to-understand set of nested callbacks. Core and Async come with Let_syntax modules for all the monads defined in them, and writing Let_syntax for other monads is easy (an implementation for Lwt is 8 lines of code).
This has the advantage that the syntax is always the same no matter which monad is used, and no special macros like ppx_lwt are required.
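The callback style that let%bind replaces can be sketched with only the standard library, so no ppx is needed. This uses Option.bind (OCaml 4.08 or later) instead of Core’s Option, and add_strings is a hypothetical example. Every additional step adds another fun … -> callback to the chain.

```ocaml
(* A bind operator for option from the standard library (OCaml >= 4.08). *)
let ( >>= ) = Option.bind

(* Each intermediate value is bound inside a new callback. *)
let add_strings a b =
  int_of_string_opt a >>= fun x ->
  int_of_string_opt b >>= fun y ->
  Some (x + y)
```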
ppx_let also has some other nice features. Besides the regular let%bind, there is let%map, which works like the map operator (>>|). There is also match%bind, so it is not necessary to bind values to names if the value should be directly pattern matched upon.
This is one of the areas where it took us a while to find our way, but we’re now pretty sure about using it. Even better, a variant of this approach has been merged into OCaml, so we are looking forward to using this feature in the future.
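The variant that landed in the language is binding operators (let*, let+ and related forms), available since OCaml 4.08. Here is a minimal sketch for option using only the standard library. The module name Option_syntax is our own hypothetical choice, and let* and let+ are defined here as bind and map respectively.

```ocaml
module Option_syntax = struct
  (* let* x = e in body  desugars to  ( let* ) e (fun x -> body) *)
  let ( let* ) = Option.bind

  (* let+ x = e in body  desugars to  ( let+ ) e (fun x -> body) *)
  let ( let+ ) o f = Option.map f o
end

let add_strings a b =
  let open Option_syntax in
  let* x = int_of_string_opt a in
  let+ y = int_of_string_opt b in
  x + y
```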
As a typed language, OCaml attracts people who want as much of their program logic as possible expressed in types. Therefore it is often necessary to translate between “outside” values (mostly strings) and internal, business-logic types. These conversions are mostly mechanical and tedious, which is why the ppx_deriving macro system is very useful for us.
ppx_deriving can generate code based on your internal type definitions. We mostly use it to generate monomorphic compare functions and printers (with ppx_deriving.show), but we also use it to convert JSON into our internal types.
Generally, this functionality is exceptionally useful. It is no surprise that Haskell and Rust have a similar system integrated into the language proper.
Avoid catch-all matches
Algebraic Data Types (ADTs) and pattern matching are a simple yet incredibly powerful way to dispatch on values. When used properly, the compiler can warn the programmer that specific cases were not handled and that the code in question needs attention. Maybe the existing cases need to be adjusted, maybe a new branch needs to be added to accommodate a new variant. In the best case, the program is correctly refactored by the time the compiler accepts it, with no need for extensive testing.
This of course only works if the cases are specified explicitly, but OCaml also allows a branch to be taken if none of the other cases match. In moderation this can be useful, but for important code we try to avoid it, since new variants or changes to existing variants might accidentally get matched by the catch-all branch, thus potentially causing issues.
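Here is a sketch of the difference using a hypothetical payment type, with fees in hypothetical units. The catch-all version keeps compiling silently if a constructor is added later. The explicit version becomes non-exhaustive, and the compiler warns (warning 8) until the new case is handled.

```ocaml
type payment = Card | Cash | Voucher

(* Catch-all: a constructor added later (say, Bank_transfer) would
   silently fall into the _ branch. *)
let fee_catch_all = function
  | Card -> 30
  | _ -> 0

(* Explicit: adding a constructor to [payment] makes this match
   non-exhaustive, so the compiler points at it. *)
let fee_explicit = function
  | Card -> 30
  | Cash | Voucher -> 0
```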
OCaml code often works on the first try, especially when a little care is put into formalizing the application logic with a few types. We hope this post has shown how we write code that is maintainable, correct and a joy to work with.
At the same time, we realize that this is not the last word; we will surely adapt our workflow as the tools and the language evolve.