Our current OCaml best practices, part 2
This time it is about the language
In the first part of this post we talked about the general tooling we use to develop and deploy our OCaml applications. Given the overall feedback we have gotten so far from the community it seems this approach is generally considered the way to go about OCaml in 2018.
This post discusses how we actually structure and write our code. No worries, it is not a style guide; we’re not going to open a tabs-vs-spaces topic. This is a slightly higher-level discussion of Do’s and Don’ts in OCaml.
The language
First released over 20 years ago, OCaml is not a new language. Because compatibility with older code is prioritized, there are many ways to write the same code, some of which exist only for historical reasons, and practice has shown that some ways are better than others. In this section we’ll outline how we try to write code to keep readability and maintainability high.
Avoid exceptions
While OCaml does support exceptions, we try to avoid them when writing code. The reason is that while exceptions are easy to throw, they make reasoning about code difficult, since the type system does not track which code can throw which exceptions. This gives rise to code that ignores exceptions (it compiles fine, right?) and then breaks at runtime because of some Not_found exception thrown from a place deep within the code.
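As a small illustration of the problem, here is a sketch that uses only the standard library; the names ports, port_of_exn and port_of are made up for this example. The signature of the raising lookup gives no hint that it can fail, whereas the option-returning variant makes the failure visible to every caller.

```ocaml
(* Hypothetical lookup table, for illustration only. *)
let ports = [ ("http", 80); ("https", 443) ]

(* val port_of_exn : string -> int
   Nothing in the type says that this may raise Not_found. *)
let port_of_exn name = List.assoc name ports

(* val port_of : string -> int option
   The possible absence is now part of the type. *)
let port_of name = List.assoc_opt name ports
```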
Result types
If not using exceptions, what to use to signal potential failures? The most obvious answer here is to use the result type which allows for an Ok case with result data and an Error case with attached information about the error.
What to put in the Ok is clear, but how to structure the error case?
Our first try here was to use Jane Street’s Or_error.t, which is a result type where the Error case contains an Error.t type, which is essentially a kind-of string. The main problem here is that while Or_error makes it clear a function can fail it still does not describe the failure cases. Also, catching a specific error case with Or_error is even more difficult than with exceptions.
We then decided to use polymorphic variants as errors, as described by Vladimir Keleshev in Composable Error Handling in OCaml. This allows us to see which errors a function can return and also to match on specific errors to handle them differently. The main downside is that signatures can get a bit unwieldy, but to us this seems to be the best approach hands down.
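A minimal sketch of this style, using only the standard library (assuming OCaml 4.08 or later for Result.bind). All function names and error tags are hypothetical and only serve the illustration.

```ocaml
(* val parse_age : string -> (int, [> `Not_a_number of string ]) result *)
let parse_age s =
  match int_of_string_opt s with
  | Some n -> Ok n
  | None -> Error (`Not_a_number s)

(* val check_age : int -> (int, [> `Out_of_range of int ]) result *)
let check_age n =
  match n >= 0 && n <= 150 with
  | true -> Ok n
  | false -> Error (`Out_of_range n)

(* The inferred error type is the union of both error sets:
   string ->
   (int, [> `Not_a_number of string | `Out_of_range of int ]) result *)
let age_of_string s = Result.bind (parse_age s) check_age

(* Callers can match on exactly the errors they care about. *)
let describe s =
  match age_of_string s with
  | Ok n -> Printf.sprintf "age %d" n
  | Error (`Not_a_number raw) -> Printf.sprintf "%S is not a number" raw
  | Error (`Out_of_range n) -> Printf.sprintf "%d is out of range" n
```

The composed function never has to declare a combined error type by hand; the compiler infers it, which is also why the printed signatures grow as more functions are chained.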
Avoid opens
open in OCaml is a double-edged sword: it pulls all the names from another module into the current one, so it is not necessary to qualify all names. This can be nice to avoid having to repeat the name of the module in many places. On the other hand it has the same problem as from module import * in Python: it pollutes the namespace with all kinds of names whose origin is hard to determine, especially when multiple modules are opened. In theory this could be solved via tooling, but we believe code should be readable without requiring support tools (e.g. in pull requests on GitHub), therefore we try to avoid using open.
Fortunately, there are a lot of alternatives to module-wide open: explicit qualification as described above, scoped opens via the Module.()/Module.{} syntax, and local opens via let open Module in. We tend to use scoped opens when constructing values and local opens only when pulling in the support functions for ppx_let (let open Monad.Let_syntax in). We also often use module aliases to avoid having to repeat long and descriptive module names.
The only exceptions to our open rule are open Core and open Async. The former replaces the standard library, since we don’t want to accidentally mix Core with the compiler standard library. The latter is designed to fit with Core opened and is always used this way.
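The alternatives can be sketched side by side. The module Geometry_vector and everything in it is hypothetical, and the example sticks to the standard library to stay self-contained.

```ocaml
(* Hypothetical module with a long, descriptive name. *)
module Geometry_vector = struct
  type t = { x : float; y : float }
  let zero = { x = 0.; y = 0. }
  let add a b = { x = a.x +. b.x; y = a.y +. b.y }
end

(* Module alias: a short name without importing anything. *)
module V = Geometry_vector

(* Explicit qualification. *)
let a = V.add V.zero V.zero

(* Scoped open with Module.( ) and Module.{ } when constructing values. *)
let b = V.(add zero { x = 1.; y = 2. })
let c = V.{ x = 3.; y = 4. }

(* Local open: the names are only visible until the end of the expression. *)
let d =
  let open V in
  add b c
```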
Write out signatures
This is more of a best practice we have acquired. When writing functions it is sometimes useful to add a type signature to them. This can be done either via an mli file or, as we often do, in-line with the definition of the function.
OCaml has very good type inference, but depending on how the code is structured it can report a type error in another, correct place, because inference works both inside a function and up from the function to its callers. Adding a signature to a function tells the compiler what we expect its type to be, so the compiler can alert us if the actual implementation differs.
The usual way to attach a signature to a function is
let f (a : int) (b : string) : float =
…
This has the advantage that it is easy to not constrain the types of certain arguments but the disadvantage is that this notation is very noisy and does not match mli files, which use a syntax more akin to Haskell. Fortunately OCaml also supports this:
let f : int -> string -> float = fun a b ->
…
It is slightly longer, but being able to see the signature directly (as the toplevel would print it and as it would be specified in an mli file) is worth the effort to us. It is also possible to let the inference work on arguments by specifying _ as the type.
We often use this selective inference combined with our error handling. We declare a function to return a (success, _) result type, where the underscore will be inferred to be the set of polymorphic variants the function might return.
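A sketch of such a partially annotated function; parse_port and its error tags are made up for the example. The success type is written out, while the error type is left to the compiler.

```ocaml
(* The underscore is inferred as
   [> `Invalid_port of int | `Not_a_number of string ]. *)
let parse_port : string -> (int, _) result = fun s ->
  match int_of_string_opt s with
  | None -> Error (`Not_a_number s)
  | Some n ->
    (match n > 0 && n < 65536 with
     | true -> Ok n
     | false -> Error (`Invalid_port n))
```

If the body accidentally returned a string instead of an int in the Ok case, the error would be reported at this definition rather than at some distant caller.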
Don’t be too smart
Functors are useful tools and they have their place, but functor-heavy code is difficult to read. OCaml’s module language is very extensive and the syntax for many constructs is rather obscure. It can be hard to understand, so we recommend not going overboard.
A similar thing applies to generalized algebraic data types (GADTs). Unless there is a very important reason to use GADTs we tend to avoid them. Fortunately the OCaml type system is quite powerful already, so GADTs are not necessary very often.
The last concept we do not use is objects. While we think that OCaml’s object system is interesting in theory (basically records with subtyping), it is also a rarely used feature in the OCaml world at large. Our code base doesn’t use any objects.
Identifier types
We often deal with all kinds of identifiers, be it usernames, user IDs or e-mail addresses. In many ways these can be represented as strings, but this opens the gate to accidental misuse: using a user ID where a username was required, or similar issues.
For this reason we decided to wrap these “tiny” strings in their own types that we convert to and from at the boundaries of our programs. We often wrap these types in their own modules, so they tend to contain the private type, some common operations like monomorphic compare, a printer and conversion functions to and from strings and/or JSON.
This has proved to be very useful. The only downside is that it tends to be boilerplate-heavy, since all these modules have slightly different semantics, depending on what the identifier is supposed to be used for.
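A minimal sketch of what such identifier modules can look like. The module names, the validation rule and the normalisation are assumptions made for the example, not a description of our actual code.

```ocaml
module User_id : sig
  type t = private string
  val of_string : string -> (t, [> `Empty_user_id ]) result
  val to_string : t -> string
  val compare : t -> t -> int
  val pp : Format.formatter -> t -> unit
end = struct
  type t = string
  (* Made-up rule: a user ID must not be empty. *)
  let of_string s =
    match String.equal s "" with
    | true -> Error `Empty_user_id
    | false -> Ok s
  let to_string t = t
  let compare = String.compare
  let pp = Format.pp_print_string
end

module Username : sig
  type t = private string
  val of_string : string -> t
  val to_string : t -> string
end = struct
  type t = string
  (* Made-up rule: usernames are case-insensitive. *)
  let of_string s = String.lowercase_ascii s
  let to_string t = t
end

(* Only accepts a user ID; passing a Username.t is a type error. *)
let greet (id : User_id.t) = "user #" ^ User_id.to_string id
```

The two modules look almost identical, which is the boilerplate mentioned above, but their slightly different rules are exactly what makes sharing the code awkward.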
No for/while/if/try
OCaml being a multi-paradigm language has support for all kinds of imperative constructs, among them for and while loops. We usually write our code in a functional way so there is little need to use for and while loops. Most of our imperative bits use Async, which uses functions like Deferred.for_ and Deferred.repeat_until_finished which offer similar looping semantics but using the Async concurrency monad.
We also usually translate if into match, since matching on booleans works just as well and looks nice and clean (in a manner similar to Erlang). The implicit () returned from an if without an else branch is not a behaviour we require often.
Similarly, since OCaml 4.02 match can also match on exceptions which feels more functional and structures nicer than try blocks. Especially in the presence of functions that potentially throw exceptions it is useful to have dedicated match branches for exceptions.
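Both habits in a short sketch, using the standard library; the function names and the error tag are hypothetical.

```ocaml
(* Matching on a boolean instead of using if/else. *)
let sign n =
  match n < 0 with
  | true -> "negative"
  | false -> "non-negative"

(* Exception cases sit next to the value cases, so a raising function
   is turned into a result at the point where it is called. *)
let lookup key table =
  match List.assoc key table with
  | value -> Ok value
  | exception Not_found -> Error (`Unknown_key key)
```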
ppx_let
When using monads like option, result, Deferred.t (or Lwt.t) it is often useful to chain execution, which is what the >>= (bind) operator is used for. It is a binary operator, taking the monad on the left side and a callback function on the right side, like so:
let (>>=) = Option.(>>=) in
Some 42 >>= fun number -> …
This code is okay, but what it in fact does is bind number to the value x if the monad’s value is Some x. That this is done with a function is due to OCaml having no analogue to Haskell’s do-notation.
Therefore we started to use a macro, ppx_let:
let open Option.Let_syntax in
let%bind number = Some 42 in
…
This makes it easier to read and does not devolve into a difficult-to-understand set of nested >>= chains. Core and Async have Let_syntax modules for all the monads defined in them, and writing Let_syntax for other monads is easy (an implementation for Lwt is 8 lines of code).
This has the advantage that the syntax is always the same no matter which monad is used; no special macros like ppx_lwt are required.
ppx_let also has some other nice features. Besides the regular %bind it also supports %map, which works like the map operator (>>| or >|=). Both also work on match, so it is not necessary to bind values to names if the value should be directly pattern matched upon.
This is one of the parts where it took us a while to see the way, but we’re now pretty sure about using it. Even better, a variant of this approach has been merged into OCaml, so we are looking forward to using this feature in the future.
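For comparison, here is a sketch of what this can look like without a preprocessor, assuming the merged feature is the binding operators (let* and let+) available since OCaml 4.08. The operators are defined by hand for option and add_opt is a made-up example function.

```ocaml
(* Binding operators for option, defined by hand for this sketch. *)
let ( let* ) = Option.bind
let ( let+ ) o f = Option.map f o

(* Adds two optional numbers; the result is None if either is missing. *)
let add_opt a b =
  let* x = a in
  let+ y = b in
  x + y
```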
ppx_deriving
OCaml as a typed language attracts people who want to have as much of their program logic expressed in a typed way as possible. Therefore it is often necessary to translate between “outside” values (mostly strings) and internal, business-logic types. These conversions are mostly mechanical and tedious, which is why the ppx_deriving macro system is very useful for us.
ppx_deriving can generate code that depends on your internal types. We use it for the most part to generate monomorphic equality functions with ppx_deriving.eq and printers with ppx_deriving.show, but we also convert JSON into our internal types using ppx_deriving_yojson.
Generally this functionality is exceptionally useful. It is no surprise that Haskell and Rust have a system like this integrated into the language proper.
Avoid catch-all matches
Algebraic Data Types (ADTs) and pattern matching are a simple yet incredibly powerful combination for dispatching on values. When used properly, the compiler can warn the programmer that specific cases were not handled in the code and that they need to look at the code in question. Maybe the existing cases need to be adjusted, maybe a new branch needs to be added to accommodate a new variant. In the best case the program has been refactored correctly by the time the compiler accepts it, with no need for extensive testing.
This of course only works if the cases are specified explicitly, but OCaml also allows a branch to be taken if none of the other cases match. In moderation this can be useful, but for important code we try to avoid it, since new variants or changes to existing variants might accidentally get matched by the catch-all branch, thus potentially causing issues.
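The difference is easy to see on a small example. The type payment and the fee values are invented for this sketch.

```ocaml
type payment = Card | Bank_transfer | Voucher

(* Explicit cases: adding a constructor to payment makes the compiler
   warn that this match is no longer exhaustive. *)
let fee_explicit = function
  | Card -> 30
  | Bank_transfer -> 10
  | Voucher -> 0

(* Catch-all: a new constructor would silently be charged a fee of 0. *)
let fee_catch_all = function
  | Card -> 30
  | Bank_transfer -> 10
  | _ -> 0
```

Today both functions behave identically; they only diverge on the day a fourth payment method is added, which is exactly when the compiler’s help is most valuable.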
Conclusions
OCaml code often happens to work on the first try, especially when a little care is put into formalizing the application logic with a few types. We hope this post has shown how we write code that is maintainable, correct and a joy to work with.
At the same time we realize that this is not the last word; we will surely adapt our workflow as the tools and the language evolve.