Our current OCaml best practices, part 1
It is useful to review the way you work every now and then
In the developer world one of the most distinguishing facts about Issuu is that we use the OCaml programming language in production. We’ve been using it successfully for many years now and have developers specifically move to Denmark to work with us on OCaml code bases!
Over these years, the OCaml ecosystem has improved a lot. It therefore makes sense to keep up with the changes, both to take advantage of these improvements and to avoid calcifying around an old company-specific workflow. This post describes what we’ve learned in these years and where we plan to go. It covers both the ecosystem and how we write OCaml code.
Let’s start with how we use the available tooling.
Probably the most important change in recent years has been the creation of OPAM, the OCaml Package Manager. It provides a way to install an OCaml compiler as well as OCaml packages, of which there are 2151 at time of writing. This is a great improvement over previous solutions which have failed to gain community-wide acceptance.
OPAM 2 further improved our workflow thanks to the new “local switches” feature: separate installations of OCaml compilers and their libraries per project. This lets us isolate projects whose dependencies need different versions of the compiler and libraries.
All of our recent OCaml projects have
opam files (usually in the
format) specifying all the dependencies required to build a project, usually
constrained to a version range that mostly follows semantic
versioning. Our CI system uses this file to build the
project, so if the
opam file does not specify dependencies correctly, we’ll
be notified right away.
We have a lot of projects, so we often want to reuse common code between different projects. There are multiple ways to deal with this, some better than others.
One of our earlier attempts was to have a git repository with common code and use it as a submodule in other projects. Everyone who has worked with git submodules knows that submodules create about as many problems as they solve. One of our previous failings was to have a single set of common code, which would then pull in all kinds of dependencies, even if the project did not use them. Changing the common code is also difficult because it is not clear which other code uses it and might break.
The other approach is to copy-paste the relevant parts, but that quickly devolves into the code forking in different ways as different needs arise, which creates the burden of maintaining the various forks.
With the embrace of OPAM we decided to split common code into its own libraries, which can then be separately installed and maintained in a single place.
A lot of the code we want to share is very Issuu-specific so putting it on the official opam-repository is rather wasteful, since it would only be useful to us.
Therefore we created our own repositories that we use in addition to the default OPAM repository:
opam-repository-micro, which contains all the code that is free software. All of this software could technically be part of the official OPAM repository, but having our own one allows us to merge new releases faster. It also serves as a staging ground for a “proper” release on the OPAM repository. The libraries in here use a public CI system, issue tracking etc., which makes outside contributions as simple as possible.
opam-repository-internal, which contains all the libraries that are more specific to our infrastructure which means that publishing them publicly makes little sense. These use our internal CI system, and all kinds of non-public infrastructure.
The latter uses a bit of custom scripting to download the release tarballs from GitHub and expose them via an HTTP server only accessible within Issuu. Unfortunately, OPAM currently lacks a nice solution for setting up custom package repositories.
In the future it would be nice to have a tool that aligns the OPAM release workflow more closely with the way NPM releases are done, since we do not require pull requests on new packages in opam-repository-internal. If we could make the CI system automatically create the releases in our repository, that would save us a bit of tedium.
Like most OCaml users we had used a lot of build systems, from classic
omake but found all of them lacking. So when
jbuilder came out and gained traction we decided to give it a go and port some of our existing projects to use it. By the time jbuilder was renamed to Dune we were so satisfied with the system that we quickly upgraded our projects to use it. Since then we haven’t looked back, as it has a number of tangible advantages:
- Generates correct .merlin files automatically. This avoids confusion due to outdated dependencies, which make Merlin and the OCaml compiler disagree.
- Building is fast and rebuilding is faster still. Running the build command a second time should never rebuild anything. This is exactly what build systems are for but many fail at this.
- It is opinionated, so the structure of all projects is comparable. No more wondering why files don’t get built.
- It comes with few dependencies and many packages already use it, so often it is already installed anyway.
Since adopting it we have set up some best practices when writing
- Create lots of libraries. Creating and using them in dune is very easy to do and allows us to structure code nicely. The libraries don’t even have to be publicly exposed! Even some of our executables are split into an executable shell and an
- Don’t use explicit modules, because for every new module you add you have to add it to the build file. Prefer just creating a library in a folder, which will then use all modules in that folder. This minimizes the amount of required changes in dune files when restructuring code.
- Do not disable the wrapped behaviour. Namespacing in libraries is one of our favourite features in Dune! If linking to a new library foo, the code can be found in the Foo module. This makes the code more understandable. The only reason to disable wrapped is if you have an existing library and you want to migrate to dune without changing the API. Other than that we found the default behaviour great to encourage a consistent code structure.
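To illustrate what the wrapped behaviour means for calling code, here is a minimal single-file sketch. The nested modules only imitate the namespace Dune generates for a wrapped library; the names Foo, Parser, Printer and render are hypothetical and not taken from any of our projects.

```ocaml
(* Imitation of a wrapped library called "foo". With Dune, parser.ml and
   printer.ml would simply live in the library's folder and be reachable
   from outside only through the generated Foo module. All names here are
   made up for illustration. *)
module Foo = struct
  module Parser = struct
    (* Parse a decimal integer without raising on bad input. *)
    let parse (s : string) : int option = int_of_string_opt s
  end

  module Printer = struct
    (* Render an integer between angle brackets. *)
    let print (n : int) : string = Printf.sprintf "<%d>" n
  end
end

(* Client code always spells out the library name, so it is obvious at the
   call site where a function comes from. *)
let render (input : string) : string =
  match Foo.Parser.parse input with
  | Some n -> Foo.Printer.print n
  | None -> "invalid"
```

Because every module sits under Foo, two libraries can each ship a Parser module without clashing.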
For developers we typically add a Makefile to the top of the project, so they can use familiar make test calls, whereas the opam file will call dune in the way other opam files in the OPAM repository do. This makes sure developers get the dev profile, whereas our CI system will build with production flags.
When we started splitting code into libraries we needed a convenient way to
create releases, which often tends to be a tedious process. Since we use
and GitHub, using dune-release was an
easy choice, as an opinionated fork of
- It forces us to write a Changelog, which is nice to have
- It automatically builds and runs tests to avoid trivially broken releases
- It creates a release tarball (sadly in the somewhat obscure .tar.bz2 format, which it inherited from its predecessor)
- It tags and creates releases on GitHub, adding the changes to the release
- It generates and uploads documentation to GitHub pages
- It creates opam files for submission to an OPAM repository with the right checksums and URLs
dune-release requires a GitHub token with
permissions, but for creating releases of non-public repos a token of
permissions is required. Apart from this the tool works without issues on
non-public code bases.
Merlin is a tool to help with developing OCaml code in your editor, in a way similar to language servers in other languages. It can help with renaming variables, displaying types etc. It works amazingly well, like magic, therefore we think it has a very appropriate name. Make sure to use it, if you don’t already. It makes programming OCaml much more enjoyable, because of direct feedback from the type system.
ocamlformat is one of our newest additions to the stack. OCaml can be written in many different ways and there is no universally agreed-upon style guide. This is made worse by the fact that OCaml code can be written in a pretty unreadable way. Therefore we have a pretty strict style of writing and formatting code that is enforced during code reviews. Unfortunately it requires a lot of explanation during onboarding and even then is a constant source of bikeshedding.
With Dune 1.4 supporting ocamlformat out of the box we decided to enable it and adapt to it. We’re not 100% happy with it yet, since the way it formats is odd and the tool still has some bugs. Since our team is editor agnostic we don’t have a great workflow yet, but we hope that ocamlformat will continue to evolve and eventually save us and the OCaml community at large from bikeshedding.
Core & Async
Somewhat contrary to the OCaml community at large we tend to use Jane Street Core and where relevant Async. Core is an alternative, extensive and opinionated standard library to replace the default (“compiler”) standard library.
We have found Core to have a lot of useful, well-thought-out features and to be very consistent. Jane Street Core closely matches how we want to write OCaml, defaulting where possible to exception-free functions and providing all the bits missing from the compiler stdlib. Similarly, Async integrates very well with Core, providing async-wrapped versions of commonly used OCaml data types.
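As a rough illustration of what “exception-free by default” means in practice, here is a sketch that uses only the compiler standard library. The helpers hd, hd_exn and first_or_default are hypothetical stand-ins written for this post, not Core’s implementation; they only mirror the convention of returning an option by default and marking raising variants with an _exn suffix.

```ocaml
(* Stdlib-only sketch of the exception-free style. These helpers are
   hypothetical and written for illustration; they are not Core's code. *)

(* Total version: the empty list shows up in the return type. *)
let hd (l : 'a list) : 'a option =
  match l with
  | [] -> None
  | x :: _ -> Some x

(* Partial version: the _exn suffix warns the caller that it may raise. *)
let hd_exn (l : 'a list) : 'a =
  match hd l with
  | Some x -> x
  | None -> invalid_arg "hd_exn: empty list"

(* The type checker forces the caller to decide what an empty list means. *)
let first_or_default (default : 'a) (l : 'a list) : 'a =
  match hd l with
  | Some x -> x
  | None -> default
```

With this style a forgotten empty-list case becomes a compile-time type error instead of a runtime exception in production.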
One of the valid criticisms of Core and Async is their large dependency tree, but that is a price we gladly pay for a well-integrated system. There are efforts under way to make Base a lighter-weight but still useful subset of Core, but we often need functionality which requires at least Core_kernel.
The other common, valid criticism is the relative lack of documentation. There
is online documentation but for the most part it only describes signatures,
which sometimes are obscured due to the documentation tool getting confused by
type aliases. Thanks to the sensible structure of the libraries we have rarely
had issues where we don’t know how to use some functionality — our main problem
has been to find the desired functionality in the library (lookin’ at you
The other alternatives in the standard library space are less exciting:
- The compiler standard library is missing many useful things, the functions tend not to be tail-recursive and it defaults to throwing exceptions. In general this library is not ready for production use and leads to every project having a random module with more or less well implemented missing bits.
- Extlib went a long time unmaintained, and keeping compatibility with the standard library makes it default to exceptions.
- Batteries started as a maintained superset of Extlib, but is hardly maintained nowadays. The pervasiveness of the BatEnum.t type everywhere leads to lots of conversions between types.
- Containers is a good library, carefully designed. If not for Base, this would be the most interesting contender for a good standard library replacement
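The tail-recursion point about the compiler standard library can be made concrete with a small sketch. Both functions below are hypothetical helpers of the kind that tends to end up in a project’s “missing bits” module; they are not copied from any of the libraries mentioned.

```ocaml
(* Naive map: the recursive call is not in tail position, so every element
   adds a stack frame. On very long lists this can exhaust the stack. *)
let rec map_naive f = function
  | [] -> []
  | x :: xs ->
    let y = f x in
    y :: map_naive f xs

(* Tail-recursive map: build the result in reverse in an accumulator, then
   reverse it once at the end. Stack usage stays constant. *)
let map_tailrec f l =
  let rec go acc = function
    | [] -> List.rev acc
    | x :: xs -> go (f x :: acc) xs
  in
  go [] l
```

The tail-recursive version pays for an extra list reversal, which is the usual trade-off for not depending on the input size for stack depth.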
We use Docker images to build and deploy our software. Fortunately there is an officially-provided opam2 image which includes OPAM 2 and a number of pre-built compilers.
To keep the size down we use this container to build the native binaries and then copy them to a second stage container which starts from the same operating system but without OPAM.
Here is a simplified example:
    FROM ocaml/opam2:debian-9 as builder
    COPY program.opam /home/opam/program/
    RUN opam switch 4.07 && \
        git -C /home/opam/opam-repository pull --quiet && \
        opam update --quiet > /dev/null && \
        opam pin --no-action --yes add program /home/opam/program && \
        opam install --deps-only --yes program
    COPY src /home/opam/program/src
    RUN opam install program

    FROM debian:9
    RUN useradd -ms /bin/bash opam
    USER opam
    COPY --from=builder /home/opam/.opam/*/bin/program* /usr/bin/
This approach has the advantage that the (rarely changing) dependencies of a program are cached on their own layer, so rebuilding program is quick, and the container that we deploy does not need to have any OCaml-specific software installed except for the binary itself.
What about the language?
This is our current state before you really get to write any code. Of course we also have some best practices for writing OCaml code; these will be part of a future post coming soon. Stay tuned!
Update: The second part was published, take a look!