In the developer world, one of the most distinctive things about Issuu is that we use the OCaml programming language in production. We’ve been using it successfully for many years now, and developers have moved to Denmark specifically to work with us on OCaml code bases!

In all of these years, the OCaml ecosystem has improved a lot. It therefore makes sense to keep up with the changes, both to take advantage of these improvements and to avoid calcifying around an old company-specific workflow. This post describes what we’ve learned in these years and where we plan to go, covering both the ecosystem and how we write OCaml code.

The ecosystem

Let’s start with how we use the available tooling.

OPAM

Probably the most important change in recent years has been the creation of OPAM, the OCaml Package Manager. It provides a way to install an OCaml compiler as well as OCaml packages, of which there are 2151 at the time of writing. This is a great improvement over previous solutions, which failed to gain community-wide acceptance.

OPAM 2 further improved our workflow with its new “local switches” feature: separate installations of the OCaml compiler and its libraries per project. This lets us keep projects and their dependencies isolated from each other, even when they use different versions of the compiler and libraries.

All of our recent OCaml projects have opam files (usually in the 2.0 format) specifying all the dependencies required to build a project, usually constrained to a version range that mostly follows semantic versioning. Our CI system uses this file to build the project, so if the opam file does not specify dependencies correctly, we’ll be notified right away.
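
As a rough sketch of what such a file can look like, here is a minimal opam 2.0 file. The package name, maintainer address, dependencies and version ranges are all hypothetical and chosen only for illustration; they are not taken from an actual Issuu project.

```opam
# program.opam -- illustrative only; every name and version bound is an assumption
opam-version: "2.0"
name: "program"
synopsis: "Example service"
maintainer: "dev@example.com"
depends: [
  # each dependency is constrained to a range rather than left open
  "ocaml" {>= "4.07.0" & < "4.08.0"}
  "dune" {>= "1.4"}
  "core" {>= "v0.11" & < "v0.12"}
]
build: [
  # the usual way opam packages invoke dune
  ["dune" "build" "-p" name "-j" jobs]
]
```

Because CI installs only what is listed under depends, a library that is used in the code but missing from this list makes the CI build fail.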

OPAM repositories

We have a lot of projects, so we often want to reuse common code between different projects. There are multiple ways to deal with this, some better than others.

One of our earlier attempts was to have a git repository with common code and use it as a submodule in other projects. Everyone who has worked with git submodules knows that they create about as many problems as they solve. One of our previous failings was to have a single set of common code, which would then pull in all kinds of dependencies, even if the project did not use them. Changing the common code is also difficult because it is not clear which other code uses it and might break.

The other approach is to copy-paste the relevant parts, but that quickly devolves into the code forking in different ways as different needs arise, which creates the burden of maintaining the various forks.

With the embrace of OPAM we decided to split common code into separate libraries, each of which can then be installed independently and maintained in a single place.

A lot of the code we want to share is very Issuu-specific so putting it on the official opam-repository is rather wasteful, since it would only be useful to us.

Therefore we created our own repositories that we use in addition to the default OPAM repository:

  • opam-repository-micro, which contains all the code that is free software. All of this software could technically be part of the official OPAM repository, but having our own one allows us to merge new releases faster. It also serves as a staging ground for a “proper” release on the OPAM repository. The libraries in here use a public CI system, issue tracking etc., making outside contributions as simple as possible.

  • opam-repository-internal, which contains all the libraries that are more specific to our infrastructure which means that publishing them publicly makes little sense. These use our internal CI system, and all kinds of non-public infrastructure.

The latter uses a bit of custom scripting to download the release tarballs from GitHub and expose them via an HTTP server only accessible within Issuu. Unfortunately, OPAM currently lacks a nice solution for setting up custom package repositories.

In the future it would be nice to have a tool that aligns the OPAM release workflow more closely with the way NPM releases are done, since we do not require pull requests on new packages in opam-repository-internal. If we could make the CI system automatically create the releases in our repository, that would save us a bit of tedium.

dune

Like most OCaml users we had used a lot of build systems, from classic make to omake, but found all of them lacking. So when jbuilder came out and gained traction we decided to give it a go and port some of our existing projects to it. By the time jbuilder was renamed to Dune we were so satisfied with the system that we quickly upgraded our projects to use it. Since then we haven’t looked back, as it has a number of tangible advantages:

  • Generates correct .merlin files automatically. This avoids confusion due to outdated dependencies, making Merlin and the OCaml compiler disagree.
  • Building is fast and rebuilding is faster still. Running the build command a second time should never rebuild anything. This is exactly what build systems are for but many fail at this.
  • It is opinionated, so the structure of all projects is comparable. No more wondering why files don’t get built.
  • It comes with few dependencies and many packages already use it, so often it is already installed anyway.

Since adopting it we have set up some best practices when writing dune files:

  • Create lots of libraries. Creating and using them in dune is very easy to do and allows us to structure code nicely. The libraries don’t even have to be publicly exposed! Even some of our executables are split into an executable shell and an executable_lib body.
  • Don’t use explicit modules because for every new module you add you have to add it to the build file. Prefer just creating a library in a folder which will then use all modules in that folder. This minimizes the amount of required changes in dune files when restructuring code.
  • Do not disable the wrapped behaviour. Namespacing in libraries is one of our favourite features in Dune! If linking to a new library foo the code can be found in the Foo module. This makes the code more understandable. The only reason to disable wrapped is if you have an existing library and you want to migrate to dune without changing the API. Other than that we found the default behaviour great to encourage a consistent code structure.
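
The practices above can be sketched with two small dune files. The directory layout (lib/ and bin/) and the names program, program_lib and main are hypothetical, picked only for illustration.

```dune
; lib/dune -- the "body" library (hypothetical name).
; There is no (modules ...) field, so every module in this folder is included,
; and wrapping is left at its default, so lib/util.ml becomes Program_lib.Util.
(library
 (name program_lib))

; bin/dune -- the thin executable "shell" that only calls into the library.
(executable
 (name main)
 (public_name program)
 (libraries program_lib))
```

With this layout, adding a new module to lib/ needs no change to either dune file, and code in bin/main.ml refers to it through the Program_lib namespace.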

For developers we typically add a Makefile to the top of the project, so they can use familiar make and make test calls, whereas the opam file will call dune in the way other opam files in the OPAM repository do. This makes sure developers get the dev profile, whereas our CI system will build with production flags.

dune-release

When we started splitting code into libraries we needed a convenient way to create releases, which often tends to be a tedious process. Since we use dune and GitHub, dune-release, an opinionated fork of topkg and topkg-care, was an easy choice.

  • It forces us to write a Changelog, which is nice to have
  • It automatically builds and runs tests to avoid trivially broken releases
  • It creates a release tarball (sadly in the somewhat obscure .tar.bz2 format which it inherited from its predecessor topkg)
  • It tags and creates releases on GitHub, adding the changes to the release
  • It generates and uploads documentation to GitHub pages
  • It creates opam files for submission to an OPAM repository with the right checksums and URLs

Setting up dune-release requires a GitHub token with public_repo permissions, but creating releases of non-public repos requires a token with repo permissions. Apart from this, the tool works without issues on non-public code bases.

Merlin

Merlin is a tool that helps with developing OCaml code in your editor, similar to language servers for other languages. It can help with renaming variables, displaying types etc. It works amazingly well, like magic, so we think it has a very appropriate name. If you don’t use it already, make sure you do: the direct feedback from the type system makes programming in OCaml much more enjoyable.

ocamlformat

This is one of our newest additions to the stack. OCaml can be written in many different ways and there is no universally agreed-upon style guide. This is made worse by the fact that OCaml code can be written in a pretty unreadable way. Therefore we have a pretty strict style of writing and formatting code that is enforced during code reviews. Unfortunately it requires a lot of explanation during onboarding and even then is a constant source of bikeshedding.

With Dune 1.4 supporting ocamlformat out of the box we decided to enable it and adapt to it. We’re not 100% happy with it yet, since some of its formatting choices are odd and the tool still has some bugs. Since our team is editor agnostic we don’t have a great workflow yet, but we hope that ocamlformat will continue to evolve and eventually save us and the OCaml community at large from bikeshedding.

Core & Async

Somewhat contrary to the OCaml community at large, we tend to use Jane Street Core and, where relevant, Async. Core is an alternative, extensive and opinionated standard library that replaces the default (“compiler”) standard library.

We have found Core to have a lot of useful, well-thought-out features and to be very consistent. Jane Street Core closely matches how we want to write OCaml, defaulting where possible to exception-free functions and providing all the bits missing from the compiler stdlib. Similarly, Async integrates very well with Core, providing async-wrapped versions of commonly used OCaml data types.
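
To illustrate the exception-free default: in Core, List.hd returns an option and the raising variant is spelled List.hd_exn, whereas the compiler stdlib’s List.hd raises Failure on an empty list. The sketch below reproduces that naming convention using only the compiler stdlib, so it runs without any dependencies; hd, hd_exn and describe are illustrative definitions, not Core’s implementation.

```ocaml
(* Core-style convention, re-created here for illustration:
   the plain name returns an option ... *)
let hd = function
  | [] -> None
  | x :: _ -> Some x

(* ... and the raising variant is explicitly marked with _exn. *)
let hd_exn = function
  | [] -> failwith "hd_exn: empty list"
  | x :: _ -> x

(* Callers are pushed by the type system to handle the empty case. *)
let describe xs =
  match hd xs with
  | None -> "empty"
  | Some x -> Printf.sprintf "first = %d" x

let () =
  print_endline (describe []);
  print_endline (describe [ 3; 1; 2 ])
```

The point of the convention is that the possibility of failure is visible in the type, and opting into an exception requires writing _exn explicitly.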

One of the valid criticisms of Core and Async is their large dependency tree, but that is a price we gladly pay for a well-integrated system. There are efforts under way to make Base a lighter-weight but still useful subset of Core, but we often need functionality that requires at least Core_kernel.

The other common, valid criticism is the relative lack of documentation. There is online documentation, but for the most part it only describes signatures, which are sometimes obscured because the documentation tool gets confused by type aliases. Thanks to the sensible structure of the libraries we have rarely had issues where we don’t know how to use some functionality; our main problem has been finding the desired functionality in the library (lookin’ at you Async.Deferred.repeat_until_finished).

The other alternatives in the standard library space are less exciting:

  • The compiler standard library is missing many useful things, the functions tend not to be tail-recursive and it defaults to throwing exceptions. In general this library is not ready for production use and leads to every project having a random module with more or less well implemented missing bits.
  • Extlib went unmaintained for a long time, and keeping compatibility with the standard library makes it default to exceptions
  • Batteries started as a maintained superset of Extlib, but is hardly maintained nowadays. The pervasiveness of the BatEnum.t type leads to lots of conversions between types
  • Containers is a good library, carefully designed. If not for Base, this would be the most interesting contender for a good standard library replacement

Containers

We use Docker images to build and deploy our software. Fortunately there is an officially-provided opam2 image which includes OPAM 2 and a number of pre-built compilers.

To keep the size down we use this container to build the native binaries and then copy them to a second stage container which starts from the same operating system but without OPAM.

Here is a simplified example:

FROM ocaml/opam2:debian-9 as builder
COPY program.opam /home/opam/program/
RUN opam switch 4.07 && \
  git -C /home/opam/opam-repository pull --quiet && \
  opam update --quiet > /dev/null && \
  opam pin --no-action --yes add program /home/opam/program && \
  opam install --deps-only --yes program
COPY src /home/opam/program/src
RUN opam install program

FROM debian:9
RUN useradd -ms /bin/bash opam
USER opam
COPY --from=builder /home/opam/.opam/*/bin/program* /usr/bin/

This approach has the advantage that the (rarely changing) dependencies of a program are cached on their own layer, so rebuilding program is quick and the container that we deploy does not need to have any OCaml-specific software installed except for the binary itself.

What about the language?

This is our current setup, covering everything that happens before you actually get to write any code. Of course we also have some best practices for writing OCaml code; these will be covered in a future post coming soon. Stay tuned!

Update: The second part was published, take a look!