In the previous article in this series, we took a deep look at the overall system architecture of DoltHub, the online data community powered by the Dolt version-controlled database. In this article, we'll zoom in on the front end and see how the code is organized into packages, linked together, built, and linted.
In revision control systems, a monorepo (syllabic abbreviation of monolithic repository) is a software development strategy where code for many projects is stored in the same repository.
We have a monorepo in two senses:
Firstly, our front-end code is collocated with our back-end code, protobuf definitions, deployment configuration, and a few other things in a single Git repo. This has some advantages:
Unless it requires a change to Dolt (which has its own public repository), a single pull request can add a feature across the entire stack.
Protobuf definitions live with their consumers without having to make that a submodule (our proto directory already has submodules for the build tools it depends on, which is gross enough).
It's easy to refer across different parts of the project (for example, to look at protobuf definitions or service layer logic while working on the front end), without the risk of checking out incompatible revisions.
On the other hand, it's a big honkin' repo (cloning it currently transfers 170.58 MiB of data, and according to rsync, it's got something like 150k-170k files in it when checked out and full of node_modules and whatnot), and I often have multiple copies of it checked out at once so I can work on different things. It can get a bit unwieldy.
The second sense in which we have a monorepo is that both our front-end and back-end code are broken into packages which are all hosted in this same repository. We'll be focusing on the front-end ones here.
We use Yarn workspaces to manage our front-end packages. This means that each package is just a regular NPM package with its own package.json. We can add our packages as dependencies in one another's package.json files exactly as with any other package, but they are resolved locally and share a single node_modules, which is deduped as much as possible.
The listing for our web directory and its packages subdirectory, showing the Yarn workspace structure. You can see that a considerable amount of configuration is also shared across the entire workspace.
Using this setup, we have no need to publish packages or fiddle with private NPM repositories or anything like that—everything is local, fast, and secure by default. We used to use Lerna in addition to Yarn workspaces but realized it bought us nothing if we weren't making public releases of our packages (and we're not).
The major downside to all of this is that it required a bit of research and configuration, but now that it's set up, this part of things has been more or less headache-free. In the past, we've run into the dreaded multiple versions of React error trying to use React in a monorepo but on the latest iteration of the architecture we've evaded that so far.
We use TypeScript for everything
We have a base tsconfig.json that some of the individual packages extend. Harmonizing and factoring this further is on the to-do list but there are all kinds of packages in here. We have:
The major downsides are added complexity in setup and in actually using the language. The most obvious point here is that you have to compile it. This introduces a build step which requires tooling and setup, takes time to execute, and increases complexity.
We set up TypeScript project references
Since all our front-end code is in TypeScript, which needs to be compiled, and is separated into many packages/projects, we need a way to specify build dependencies and order.
For example, dolthub depends on dolthub-api, events-api, fakers, graphql-server, resource-utils, shared-components, tailwind-config, and utils (wow). If anything changes in the typings of any of those projects, DoltHub needs to be typechecked again. On the other hand, if we change an implementation detail that doesn't change the types, DoltHub doesn't need to be typechecked again.
Before project references, this was a real pain. If you changed a package, you needed to re-compile it and also re-check any package depending on it, with no automatic detection of whether a change actually changed the types. If your packages needed to be built in a particular order, you were on your own as far as enforcing it. Furthermore, while tsc has had a watch mode since TypeScript's first release, before project references it wouldn't see changes in dependencies.
Previous possible solutions included Makefiles (which can do damn near anything, but ick), package.json scripts (less powerful but more familiar to web devs, but also ick), build runners like Gulp (what is this, 2015?), and other more bespoke solutions. These can solve the build order automation problem easily but detecting whether an implementation change changed the type signatures is harder, and these solutions would still be running tsc multiple times under the hood, with all the startup overhead that incurs.
Project references are designed to solve these woes and more. Setting them up is pretty simple. For example, here's DoltHub's tsconfig.json in its entirety. You can see its dependencies listed as the references key:
DoltHub's tsconfig.json showing the project references configuration
The other main change needed is that each of those referenced projects must have "composite": true in its tsconfig.json. This option has some requirements and implications but the compiler is pretty helpful about guiding you towards compliance.
In the end, the payoff is that you can tsc -b (incrementally build) or tsc -w (watch) your project and TypeScript will automatically compile its dependencies in a correct order and only if needed, before building and typechecking the main project only if needed. This reduced our build times by about half on average and got rid of a really gross and unwieldy set of package.json scripts that we used to specify build order.
We lint our code with typescript-eslint
In addition to types (which we've covered) and testing (which deserves an article of its own, and will get one), linting is one of the core pillars of our code correctness strategy.
This fragmentation was a thorn in the side of TypeScript developers, but relief came in the form of an announcement early in 2019 that Palantir, the backers of TSLint, would be deprecating TSLint in favor of supporting typescript-eslint, which is exactly what it sounds like: an effort to bring TypeScript support directly to ESLint.
We use typescript-eslint and have had mostly great experiences with it so far, though configuration (of the tool itself, not the rules) has been a headache at times. We start by extending Airbnb's TypeScript preset, then add some other rule sets and plugins before our own customizations: