DoltHub is our web application for working
with, sharing and collaborating on Dolt
databases. We host dolt
remotes and run
bounties on DoltHub, among other
things. Sometimes when talking with customers or candidates, we get
questions about how infrastructure at DoltHub is currently structured
and how things generally work. This blog post is a point-in-time
snapshot of how we've built things out so far.
The major components of DoltHub are currently:

- The DoltHub frontend web application, delivered to a user's browser.
- A GraphQL API for the frontend, with queries and mutations to load
  and change dynamic data on DoltHub.
- Backend service APIs, which maintain metadata and data on DoltHub,
  implement business rules, and provide a common interface for
  GraphQL resolver and mutation implementations.
- Dolt remotes hosting, consisting of an API which Dolt clients can
  resolve databases against, and database storage.
In addition, we have plenty of smaller ad hoc stuff going on, but this
blog post will focus on these major components for now.
In general, our infrastructure is deployed within AWS. We love the flexibility,
incremental pricing and very broad product offerings that come with the major
public cloud providers. Much of what we build is intentionally cloud agnostic,
which we think is important for strategic and operational cost reasons, but we
sometimes accept vendor coupling where differentiation and operational costs
shift the balance.
We will get into specifics shortly, but our first-party code invariably runs
within AWS EKS clusters. For durable storage, we generally make use of RDS,
DynamoDB and S3.
How we use EKS
EKS is AWS's hosted kubernetes offering, and we've been using it since
nearly the very beginning of DoltHub. We deploy all of our first party
services into EKS clusters, and we use a GitOps-like workflow where
every logically deployable kubernetes artifact is represented by a
kustomize manifest within our source
repository. We build Docker images with BuildKit and Bazel, and we
push them to ECR before updating the deployed artifacts' manifests in
the repository and applying the manifest changes to our clusters. We
will sometimes manually roll back with
kubectl rollout undo, or scale with
kubectl scale, but shortly after any manual changes to the
cluster we update the manifests and image artifact hashes in the
repository and go through a deployment.
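As a sketch of what one of those kustomize manifests might look like, a service's kustomization can pin its ECR image by tag, which the deploy process rewrites on each release. The service name, registry and tag below are hypothetical, not our actual manifests:

```yaml
# kustomization.yaml for a hypothetical service
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml
images:
  # The deploy process rewrites newTag to the freshly pushed image hash,
  # so the repository always records exactly what is running.
  - name: example-api
    newName: 123456789012.dkr.ecr.us-west-2.amazonaws.com/example-api
    newTag: sha-abc123
```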
We chose Kubernetes as a cloud agnostic control plane and API for deploying
first- and third-party applications in a standardized way. It has a ton of
community momentum behind it, and expertise and support are easy to find. It
comes with a steep initial learning curve, but hosted offerings lower the
operational costs substantially, and the general abstractions and constraints
it presents to application developers help to keep the software we're
deploying on it decoupled and following best practices.
The DoltHub Frontend
DoltHub itself is built as a Next.js
application. The functionality is almost entirely client-side, and it
does not currently make much use of server-side rendering, although we
would like to in the future. We do currently implement some server-side
code for things like feature gates and OAuth access token exchange, but
it's quite minimal.
The application itself deploys as a Node.js process in a Docker
container running in an EKS cluster.
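A minimal sketch of what such a container could look like, assuming a conventional npm-based Next.js build; the base images and scripts are illustrative, not our actual Dockerfile:

```dockerfile
# Build stage: install dependencies and produce the Next.js build output.
FROM node:18 AS build
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build

# Runtime stage: a slimmer image running the production server.
FROM node:18-slim
WORKDIR /app
COPY --from=build /app ./
EXPOSE 3000
CMD ["npm", "run", "start"]
```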
DoltHub GraphQL API
DoltHub serves web assets to a user's browser and those assets fetch
and mutate state for the application by calling our GraphQL API. The
GraphQL layer is built in
Nest.js. It's deployed
as a Node.js process running in a Docker container in our EKS cluster.
The GraphQL application consists of models and resolvers. The models
get compiled into a client library that can be used by the DoltHub
frontend, while the resolvers run server-side and compute the results
of queries and mutations that the frontend makes. Generally the
resolvers call out to other APIs that are running in our cluster.
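To give a sense of the shape of this layer, a schema fragment might look like the following; the type and field names here are hypothetical, not DoltHub's actual schema:

```graphql
# A model type compiled into the frontend's client library.
type Repository {
  ownerName: String!
  repoName: String!
}

# Queries and mutations whose resolvers run server-side, typically
# calling backend gRPC APIs to compute their results.
type Query {
  repo(ownerName: String!, repoName: String!): Repository
}

type Mutation {
  createRepo(ownerName: String!, repoName: String!): Repository!
}
```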
There are a few backend APIs that the GraphQL API layer talks to, but
the most important one by far is called DoltHub API. It implements a
number of logical services supporting all aspects of DoltHub's data
model, including user accounts, organizations, repositories, pull
requests, bounties, sessions, etc. All of our internal APIs are
gRPC APIs written in golang, and they all deploy
as deployments and services in our EKS clusters. We've been happy with gRPC as
a schema'd RPC framework with a mature ecosystem around it. In general, when we
are developing features for DoltHub, we want to absolve individual developers
from having to make accidental technical decisions. Having a widely supported
framework that works for a wide variety of use cases, a common language for
how to structure APIs, and nice
generated code for clients and servers in multiple languages is a nice tailwind
for getting things done. For similar reasons, DoltHub API's primary durable
storage backend is an RDS postgres instance.
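To give a flavor of what those schema'd APIs look like, here is a hypothetical .proto sketch in that style; the package, service and message names are invented for illustration:

```protobuf
syntax = "proto3";

package example.api.v1;

message GetRepositoryRequest {
  string owner_name = 1;
  string repo_name = 2;
}

message Repository {
  string owner_name = 1;
  string repo_name = 2;
}

// From one definition like this, gRPC generates clients and servers in
// multiple languages, so callers like the GraphQL resolvers get a typed
// client for free.
service RepositoryService {
  rpc GetRepository(GetRepositoryRequest) returns (Repository);
}
```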
Dolt Remotes Hosting
Remotes hosting is responsible for hosting DoltHub databases, and it
has clients both in the Dolt CLI, in operations like
pull, and in DoltHub, where it surfaces database data for
things like pull requests, diffs, query-on-the-web, etc. Remotes
hosting is currently a gRPC golang service that implements a protocol
which allows a client to discover what the contents of the remote
currently are and to fetch the portions of the remote which are necessary
for synchronizing state or computing the result of some query. Remotes
themselves are stored in a combination of S3 and DynamoDB, and the
remotes API makes S3 data directly accessible to authorized clients
using short-lived signed URLs.
Further Details and Glue
We use envoy for ingress and as a network sidecar.
It is the first hop (after an
ELB) for all inbound traffic
and is responsible for all ingress and egress traffic on all of our first-party
services. It provides a common place to implement uniform metrics, logging,
tracing and policy enforcement.
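A heavily abridged Envoy listener config gives a sense of the shape of that first hop; the listener, cluster names, ports and routes are illustrative, not our actual config:

```yaml
static_resources:
  listeners:
    - name: ingress
      address:
        socket_address: { address: 0.0.0.0, port_value: 8080 }
      filter_chains:
        - filters:
            # The HTTP connection manager is where uniform access logging,
            # metrics and tracing for all inbound traffic can hang.
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_http
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
                route_config:
                  name: local_route
                  virtual_hosts:
                    - name: app
                      domains: ["*"]
                      routes:
                        - match: { prefix: "/" }
                          route: { cluster: app }
  clusters:
    - name: app
      type: STRICT_DNS
      load_assignment:
        cluster_name: app
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address: { address: 127.0.0.1, port_value: 3000 }
```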
We use terraform extensively. Almost all of our
cloud resources are provisioned and managed through terraform projects which
live in our source repository.
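For example, a Terraform fragment for remotes-style storage might look like this; the resource and table names are invented for illustration:

```hcl
# S3 bucket holding remote database content.
resource "aws_s3_bucket" "remotes" {
  bucket = "example-dolthub-remotes"
}

# DynamoDB table tracking per-database metadata.
resource "aws_dynamodb_table" "remotes_manifest" {
  name         = "example-remotes-manifest"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "db"

  attribute {
    name = "db"
    type = "S"
  }
}
```

Because these definitions live in the source repository alongside everything else, provisioning changes get the same review and history as code changes.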
We currently use SQS for delivering background work and Kubernetes
CronJobs for scheduled tasks. We use
CloudWatch for metrics,
graphing and logging.
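A scheduled task deploys as an ordinary Kubernetes CronJob manifest, along these lines; the name, image and schedule are hypothetical:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-cleanup
spec:
  schedule: "0 9 * * *"   # once a day, in the cluster's (UTC) time
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: cleanup
              # Pinned ECR image, managed the same way as service images.
              image: 123456789012.dkr.ecr.us-west-2.amazonaws.com/cleanup:sha-abc123
```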
Testing and common deployment tasks are generally implemented as
workflows in GitHub Actions, but infrastructure provisioning and rarer
deployment tasks remain the responsibility of a small set of
designated operators for now.
In general, we're pretty happy with our current infrastructure
stance. It fits our current scale and product roadmap decently well
and we're able to iterate on existing functionality and add new
functionality quickly. There are always improvements to be made, and
we will continue to make changes going forward, but this is where we
are almost three years into our journey at DoltHub.