An Overview of DoltHub Infrastructure

July 23, 2021


DoltHub is our web application for working with, sharing and collaborating on Dolt databases. We host Dolt remotes and run bounties on DoltHub, among other things. When talking with customers or candidates, we sometimes get questions about how infrastructure at DoltHub is structured and how things work. This blog post is a point-in-time snapshot of how we’ve built things out so far.

Overview

The major components of DoltHub are currently:

The frontend, consisting of HTML, JavaScript, CSS and other assets delivered to a user’s browser.
A GraphQL API for the frontend, with queries and mutations to load and change dynamic data on DoltHub.
Backend service APIs, which maintain metadata and data on DoltHub, implement business rules, and provide a common interface for GraphQL resolver and mutation implementations.
Dolt remotes hosting, consisting of an API which Dolt clients can resolve databases against, and database storage.

In addition, we have plenty of smaller ad hoc things going on, but this blog post focuses on these major components.

In general, our infrastructure is deployed within AWS. We love the flexibility, incremental pricing and very broad product offerings that come with the major public cloud providers. Much of what we build is intentionally cloud agnostic, which we think is important for strategic and operational cost reasons, but we sometimes take vendor coupling where differentiation and operational costs shift the balance.

We will get into specifics shortly, but our first-party code invariably runs within AWS EKS clusters. For durable storage, we generally make use of RDS, DynamoDB and S3.

How we use EKS

EKS is AWS’s hosted Kubernetes offering, and we’ve been using it since nearly the very beginning of DoltHub. We deploy all of our first-party services into EKS clusters, and we use a GitOps-like workflow where every logically deployable Kubernetes artifact is represented by a kustomize manifest within our source repository. We build Docker images with BuildKit and Bazel, and we push them to ECR before updating the deployed artifacts’ manifests in the repository and applying the manifest changes to our clusters. We will sometimes manually roll back with kubectl rollout undo, or scale up with kubectl scale, but shortly after any manual change to the cluster we update the manifests and image artifact hashes in the repository and go through a deployment.
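To make the "update the image hash in the manifest" step concrete, here is a minimal sketch of pinning an image to a digest in a kustomization's images list. The name and digest keys follow kustomize's images field; the function name pin_image_digest, the image name and the digest value are hypothetical, and the kustomization is modeled as a plain dict to avoid a YAML dependency. This is an illustration of the idea, not the deployment tooling described above.

```python
import copy


def pin_image_digest(kustomization: dict, image_name: str, digest: str) -> dict:
    """Return a copy of `kustomization` with `image_name` pinned to `digest`.

    `kustomization` is the parsed content of a kustomization file.
    The input is never mutated, so the caller can diff old against new.
    """
    if not digest.startswith("sha256:"):
        raise ValueError("expected a digest of the form 'sha256:<hex>'")

    updated = copy.deepcopy(kustomization)
    images = updated.setdefault("images", [])

    for entry in images:
        if entry.get("name") == image_name:
            # Drop any tag override so the digest is the only pin on this entry.
            entry.pop("newTag", None)
            entry["digest"] = digest
            return updated

    # No existing entry for this image: add one.
    images.append({"name": image_name, "digest": digest})
    return updated


if __name__ == "__main__":
    before = {
        "resources": ["deployment.yaml"],
        "images": [{"name": "example/api", "newTag": "v1"}],  # hypothetical image
    }
    after = pin_image_digest(before, "example/api", "sha256:" + "a" * 64)
    print(after["images"])
```

Because the function returns a new structure instead of editing in place, the change can be reviewed as a diff and committed to the repository before anything is applied to a cluster, which is the property a GitOps-style workflow relies on.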

We chose Kubernetes as a cloud-agnostic control plane and API for deploying first- and third-party applications in a standardized way. It has a ton of community support behind it, and expertise is easy to find. It comes with a steep initial learning curve, but hosted offerings lower the operational costs substantially, and the general abstractions and constraints it presents to application developers help keep the software we deploy on it decoupled and following best practices.

DoltHub Frontend

DoltHub itself is built as a Next.js application. The functionality is almost entirely client-side, and it does not currently make much use of server-side rendering, although we would like to in the future. We do currently implement some server-side code for things like feature gates and OAuth access token exchange, but it’s quite minimal.

The application itself deploys as a Node.js process in a Docker container running in an EKS cluster.

DoltHub GraphQL API

DoltHub serves web assets to a user’s browser, and those assets fetch and mutate state for the application by calling our GraphQL API. The GraphQL layer is built in Nest.js. It’s deployed as a Node.js process running in a Docker container in our EKS cluster.

The GraphQL application consists of models and resolvers. The models get compiled into a client library that can be used by the DoltHub frontend, while the resolvers run server-side and compute the results of queries and mutations that the frontend makes. Generally the resolvers call out to other APIs that are running in our cluster.
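The resolver pattern described here can be sketched in a few lines. The real layer is Nest.js talking to backend services, so the Python below only illustrates the shape: every name in it (Repository, RepositoryBackend, InMemoryBackend, RepositoryResolver and the example organization and database) is hypothetical and not taken from DoltHub's code.

```python
from dataclasses import dataclass
from typing import Dict, Protocol, Tuple


@dataclass(frozen=True)
class Repository:
    """Hypothetical backend model, for illustration only."""
    owner: str
    name: str
    description: str


class RepositoryBackend(Protocol):
    """Shape of the backend client that the resolver depends on."""

    def get_repository(self, owner: str, name: str) -> Repository: ...


class InMemoryBackend:
    """Test double standing in for a backend service."""

    def __init__(self, repos: Dict[Tuple[str, str], Repository]) -> None:
        self._repos = repos

    def get_repository(self, owner: str, name: str) -> Repository:
        return self._repos[(owner, name)]


class RepositoryResolver:
    """Resolves a repository query by delegating to the backend."""

    def __init__(self, backend: RepositoryBackend) -> None:
        self._backend = backend

    def repository(self, owner: str, name: str) -> dict:
        repo = self._backend.get_repository(owner, name)
        # The resolver only reshapes data for the client; rules live in the backend.
        return {"owner": repo.owner, "name": repo.name, "description": repo.description}


if __name__ == "__main__":
    backend = InMemoryBackend(
        {("example-org", "example-db"): Repository("example-org", "example-db", "demo")}
    )
    print(RepositoryResolver(backend).repository("example-org", "example-db"))
```

Keeping the resolver this thin means the same backend call can serve several queries and mutations, and the resolver can be exercised with an in-memory stand-in like the one above.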

Backend APIs

There are a few backend APIs that the GraphQL API layer talks to, but the most important one by far is called DoltHub API. It implements a number of logical services supporting all aspects of DoltHub’s data model, including user accounts, organizations, repositories, pull requests, bounties and sessions. All of our internal APIs are gRPC APIs written in Go, and they all deploy as Deployments and Services in our EKS clusters. We’ve been happy with gRPC as a schema-driven RPC framework with a mature ecosystem around it. In general, when we are developing features for DoltHub, we want to spare individual developers from having to make accidental technical decisions. Having a widely supported framework that works for a wide variety of use cases, a common language for how to structure APIs, and generated code for clients and servers in multiple languages is a real tailwind for getting things done. For similar reasons, DoltHub API’s primary durable storage backend is an RDS PostgreSQL instance.

Remotes Hosting

Remotes hosting is responsible for hosting DoltHub databases. It has two kinds of clients: the Dolt CLI, in operations like clone, fetch, push and pull, and DoltHub itself, where it surfaces database data for things like pull requests, diffs and query-on-the-web. Remotes hosting is currently a gRPC Go service that implements a protocol which allows a client to discover what the contents of the remote currently are and to fetch the portions of the remote that are necessary for synchronizing state or computing the result of some query. Remotes themselves are stored in a combination of S3 and DynamoDB, and the remotes API makes S3 data directly accessible to authorized clients using short-lived signed URLs.
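The idea behind a short-lived signed URL is that the server hands out a URL carrying an expiry time and a signature over the resource and that expiry, so storage can be read directly without further calls to the API. The sketch below shows the general mechanism with an HMAC from the Python standard library. It is deliberately not S3's actual presigning scheme; the function names, the expires and signature query parameters and the example URL are assumptions made for illustration, and the base URL is assumed to carry no query string of its own.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def _signature(base_url: str, expires: int, secret: bytes) -> str:
    """HMAC-SHA256 over the URL and its expiry, hex encoded."""
    payload = f"{base_url}\n{expires}".encode()
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def sign_url(base_url: str, secret: bytes, ttl_seconds: int,
             now: Optional[float] = None) -> str:
    """Return `base_url` with an expiry and signature appended as query parameters."""
    issued = time.time() if now is None else now
    expires = int(issued + ttl_seconds)
    query = urlencode({"expires": expires, "signature": _signature(base_url, expires, secret)})
    return f"{base_url}?{query}"


def verify_url(url: str, secret: bytes, now: Optional[float] = None) -> bool:
    """Accept the URL only if the signature matches and it has not expired."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    try:
        expires = int(query["expires"][0])
        signature = query["signature"][0]
    except (KeyError, ValueError):
        return False
    base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"
    expected = _signature(base_url, expires, secret)
    current = time.time() if now is None else now
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(signature, expected) and current < expires


if __name__ == "__main__":
    url = sign_url("https://storage.example.com/chunks/abc", b"demo-secret", ttl_seconds=60)
    print(url, verify_url(url, b"demo-secret"))
```

A short lifetime limits how long a leaked URL stays useful, and signing the path means a client cannot reuse a URL issued for one object to read another.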

Further Details and Glue

We use Envoy for ingress and as a network sidecar. It is the first hop (after an ELB) for all inbound traffic, and it handles all ingress and egress traffic on all of our first-party services. It provides a common place to implement uniform metrics, logging, tracing and policy enforcement.

We use Terraform extensively. Almost all of our cloud resources are provisioned and managed through Terraform projects which live in our source repository.

We currently use SQS for delivering background work and Kubernetes CronJobs for scheduled tasks. We use Prometheus, Grafana and CloudWatch for metrics, graphing and logging.

Testing and common deployment tasks are generally implemented as workflows in GitHub Actions, but infrastructure provisioning and rarer deployment tasks remain the responsibility of a small set of designated operators for now.

Conclusion

In general, we’re pretty happy with our current infrastructure stance. It fits our current scale and product roadmap decently well and we’re able to iterate on existing functionality and add new functionality quickly. There are always improvements to be made, and we will continue to make changes going forward, but this is where we are almost three years into our journey at DoltHub.
