Deploying a Production Ready DoltLab Instance, An Example

This year we launched DoltLab, the self-hosted version of DoltHub. In February, we released the latest version of DoltLab, v0.2.0, which included a number of features and bug fixes. We are actively working on DoltLab's next release, which is focused on improving the DoltLab administrator's experience as well as making it a bit easier to submit bug reports and service logs to our team.

As part of our work to make DoltLab a high-quality product, we've recently launched an internal DoltLab instance we use as a staging environment for upcoming DoltLab releases. We've set up this instance to model what DoltLab administrators should do to more easily deploy their own DoltLab instances.

Today I'll be demonstrating how we deployed a DoltLab instance to this staging environment in a way that allows you to also deploy a DoltLab instance to a production ready environment.

We will be deploying our own production DoltLab instance in the near future. Our production instance will be used largely for enterprise client demonstrations, but will be available to the public for viewing, querying, and cloning databases.

Let's get started!

TL;DR Deploying a Production Ready DoltLab Instance

  1. Build an AMI
  2. Create a Launch Template
  3. Start DoltLab

Build an AMI

To aid us in sanely deploying any version of DoltLab to our internal staging environment, we opted to create a custom AMI that installs and configures the dependencies the host machine needs in order to deploy DoltLab shortly after booting.

We also use Packer and Terraform to create the AMI and provision the necessary AWS resources. Please note, a cloud provider is not required to use DoltLab. Internally, we've opted to deploy DoltLab on AWS EC2 since AWS is DoltHub's current cloud provider.

To build an AMI with Packer, we created a file called doltlab_ami.pkr.hcl that includes the following:

packer {
  required_plugins {
    amazon = {
      version = ">= 1.0.4"
      source  = "github.com/hashicorp/amazon"
    }
  }
}

variable "sha" {
  type = string
}
variable "stage" {
  type = string
}

source "amazon-ebs" "ubuntu" {
  ami_name      = "doltlab-${var.stage}-${var.sha}"
  instance_type = "m5a.xlarge"
  source_ami_filter {
    filters = {
      name                = "ubuntu/images/*ubuntu-focal-20.04-amd64-server-*"
      root-device-type    = "ebs"
      virtualization-type = "hvm"
    }
    most_recent = true
    owners      = ["XXXXXXXXXX"]
  }
  ssh_username = "ubuntu"
}

build {
  name    = "doltlab"
  source "source.amazon-ebs.ubuntu" {
    region        = "us-east-1"
    assume_role {
      role_arn = "arn:aws:iam::XXXXXXXXXXXXXX:role/DoltLabAMIBuilder"
    }
  }

  provisioner "file" {
    source = "authorized_keys"
    destination = "/home/ubuntu/.ssh/authorized_keys"
  }

  provisioner "file" {
    source = "ubuntu-bootstrap.sh"
    destination = "/home/ubuntu/ubuntu-bootstrap.sh"
  }

  provisioner "file" {
    source = "openssl.conf"
    destination = "/home/ubuntu/openssl.conf"
  }

  # create self-signed tls cert for aws-smtp-relay
  provisioner "shell" {
    inline = [
      "openssl req -new -x509 -config /home/ubuntu/openssl.conf -days 24855 -out /home/ubuntu/aws-smtp-relay.crt -keyout /home/ubuntu/aws-smtp-relay.key",
      "sudo cp /home/ubuntu/aws-smtp-relay.crt /usr/local/share/ca-certificates/",
      "sudo update-ca-certificates",
    ]
  }

  # install aws-smtp-relay
  provisioner "shell" {
    inline = [
      "curl -LO https://go.dev/dl/go1.17.7.linux-amd64.tar.gz",
      "sudo rm -rf /usr/local/go && sudo tar -C /usr/local -xzf go1.17.7.linux-amd64.tar.gz",
      "sudo /usr/local/go/bin/go install github.com/blueimp/aws-smtp-relay@v1.1.0",
    ]
  }

  # create aws-smtp-relay service definition `aws-smtp-relayd`
  provisioner "file" {
    source = "aws-smtp-relayd.service"
    destination = "/tmp/aws-smtp-relayd.service"
  }

  # create aws-smtp-relay service `aws-smtp-relayd` and enable it to start on boot
  provisioner "shell" {
    inline = [
      "sudo chmod 664 /tmp/aws-smtp-relayd.service",
      "sudo chown root:root /tmp/aws-smtp-relayd.service",
      "sudo mv /tmp/aws-smtp-relayd.service /etc/systemd/system/aws-smtp-relayd.service",
      "sudo systemctl daemon-reload",
      "sudo systemctl enable aws-smtp-relayd",
    ]
  }

  # install the aws CLI
  provisioner "shell" {
    inline = [
      "sudo apt install -y unzip",
      "curl \"https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip\" -o \"awscliv2.zip\"",
      "unzip awscliv2.zip",
      "sudo ./aws/install",
    ]
  }

  post-processor "manifest" {
    output     = "doltlab_ami_manifest.json"
    strip_path = true
  }
}

At the start of the file, after declaring the amazon plugin in the packer block, we define two variables whose values will be defined at build time: sha and stage.

We use these variables to differentiate our AMIs based on the commit sha of our DoltLab source code and the stage, or context, this AMI is used in. To build a development AMI we'd set stage=dev, and for production we'd set stage=prod. We can then build the AMI by running:

sha=$(git rev-parse HEAD)
stage=dev
packer build -var sha="$sha" -var stage="$stage" ./doltlab_ami.pkr.hcl
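
The naming scheme can be made explicit with a small wrapper around the build command. This is only a sketch: the ami_name and build_ami helpers and the dev/prod whitelist are assumptions made for illustration, not part of our actual tooling.

```shell
#!/bin/sh
# Hypothetical helpers; the function names and the stage whitelist are illustrative.

# Print the AMI name Packer will produce for a given stage and commit sha,
# mirroring ami_name = "doltlab-${var.stage}-${var.sha}" in the template.
ami_name() {
  printf 'doltlab-%s-%s\n' "$1" "$2"
}

# Refuse unknown stages, then invoke packer with the current commit sha.
build_ami() {
  stage=$1
  case "$stage" in
    dev|prod) ;;
    *) echo "unknown stage: $stage" >&2; return 1 ;;
  esac
  sha=$(git rev-parse HEAD) || return 1
  echo "building $(ami_name "$stage" "$sha")"
  packer build -var sha="$sha" -var stage="$stage" ./doltlab_ami.pkr.hcl
}
```

Running build_ami dev from the directory containing doltlab_ami.pkr.hcl would then build the development AMI, while a mistyped stage fails before Packer starts.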

The source block in our file defines the host machine and operating system we'll use as the base image for building DoltLab's AMI. Currently we use an m5a.xlarge instance with 4 vCPUs and 16GB of memory. This is the same instance type we'll use to run DoltLab. This host will run Ubuntu 20.04, as DoltLab v0.2.0 is currently only available for Linux.

The build block specifies the IAM role, DoltLabAMIBuilder, which has permission to build this AMI, and then defines a series of provisioner blocks that do the heavy lifting in this Packer configuration. Let's look at what each of these provisioner blocks contains.

The first provisioner block copies a local file authorized_keys containing authorized ssh keys to /home/ubuntu/.ssh/authorized_keys in our DoltLab host. This allows all authorized DoltLab developers and administrators ssh access to the host.

The second provisioner block copies a local file ubuntu-bootstrap.sh to /home/ubuntu/ubuntu-bootstrap.sh on the DoltLab host. This script, originally created for this video blog, and available here, makes installing DoltLab's dependencies a very simple one-line command, so we just add it to our host.

The third provisioner block copies a local file openssl.conf to /home/ubuntu/openssl.conf. A subsequent provisioner block references this file to generate a self-signed TLS certificate used for connecting to an SMTP relay server we'll run next to DoltLab. We will look more closely at that a bit later, though. For now, here are the contents of our openssl.conf file:

[req]
default_bits=2048
default_md=sha256
default_keyfile=aws-smtp-relay.key
encrypt_key=no
prompt=no
distinguished_name=distinguished_name
x509_extensions=x509_ext

[x509_ext]
subjectKeyIdentifier=hash
authorityKeyIdentifier=keyid,issuer
basicConstraints=CA:FALSE
nsCertType=server
keyUsage=digitalSignature,keyEncipherment
extendedKeyUsage=serverAuth
subjectAltName=@alt_names

[distinguished_name]
commonName=localhost

[alt_names]
IP=<AWS EIP>
DNS=<A Record DNS name>

Notice that the end of the file contains two key value pairs, IP=<AWS EIP> and DNS=<A Record DNS name>. For our internal DoltLab deployment, as with all production deployments, we want our DoltLab AMI to run on a host that resolves to the same IP address and DNS name every time we launch it.

So, we provisioned an AWS EIP and a DNS A Record mapped to the EIP. For the SMTP relay to recognize the host's connection, we add the EIP as the value of IP, and the DNS A Record name as the value of DNS.
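
If the EIP and DNS name differ per stage, the [alt_names] section could be generated instead of edited by hand. The helper below is a minimal sketch; render_alt_names is a hypothetical name, and the address and hostname in the usage note are placeholders from the documentation ranges, not our real values.

```shell
#!/bin/sh
# Hypothetical helper: print the [alt_names] section for a given
# Elastic IP ($1) and DNS A Record name ($2).
render_alt_names() {
  printf '[alt_names]\nIP=%s\nDNS=%s\n' "$1" "$2"
}
```

For example, render_alt_names 203.0.113.10 doltlab.example.com prints a section that can be appended to an openssl.conf that does not already contain [alt_names].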

The next provisioner block is the first of the shell blocks in our doltlab_ami.pkr.hcl file. This block generates the aforementioned TLS certificate using openssl and adds the certificate to the host's certificate store.

The provisioner block after that is also a shell block, and is responsible for installing golang on the host, which we use immediately to install the AWS SES SMTP relay our DoltLab instance will use to send emails.

Setting up an SMTP relay allows DoltLab to send emails through our existing AWS SES account using IAM roles associated with the host machine. We opted to rely on the host's IAM roles for sending emails so that we do not need to pass in secret values to the EMAIL_USERNAME and EMAIL_PASSWORD environment variables required by DoltLab's start script. This provides an additional layer of security for production environments.

The next provisioner block is another file block that copies a local file aws-smtp-relayd.service to /tmp/aws-smtp-relayd.service. This file is a simple systemd service definition for the aws-smtp-relay server we installed above. Here's what aws-smtp-relayd.service contains:

[Unit]
Description=aws-smtp-relay service

[Service]
Environment="AWS_REGION=<Region>"
ExecStart=/root/go/bin/aws-smtp-relay -c /home/ubuntu/aws-smtp-relay.crt -k /home/ubuntu/aws-smtp-relay.key -s

[Install]
WantedBy=multi-user.target

This service definition enables us to start the SMTP relay server as a daemon process managed by systemctl. In the next provisioner block, we register this service with systemctl, calling it aws-smtp-relayd, and enable the server process to start when the host boots using sudo systemctl enable aws-smtp-relayd.
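
Once an instance launched from the AMI is up, it can be handy to confirm that the relay is both enabled at boot and actually running. Here is a minimal sketch using systemctl is-enabled and systemctl is-active; the unit_status helper name is an assumption for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report whether a systemd unit is enabled at boot and
# currently running. Returns non-zero unless both checks pass.
unit_status() {
  enabled=$(systemctl is-enabled "$1" 2>/dev/null) || true
  active=$(systemctl is-active "$1" 2>/dev/null) || true
  echo "$1: enabled=${enabled:-unknown} active=${active:-unknown}"
  [ "$enabled" = "enabled" ] && [ "$active" = "active" ]
}
```

On the host this would be called as unit_status aws-smtp-relayd. During the Packer build itself the unit is only enabled, not started, so the active check is expected to pass only after the instance boots.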

Finally, our last provisioner block is another shell block that installs the aws CLI tool on the host. We will use this tool in the EC2 launch template to reassign the newly launched EC2 instance's IP to be the stable AWS EIP we provisioned earlier. This will ensure any instance deployed with this AMI can assign itself to our EIP.

Create a Launch Template

After building the AMI with Packer, we created a launch template for our DoltLab deployments that enables launching new EC2 instances using our AMI. Here's what our Terraform file declaring this launch template looks like:

locals {
  doltlab-user-data = <<USERDATA
#!/bin/bash

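# fetch an IMDSv2 session token, then capture instance id
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
InstanceID=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/instance-id)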

# capture eip allocation id
AllocateID=$(aws ec2 describe-tags --filters "Name=tag:Name,Values=dev-eip" --query "Tags[0].ResourceId" --output text)

# Assigning Elastic IP to Instance
aws ec2 associate-address --instance-id $InstanceID --allocation-id $AllocateID

USERDATA
}

resource "aws_launch_template" "doltlab" {
  name          = "doltlab"
  image_id      = local.packer_ami_id
  instance_type = "m5a.xlarge"
  iam_instance_profile {
    name = aws_iam_instance_profile.doltlab-instance-profile.name
  }
  network_interfaces {
    associate_public_ip_address = true
    subnet_id = aws_subnet.subnet.id
    security_groups = [
      aws_security_group.security_group.id
    ]
  }
  tags = {
    Name = "doltlab-dev-instance"
  }
  block_device_mappings {
    device_name = "/dev/sda1"
    ebs {
      volume_size = 2048
      delete_on_termination = true
    }
  }
  metadata_options {
    http_endpoint = "enabled"
    instance_metadata_tags = "enabled"
  }
  update_default_version = true
  user_data = base64encode(local.doltlab-user-data)
}

Skipping the USERDATA block for now and looking at the aws_launch_template definition reveals that this launch template will deploy an m5a.xlarge EC2 instance type with local.packer_ami_id set to the value of the AMI ID we created with Packer.

The network_interfaces block contains our VPC's subnet ID and the security group to attach to the launched instances.

Configuring the host's security groups for DoltLab is a very important step, since very specific ports must be open to run DoltLab successfully. Here is our security group configuration:

resource "aws_security_group" "security_group" {
  name        = "doltlab-dev"
  description = "Security group for doltlab development"
  vpc_id      = aws_vpc.vpc.id
}

resource "aws_security_group_rule" "egress" {
  description       = "Allow doltlab instances to egress"
  type              = "egress"
  protocol          = "-1"
  from_port         = 0
  to_port           = 0
  security_group_id = aws_security_group.security_group.id
  cidr_blocks       = ["0.0.0.0/0"]
}

resource "aws_security_group_rule" "ssh" {
  description       = "Allow doltlab instances to ingress ssh"
  type              = "ingress"
  protocol          = "tcp"
  from_port         = 22
  to_port           = 22
  security_group_id = aws_security_group.security_group.id
  cidr_blocks       = ["0.0.0.0/0"]
}

resource "aws_security_group_rule" "http" {
  description       = "Allow http connections"
  type              = "ingress"
  protocol          = "tcp"
  from_port         = 80
  to_port           = 80
  security_group_id = aws_security_group.security_group.id
  cidr_blocks       = ["0.0.0.0/0"]
}

resource "aws_security_group_rule" "doltlab_remote_data_server" {
  description       = "Allow connections to remote file server"
  type              = "ingress"
  protocol          = "tcp"
  from_port         = 100
  to_port           = 100
  security_group_id = aws_security_group.security_group.id
  cidr_blocks       = ["0.0.0.0/0"]
}

resource "aws_security_group_rule" "doltlab_remote_api" {
  description       = "Allow connections to doltlab remote api"
  type              = "ingress"
  protocol          = "tcp"
  from_port         = 50051
  to_port           = 50051
  security_group_id = aws_security_group.security_group.id
  cidr_blocks       = ["0.0.0.0/0"]
}

resource "aws_security_group_rule" "doltlab_file_service_api" {
  description       = "Allow connections to doltlab file service api"
  type              = "ingress"
  protocol          = "tcp"
  from_port         = 4321
  to_port           = 4321
  security_group_id = aws_security_group.security_group.id
  cidr_blocks       = ["0.0.0.0/0"]
}

resource "aws_security_group_rule" "doltlab_aws_smtp_relay" {
  description       = "Allow connections to aws-smtp-relay"
  type              = "ingress"
  protocol          = "tcp"
  from_port         = 1025
  to_port           = 1025
  security_group_id = aws_security_group.security_group.id
  cidr_blocks       = ["0.0.0.0/0"]
}

Most of these ports should look familiar if you've seen our DoltLab v0.2.0 setup blog or our DoltLab video blog. But since we are running an SMTP relay in this context, we also open port 1025.
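
After DoltLab and the relay are running, the security group rules can be sanity-checked from another machine by probing each port. The sketch below assumes a netcat that supports the -z and -w flags is installed; check_ports is a hypothetical helper, not something DoltLab ships.

```shell
#!/bin/sh
# Ports opened by the security group rules above.
DOLTLAB_PORTS="22 80 100 4321 50051 1025"

# Probe each port on the given host, print one line per port, and
# return non-zero if any port is unreachable.
check_ports() {
  host=$1
  failed=0
  for port in $DOLTLAB_PORTS; do
    # nc -z only tests whether a TCP connection can be made; -w 3 caps the wait.
    if nc -z -w 3 "$host" "$port" >/dev/null 2>&1; then
      echo "$port open"
    else
      echo "$port closed"
      failed=1
    fi
  done
  return "$failed"
}
```

Note that a port reports closed when nothing is listening on it yet, even if the security group allows the traffic, so this check is most useful once the services have started.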

Getting back to the aws_launch_template declaration, we can see that the block_device_mappings block provisions 2TB of EBS disk for use on our host. We don't plan on pushing or uploading extremely large amounts of data to our DoltLab staging site (the completed FBI-NIBRS database on DoltHub is 1TB itself!), but if this changes, we would definitely increase the amount of disk we provision here to support our use case.

Finally, we enable the metadata_options for the host so that we can retrieve some important information from EC2's metadata service, and include the USERDATA we defined at the top of the file in the appropriate field. This USERDATA block will run when the host boots, and it's what allows the host to dynamically map its own IP to our provisioned EIP.

Looking closely at the USERDATA now, we can see the steps that allow this IP reassignment.

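First, we fetch the InstanceID of the newly launched instance by curling the EC2 metadata endpoint. The X-aws-ec2-metadata-token header carries an IMDSv2 session token, so we request one from the token endpoint first:

TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
InstanceID=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/instance-id)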

Next, we fetch the AllocationID associated with the EIP we provisioned using the aws CLI tool we provisioned in our AMI:

AllocateID=$(aws ec2 describe-tags --filters "Name=tag:Name,Values=dev-eip" --query "Tags[0].ResourceId" --output text)

Then we just need to use associate-address to map this new host to our EIP:

aws ec2 associate-address --instance-id $InstanceID --allocation-id $AllocateID
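
The same steps could be wrapped with a few guards so that a failed lookup does not lead to a confusing associate-address error. The following is a sketch under assumptions: it uses an IMDSv2 session token for the metadata call, the function names are hypothetical, and it treats the literal text None (which the aws CLI prints for an empty --query result in text output) as a failed lookup.

```shell
#!/bin/sh
# Sketch of the user-data steps with basic guards; function names are illustrative.
IMDS="http://169.254.169.254/latest"

# Fetch this instance's ID from the EC2 metadata service using an IMDSv2 token.
get_instance_id() {
  token=$(curl -s -X PUT "$IMDS/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 300") || return 1
  curl -s -H "X-aws-ec2-metadata-token: $token" "$IMDS/meta-data/instance-id"
}

# Look up the allocation ID of the EIP tagged with the given Name (e.g. dev-eip).
get_allocation_id() {
  aws ec2 describe-tags \
    --filters "Name=tag:Name,Values=$1" \
    --query "Tags[0].ResourceId" --output text
}

# Associate the tagged EIP with this instance; stop early if a lookup came back empty.
associate_eip() {
  instance_id=$(get_instance_id) || true
  allocation_id=$(get_allocation_id "$1") || true
  if [ -z "$instance_id" ] || [ -z "$allocation_id" ] || [ "$allocation_id" = "None" ]; then
    echo "lookup failed: instance='$instance_id' allocation='$allocation_id'" >&2
    return 1
  fi
  aws ec2 associate-address --instance-id "$instance_id" --allocation-id "$allocation_id"
}
```

In user data this would be invoked as associate_eip dev-eip, matching the tag name used in our describe-tags filter.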

Now, when launching an instance from this launch template, you can see the IP remapping happen in the AWS console shortly after launch. It's kinda cool!

Start DoltLab

The final remaining step is to ssh into the newly launched host and start DoltLab. To do this, first run the script that installs DoltLab's dependencies and downloads the version of DoltLab you want to run (which should be v0.2.0 or higher):

chmod +x ubuntu-bootstrap.sh
sudo ./ubuntu-bootstrap.sh with-sudo v0.2.0

After the script finishes, the host will have an unzipped directory called doltlab that contains the resources needed to run DoltLab using the doltlab/start-doltlab.sh script. This script will start DoltLab's services using docker-compose in daemon mode.

There is one additional change we need to make in order to enable DoltLab to successfully connect to our aws-smtp-relayd process listening on 1025.

We need to modify the doltlab/docker-compose.yaml. Under the doltlabapi section, in the volumes definition, we mount the certificates of the host (which is authorized to connect to the SMTP relay server via TLS) to the container running doltlabapi. This file should be mounted to the same path:

...
doltlabapi:
  ...
  volumes:
    ...
    - /etc/ssl/certs/ca-certificates.crt:/etc/ssl/certs/ca-certificates.crt
    ...
...
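
Since this edit is made by hand after each fresh download of DoltLab, a quick check that the mount is present can catch a forgotten step before startup. This is a minimal sketch; has_ca_mount is a hypothetical helper that only looks for the mount string in the file and does not parse the YAML.

```shell
#!/bin/sh
# Path of the host CA bundle that is mounted to the same path in the container.
CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt

# Succeed if the given compose file contains the host:container mount string.
has_ca_mount() {
  grep -Fq -- "$CA_BUNDLE:$CA_BUNDLE" "$1"
}
```

For example, has_ca_mount doltlab/docker-compose.yaml || echo "CA bundle mount missing" could be run just before the start script.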

Once we've updated our docker-compose.yaml file, we need to change groups in our shell so we can run docker without using sudo:

sudo newgrp docker

Now we can run the doltlab/start-doltlab.sh script with the proper environment variables. Note that EMAIL_USERNAME and EMAIL_PASSWORD are required by the script, but can be set to nonsense values as the host's IAM roles will be used to authenticate emails sent by DoltLab:

HOST_IP=<Host IP or DNS Name> \
POSTGRES_PASSWORD=<Password> \
DOLTHUBAPI_PASSWORD=<Password> \
POSTGRES_USER=dolthubadmin \
EMAIL_USERNAME=not-used \
EMAIL_PASSWORD=not-used \
EMAIL_PORT=1025 EMAIL_HOST=<Host IP or DNS Name> \
NO_REPLY_EMAIL=<An Email Address to Receive No Reply Messages> \
./start-doltlab.sh

HOST_IP will contain the DNS A Record we provisioned for our DoltLab instance, and we also supply this value to EMAIL_HOST since our SMTP relay is also running on this host. POSTGRES_PASSWORD and DOLTHUBAPI_PASSWORD can be chosen by the deployer, but POSTGRES_USER must be dolthubadmin.
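
With this many variables, a preflight check before calling the start script can save a failed startup. The sketch below is an assumption-laden illustration: preflight is a hypothetical helper, and the list of names is simply the set of variables passed in the command above.

```shell
#!/bin/sh
# Variables passed to start-doltlab.sh in the command above.
REQUIRED_VARS="HOST_IP POSTGRES_PASSWORD DOLTHUBAPI_PASSWORD POSTGRES_USER EMAIL_USERNAME EMAIL_PASSWORD EMAIL_PORT EMAIL_HOST NO_REPLY_EMAIL"

# Fail if any variable is unset or empty, or if POSTGRES_USER is not dolthubadmin.
preflight() {
  missing=""
  for name in $REQUIRED_VARS; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    [ -n "$value" ] || missing="$missing $name"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing" >&2
    return 1
  fi
  if [ "$POSTGRES_USER" != "dolthubadmin" ]; then
    echo "POSTGRES_USER must be dolthubadmin" >&2
    return 1
  fi
}
```

After exporting the variables in the shell, preflight && ./start-doltlab.sh would only start DoltLab when everything is set.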

After the script completes, the running DoltLab services can be seen with docker ps:

docker ps
CONTAINER ID   IMAGE                                                             COMMAND                  CREATED      STATUS      PORTS                                                                                     NAMES
c1087c9f6004   public.ecr.aws/dolthub/doltlab/dolthub-server:v0.2.0              "docker-entrypoint.s…"   9 days ago   Up 9 days   3000/tcp                                                                                  doltlab_doltlabui_1
a63aade4a36e   public.ecr.aws/dolthub/doltlab/dolthubapi-graphql-server:v0.2.0   "docker-entrypoint.s…"   9 days ago   Up 9 days   9000/tcp                                                                                  doltlab_doltlabgraphql_1
5b2cad62d4e5   public.ecr.aws/dolthub/doltlab/dolthubapi-server:v0.2.0           "/app/go/services/do…"   9 days ago   Up 9 days                                                                                             doltlab_doltlabapi_1
e6268950f987   public.ecr.aws/dolthub/doltlab/doltremoteapi-server:v0.2.0        "/app/go/services/do…"   9 days ago   Up 9 days   0.0.0.0:100->100/tcp, :::100->100/tcp, 0.0.0.0:50051->50051/tcp, :::50051->50051/tcp      doltlab_doltlabremoteapi_1
52f39c016537   public.ecr.aws/dolthub/doltlab/fileserviceapi-server:v0.2.0       "/app/go/services/fi…"   9 days ago   Up 9 days                                                                                             doltlab_doltlabfileserviceapi_1
0f952e7c7007   envoyproxy/envoy-alpine:v1.18-latest                              "/docker-entrypoint.…"   9 days ago   Up 9 days   0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:4321->4321/tcp, :::4321->4321/tcp, 10000/tcp   doltlab_doltlabenvoy_1
204e0274798b   public.ecr.aws/dolthub/doltlab/postgres-server:v0.2.0             "docker-entrypoint.s…"   9 days ago   Up 9 days   5432/tcp                                                                                  doltlab_doltlabdb_1
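
Rather than scanning this table by eye, the running container names can be compared against the services we expect. The following is a minimal sketch: missing_services is a hypothetical helper, and it assumes container names contain the service names shown in the NAMES column above, which may differ under other docker-compose versions.

```shell
#!/bin/sh
# Service names taken from the NAMES column of the docker ps output above.
EXPECTED_SERVICES="doltlabui doltlabgraphql doltlabapi doltlabremoteapi doltlabfileserviceapi doltlabenvoy doltlabdb"

# Print each expected service that has no running container; print nothing if all are up.
missing_services() {
  running=$(docker ps --format '{{.Names}}')
  for svc in $EXPECTED_SERVICES; do
    printf '%s\n' "$running" | grep -q "$svc" || echo "$svc"
  done
}
```

An empty result from missing_services means every expected container is running; any names it prints are worth investigating with docker logs.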

Conclusion

If you're using DoltLab, or want to start, please don't hesitate to contact us here or on Discord in the #doltlab channel. We are happy to help you out and make sure that DoltLab delivers great value to support your use case.

Stay tuned for more DoltLab updates headed your way soon!
