Dolt is a MySQL-compatible database with Git-like features. On May
18th, we launched Hosted Dolt, a cloud-hosted Dolt database
with built-in logs and monitoring. If you're not familiar with Hosted Dolt, here are some
blogs to get started:
We recently added a support ticket system to the Hosted Dolt website. Users can create
support tickets of varying urgency if something goes wrong and a member of the DoltHub
team will be dispatched to resolve the issue.
This blog will cover how we use and set up AWS Incident
Manager to manage our support tickets.
What is Hosted Dolt?
Hosted Dolt is our solution for AWS RDS or MariaDB
SkySQL and is modeled after AWS Management
Console or GCP
You use the administrative website to provision and manage a running
Dolt database. Once you provision the database you connect to it
with any MySQL client over the internet. When you're done with the database, deactivate it
and we'll stop charging you for it.
Because Hosted Dolt is used for running version-controlled databases in production, uptime
is extremely important. We have a team of veteran cloud service engineers to keep your
database operating smoothly. However, on the off chance something goes wrong, we need an
escalation plan in place to address and resolve urgent issues as soon as possible.
When we started to look into tools that could help us implement our support ticket system,
we had two main criteria:
- Hosted Dolt users can create and manage tickets on our website
- Our team can easily manage tickets through an admin console
We considered a few options that hit these criteria: PagerDuty, Splunk On-Call (formerly VictorOps),
AWS Incident Manager,
and implementing the system ourselves. Ideally, if there were existing tools we could use
to implement this system, we'd prefer not to spend the time and resources building it from
scratch. Here's a quick comparison of these three tools.
PagerDuty
- Initial release: August 2009
- Pricing: Free for up to 5 users/month, $21 per user/month after
- Features:
  - Automated precision response
  - Business-wide orchestration
  - Major incident learning

Splunk On-Call (formerly VictorOps)
- Initial release: December 2012, acquired June 2018
- Pricing: Starts at $5 per user per month
- Features:
  - iOS and Android apps
  - Incident context and audit trail
  - Rules engine
  - Machine learning-based responder recommendations

AWS Incident Manager
- Initial release: May 2021
- Pricing: $7 per response plan per month
- Features:
  - Automatically collect and track the metrics
  - Collaborate through contacts, escalation plans, and chat channels
  - Automate repeatable steps to resolve incident

Some of our team members had used VictorOps at their former jobs at Snapchat. While it was
praised for its nice web interface and ease of on-call scheduling, there was some
additional integration work and maintenance required by having monitoring in a separate
We ultimately decided to go with AWS Incident Manager. We use Amazon
CloudWatch for metrics and logs for Hosted Dolt
instances, and it easily integrates with AWS Incident Manager to alert and alarm off of
metrics. Having a single place to monitor tickets and metrics was appealing from an
How our support ticket system works
A support ticket consists of an impact level, title, summary, and affected deployments. We
started with two impact levels:
Critical (production system down, response within an hour) and
Low (general guidance, response within 24 hours). You can choose zero or more
related deployments (you must have an active deployment to create a critical ticket) and
add relevant details in markdown. Our on-call team member will be paged and respond as
soon as possible.
This is our support ticket workflow:
1. Oh no! Something goes wrong with a deployment. The user creates a support ticket.
You can create support tickets either from the support tab on the deployment management
console or from hosted.doltdb.com/support.
2. Our on-call team member gets notified via email or phone.
3. The user realizes they should include more information and edits the ticket.
4. Our paged team member fixes the issue and uses the AWS Incident Manager console to edit and resolve the ticket.
5. The ticket gets updated on the Hosted Dolt website so the user can see the resolution.
Setting up the support ticket system
There are a few steps we needed to take to use AWS Incident Manager to create a support
ticket system on our website.
1. Create a response plan
First, we need to create a response plan in AWS Incident Manager. This lets us define who
is notified and how we respond when an incident occurs. You can refer to the AWS
documentation for how to create a response plan.
Ours is pretty simple and looks something like this:
Take note of the ARN, as we will need this when setting up the API.
2. Using the AWS Systems Manager Incident Manager SDK
As mentioned in the Hosted Dolt Infrastructure blog,
our Hosted Dolt API is a Go service providing gRPC endpoints. Luckily, Amazon has an
AWS SDK for Go and we can use the
ssmincidents package, which provides the API client, operations, and parameter types for AWS Systems
Manager Incident Manager.
We set up an incident service within our Hosted Dolt API that creates a client from the
incoming config, and then takes incoming ticket, deployment, and user information and
converts them to resources that can be used by the operations provided by the
ssmincidents package. Specifically we use:
- StartIncident: Starts an incident using the ARN of the response plan you created above, a title, and related items (deployment
URLs and creator username)
- GetIncidentRecord: Gets incident information
- UpdateIncidentRecord: Updates incident information, such as title, summary, and impact level
- ListRelatedItems: Lists related items, in our case related deployment URLs
- UpdateRelatedItems: Adds or removes related items, in our case related deployment URLs
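Because related items are added or removed rather than replaced wholesale, an update needs to work out which deployment URLs are new and which should go away. The sketch below shows one way to compute that difference with plain Go; `diffRelated` is a hypothetical helper, not part of the AWS SDK, and it assumes the current list has no duplicates.

```go
package main

import "fmt"

// diffRelated compares the URLs currently attached to an incident with the
// URLs the user wants, and returns what to add and what to remove.
// Hypothetical helper: current is assumed to be duplicate-free.
func diffRelated(current, desired []string) (add, remove []string) {
	have := make(map[string]bool, len(current))
	for _, u := range current {
		have[u] = true
	}

	want := make(map[string]bool, len(desired))
	for _, u := range desired {
		// Add each desired URL once, and only if it is not already attached.
		if !want[u] && !have[u] {
			add = append(add, u)
		}
		want[u] = true
	}

	for _, u := range current {
		if !want[u] {
			remove = append(remove, u)
		}
	}
	return add, remove
}

func main() {
	current := []string{"https://example.com/deployments/a", "https://example.com/deployments/b"}
	desired := []string{"https://example.com/deployments/b", "https://example.com/deployments/c"}

	add, remove := diffRelated(current, desired)
	fmt.Println("add:", add)
	fmt.Println("remove:", remove)
}
```

Each entry in `add` and `remove` would then translate into its own add or remove request against the incident.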
3. Using the incident service in Hosted Dolt API
Once our incident service was set up, we created some gRPC endpoints that can be used by
our front end to view and manage tickets on our website -
UpdateIncident. You'll notice these don't necessarily map to the
incident manager operations we use above. There were some extra steps we needed to take to
work with the AWS SDK in order to curate the user experience we wanted on our website.
The AWS operation
ListIncidentRecords can return records filtered by the
field. Since our Hosted Dolt users don't map to AWS users, we couldn't use this operation
to list tickets for a user on our website. To get around this, we store some incident
information, including the hosted creator ID, in the database that backs Hosted Dolt
(which happens to be Dolt) when an incident is started, which allows us to list incidents
for a user. The downside of this is that some information, like incident state (i.e. if
the ticket is open or resolved), isn't necessarily updated in our database when updates
are made from the AWS console. We can call the
GetIncident operation for every listed
item to get around this.
The input for the AWS operation
StartIncident doesn't include a summary, only a response
plan ARN, title, impact level, and related items. It made more sense to us that the user
provides a summary when they create the ticket so that the paged team member doesn't have
to wait around for important information that could be crucial for getting a fix out. So
our create endpoint calls StartIncident, then UpdateIncidentRecord to add the summary,
and then stores the record information in our Dolt database.
When a user is updating a ticket, we also didn't want them to have to separately update
ticket information and related deployments. So
UpdateIncident handles both the
UpdateIncidentRecord and UpdateRelatedItems AWS operations. Similarly,
4. Set up the front end
Once our API endpoints are ready to go, we need to create a UI to display, view, and
update support tickets. We created a support page and plugged in React components for a
support ticket form (used by both create and update), a ticket list, and a detailed ticket
view. Create an account and deploy a
database to check it out!
As Hosted Dolt grows and more people use our support ticket system, we can add more
features that can further help us support our users. Some of these include:
- Notifying users when a ticket is updated or resolved by a team member
- Offering an option to open a chat channel for more immediate communication
- Adding more impact types (AWS also has High/partial failure, Medium/reduced service, and No impact options)
- Creating a post-incident analysis to improve our response in the future
- Listing support tickets by deployment so organization members, not just ticket creators, can view open and resolved tickets
If you're curious about Hosted Dolt or want to chat about incident management, join us on
Discord or file an issue on
GitHub. We have some exciting
features in the works for Hosted Dolt, including
backups and a database
UI similar to
DoltHub for your hosted instance.