Dolt roadmap retrospective


A few months ago I was interviewed about Dolt by a Substack on open source development. One of the questions I was glad to answer was what motivates me to contribute to Dolt. Well, there's the money, obviously. But more than that, it's being part of a culture that ships.

substack interview

At DoltHub, we ship major new features every month, plus a continual stream of smaller improvements and bugfixes. We ship so often, in fact, that it can be hard to keep track of what we've accomplished and what we're going to do next. This blog post is an attempt to pause and take stock of how far the product has come in the last couple of years, and reflect on where we're going next.

Product planning at DoltHub

When we interview candidates, there's always a period when we flip the process around and let them ask us questions instead. One of the most frequent questions we get from experienced candidates is: what sort of project planning process do you use? Because we've been in the industry for decades, we recognize this question for what it is: a desperate plea to not have to use Jira or play planning poker ever again.

jira hell

I tell them it's pretty simple. We have a big spreadsheet that we use to keep track of planned feature work, and every month or so we update it to reflect progress and priorities. It looks like this:

priority spreadsheet

We add new features there when we decide to work on them, and move rows up and down to indicate their relative priorities. For time-sensitive work, we'll sometimes add a deadline to a row. But that's basically it: very little process. The main point of the spreadsheet is to enable us to have discussions about staffing tradeoffs and customer expectations.

  • If we deliver stored procedures in October, what else has to slip?
  • If Andy works on rewriting the storage layer, who will take over keyless table storage?

We also end up doing a lot of unplanned work, typically requested by a paying customer. They come to us with a query that's too slow or doesn't work correctly, and we give their issue top priority until we have a workaround. Sometimes those issues will get logged on GitHub, but often they never leave our customer chat rooms on Discord. And that's fine. The point of the roadmap isn't to perfectly capture all the work we do (release notes capture that pretty well); it's to help us plan.

Victory lap

At DoltHub, we ship a lot. We announce new features on this blog as they come out, but I want to take a moment to put all these accomplishments in one place, so that you (and we) can appreciate just how many there are. Warning: this is a long list!

| Feature | Release date |
| --- | --- |
| Change column type | Feb 2021 |
| Primary key changes | Jan 2021 |
| Indexes for keyless tables | Aug 2021 |
| parser allows reserved words as column names without backticks | May 2020 |
| explain statements show index usage | Aug 2020 |
| Outer scope accessible in subqueries | Aug 2020 |
| DESCRIBE TABLE, etc support for indexes and foreign keys | Sep 2020 |
| sqllogictest 99% | Mar 2021 |
| all information_schema tables present | Nov 2020 |
| Schema alteration on keyless tables | Aug 2021 |
| Column defaults | Sept 2020 |
| Triggers | Oct 2020 |
| Dolt CLI functions for SQL | Feb 2021 |
| Auto increment | Nov 2020 |
| Prepared statements | Nov 2020 |
| SIGNAL statement | Mar 2021 |
| Stored procedures | Mar 2021 |
| Keyless tables | Dec 2020 |
| Common table expressions (WITH) | Mar 2021 |
| Tuples for IN expression (multi-column IN) | Aug 2021 |
| Window functions (OVER) | Feb 2021 |
| dolt_commit_ancestors system tables | Mar 2021 |
| dolt_push() and dolt_pull() functions | Sep 2021 |
| Concurrency and transactions | May 2021 |
| INSERT...ON DUPLICATE | Mar 2020 |
| JSON type support | Apr 2020 |
| CHECK constraints | Apr 2021 |
| Foreign Keys | Jul 2020 |
| TRUNCATE table | June 2020 |
| Metaflow support | Apr 2021 |
| CREATE TABLE SELECT | Aug 2020 |
| Hash IN clause evaluation | Jan 2021 |
| N-table joins | Dec 2020 |
| Secondary indexes | May 2020 |
| Use more than one core | Feb 2020 |
| Push where clause down in join execution | Oct 2020 |
| Push projections to Dolt tables (return only a subset of columns) | Mar 2020 |
| Read from indexes, rather than full tables, when possible | Mar 2020 |
| MySQL Workbench support | Aug 2021 |
| Google Sheets support | June 2021 |
| Kedro Support | June 2021 |
| Great Expectations support | June 2021 |
| R Support | Aug 2021 |
| DataGrip support | May 2020 |
| Django support | Aug 2021 |
| Replication | Sept 2021 |
| Backup | Sept 2021 |
| Tags | Sept 2020 |
| Schema merge | May 2020 |
| Shallow pull, clone, fetch | Feb 2020 |
| filter-branch | Nov 2020 |
| Type conversion tests | Apr 2020 |
| dolt system tables | Jan 2020 |
| Detached HEAD SQL mode | Mar 2021 |
| Constraint violations | July 2021 |
| Check constraint violations command | Mar 2021 |
| Support for main default branch | Sep 2021 |
| LOAD_FILE() support | Aug 2021 |
| Generational garbage collection | Aug 2021 |
| Ecto and Elixir support | July 2021 |
| Performance benchmarking | Oct 2020 |
| DoltHub forks | Sept 2020 |
| Query diff | June 2020 |
| Serving multiple databases in a single server | May 2020 |
| AS OF support | Mar 2020 |
| Saved queries | Feb 2020 |
| 2-table indexed joins | Feb 2020 |
| LICENSE and README files | Feb 2020 |
| Views | Feb 2020 |
| SQL queries on DoltHub | Jan 2020 |
| dolt blame | Oct 2019 |

And there's a lot of stuff not even on this list, either because it got done without any fanfare or because it predates when we adopted even this limited planning process. DoltHub is a company that ships, a lot.

Today's roadmap

The product is a lot more mature today than a few years ago, as one would hope. In the earlier days there were so many missing features that prioritization was actually pretty easy: unless somebody was asking for a specific feature, we would be adding a lot of value no matter where we turned our attention, so strict prioritization didn't matter too much. I joked about this situation in an earlier blog post, but having such a huge surface area to cover was actually really fun.

features

Today things are a little harder. The environment is less target-rich than before, and we have a growing number of paying customers whose use cases we need to support, plus a larger pool of prospective customers who would adopt the product if it had some capabilities it currently lacks. So it's more important now to think about what we're going to support next to make our existing customers happy and lure new ones.

This is always a work in progress, but here are our current top priorities for Dolt:

| Feature | ETA |
| --- | --- |
| Hosted Dolt | Jan 2022 |
| Join for update | Oct 2021 |
| Backup and replication | Nov 2021 |
| Commit graph performance | Nov 2021 |
| Collation and charset support | Nov 2021 |
| Persistent SQL configuration | Dec 2021 |
| Multiple DBs in one repo | Dec 2021 |
| Tx isolation levels | Dec 2021 |
| 99.9% SQL correctness | Q1 2022 |
| Better dolt_diff table experience | Q1 2022 |
| Hash join strategy | Q1 2022 |
| Storage performance | Q1 2022 |
| SQL GUI support tests | Q1 2022 |
| Lock / unlock tables | Q1 2022 |
| Users / grants | Q2 2022 |
| JSON_TABLE() | Q2 2022 |
| Pipeline query processing | Q2 2022 |
| Table / index statistics | Q2 2022 |
| Universal SQL path for CLI | Q2 2022 |
| Row-level locking (select for update) | Q2 2022 |
| Virtual columns and json indexing | Q2 2022 |
| Embedded dolt | Q3 2022 |
| Signed commits | Q3 2022 |

This list is mostly ordered by planned release date, which gets less certain as we get farther out. Our top priority, hosted Dolt, is a relatively large effort and a major launch, scheduled for year end. Most of the other items on the list are a lot smaller, but there are exceptions: storage performance is code for a near-total rewrite of the storage layer to make it performant for the SQL server, which is a monumental effort (good luck, Andy).

We expect to rearrange this list as time goes on, and for new items to emerge and jump the line. Paying customers (or prospective paying customers) get write access to this roadmap, so if things go well this list will be obsolete in no time flat.

Conclusion

DoltHub ships, a lot. We're proud of the product features we've shipped so far, and eager to put more under our belt. If that sounds like an environment you'd like to be a part of, we're hiring!

Like the article? Interested in Dolt? Think we should be working on other things instead? Come join us on Discord to say hi and let us know what you think.
