DoltHub
Dolt. It's Git for Data.

Introducing Dolt and DoltHub

Dolt and DoltHub let users collaborate on databases in the same way they collaborate on source code. Dolt is a relational database combined with the version control concepts of Git. DoltHub provides a central place to store and share data in Dolt format.

What does Dolt Do?

Dolt is Git for data. Git versions files. Dolt versions tables. Dolt looks a lot like Git, all the way down to the help documentation. We didn't want a better version control interface. We just wanted a version-controlled database. Dolt can version any set of tables, from a small Excel spreadsheet to a terabyte-scale database.

Dolt is also a relational database. Dolt stores data in tables with schemas. You interact with the database using SQL.

Dolt uses a custom database engine to provide fast operations and efficient storage for large tables. Because Dolt treats rows as unordered and understands columns, diffs and merges operate on individual table cells rather than on lines of text, which makes versioning data far more ergonomic.
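To make the idea of a cell-based diff concrete, here is a minimal conceptual sketch in Python. It is not Dolt's implementation or API: it assumes a table can be modeled as a dictionary keyed by primary key, where each row maps column names to values, and the function name `cell_diff` is hypothetical. Because rows are matched by key rather than by position, reordering rows produces no diff, and a change is reported as a specific (row, column) cell instead of a whole changed line.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: a table is {primary_key: {column: value}}.
# Row order carries no meaning, mirroring "rows are unordered".
Row = Dict[str, object]
Table = Dict[object, Row]
Change = Tuple[str, object, Optional[str], object, object]


def cell_diff(old: Table, new: Table) -> List[Change]:
    """Return (kind, key, column, old_value, new_value) tuples.

    kind is "added" or "deleted" for whole rows (column is None),
    or "modified" for a single changed cell.
    """
    changes: List[Change] = []
    # Match rows by primary key, never by position in the table.
    for key in sorted(old.keys() | new.keys(), key=repr):
        if key not in new:
            changes.append(("deleted", key, None, old[key], None))
        elif key not in old:
            changes.append(("added", key, None, None, new[key]))
        else:
            # Compare column by column so only the changed cells are reported.
            for col in sorted(old[key].keys() | new[key].keys()):
                before, after = old[key].get(col), new[key].get(col)
                if before != after:
                    changes.append(("modified", key, col, before, after))
    return changes


# Illustrative data (made up for this sketch).
old = {1: {"name": "Ada", "city": "London"},
       2: {"name": "Alan", "city": "Wilmslow"}}
new = {3: {"name": "Grace", "city": "Arlington"},   # new row, listed first
       2: {"name": "Alan", "city": "Manchester"},   # one cell changed
       1: {"name": "Ada", "city": "London"}}        # unchanged, moved

for change in cell_diff(old, new):
    print(change)
```

With this data the diff reports only the changed `city` cell of row 2 and the added row 3; row 1 does not appear even though its position changed. A line-based diff of the same tables exported as text would instead flag moved and rewritten lines.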

Dolt is open source. Find out how to get a copy here.

From Git:

  • Commits
  • Commit logs
  • Diffs
  • Branches
  • Merge
  • Conflicts
  • Remote repositories

From Relational Databases:

  • Tables
  • Schema
  • SQL
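Combining Git's merge and conflict concepts with tables means merges can also work cell by cell. The sketch below is a hypothetical illustration of a three-way merge at cell granularity, not Dolt's actual merge algorithm. It assumes the same dictionary-of-rows model as a plain Python stand-in for a table, and the names `merge_tables` and `MISSING` are invented for this example. For each cell, a change made on only one side relative to the common ancestor (`base`) is taken; if both sides changed the same cell to different values, the cell is recorded as a conflict and our version of that row is kept for manual resolution.

```python
from typing import Dict, List, Tuple

# Hypothetical model: a table is {primary_key: {column: value}}.
Row = Dict[str, object]
Table = Dict[object, Row]

MISSING = object()  # sentinel: the row or cell does not exist on that side


def _cell(row, col):
    # Absent rows and absent columns are both treated as MISSING.
    return MISSING if row is None else row.get(col, MISSING)


def merge_tables(base: Table, ours: Table,
                 theirs: Table) -> Tuple[Table, List[Tuple[object, str]]]:
    """Three-way merge of two tables at cell granularity (illustrative only)."""
    merged: Table = {}
    conflicts: List[Tuple[object, str]] = []
    for key in sorted(base.keys() | ours.keys() | theirs.keys(), key=repr):
        b_row, o_row, t_row = base.get(key), ours.get(key), theirs.get(key)
        columns = set()
        for row in (b_row, o_row, t_row):
            if row is not None:
                columns |= row.keys()
        new_row: Row = {}
        row_conflicted = False
        for col in sorted(columns):
            b, o, t = _cell(b_row, col), _cell(o_row, col), _cell(t_row, col)
            if o == t or t == b:
                value = o  # both sides agree, or only ours changed the cell
            elif o == b:
                value = t  # only theirs changed the cell
            else:
                conflicts.append((key, col))  # both changed it differently
                row_conflicted = True
                continue
            if value is not MISSING:
                new_row[col] = value
        if row_conflicted:
            # Keep our whole row so the conflict can be resolved by hand.
            if o_row is not None:
                merged[key] = dict(o_row)
        elif new_row:  # an empty result means the row was deleted
            merged[key] = new_row
    return merged, conflicts


# Illustrative data: both branches edit row 1, but in different columns.
base = {1: {"name": "Ada", "city": "London"},
        2: {"name": "Alan", "city": "Wilmslow"}}
ours = {1: {"name": "Ada", "city": "Cambridge"},
        2: {"name": "Alan", "city": "Wilmslow"}}
theirs = {1: {"name": "Ada Lovelace", "city": "London"},
          2: {"name": "Alan", "city": "Manchester"}}

merged, conflicts = merge_tables(base, ours, theirs)
print(merged)
print(conflicts)
```

Here both branches edited row 1, which a line-based merge would report as a conflict on that line. Because the edits touch different cells (`city` on one side, `name` on the other), the cell-based sketch merges them cleanly and reports no conflicts. Only when both sides change the same cell to different values does it record a conflict.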

We also built DoltHub

DoltHub is GitHub for Dolt, a place on the internet to share Dolt repositories. DoltHub looks like a very slimmed-down GitHub. DoltHub aspires to be the best collection of data on the internet, distributed in the best format for collaborating on data: Dolt. Sign up, clone or create a repository, and become part of the data community.

We seeded DoltHub with some interesting public Dolt repositories. Over time, the collection of data we host and maintain will grow. We intend to be very active participants in the open data community. Come to DoltHub for the data management tool or for access to interesting, maintained data.

Why version data?

Protect your data

We must protect our data the same way we protect our source code, and version control is the most thorough way to do that. With version control, data is backed up and distributed across multiple physical machines, the origin and history of the data are tracked and stored, and every change is annotated. Never fear a delete or update query again.

Collaborate more efficiently

Do you ever want to find duplicate entries in two tables? Do you want to know when data changed and who changed it? Do multiple people or teams want to adopt newly published data at different paces? Do you import data from outside your organization and worry, with every import, about what will break? Do you want to patch the data after an import without having those changes overwritten the next time you import new data? Do you want to roll back data to a previous version?

Using a version-controlled database like Dolt can help you with all of those problems without expensive human-to-human communication. Loose collaboration through branching and merging is the standard in software development because it solves collaboration problems efficiently. Bring that same tool to your databases.

Get better data, more quickly

Letting the users of your data contribute improvements gets you better data more quickly. Imagine distributing a database to hundreds of customers and having each customer contribute corrections and additions to it. You can pick and choose which changes you merge into your master branch. This is the dynamic of open source, and it is why open source is such a popular way to distribute software. We think the same dynamic can exist for data.

Pricing

Dolt is open source, free to use, and free to distribute.

Signing up for DoltHub is free. DoltHub will host public Dolt repositories for free. Private repositories will eventually be hosted for a fee. We're still working out the pricing. Until then, private repositories will also be hosted for free.