Announcing Merge on DoltHub

Dolt is Git for data and DoltHub is our web application that houses Dolt repositories. Our goal is to make it easier to collaborate on data, and pull requests, or proposed changes to a repository, are an essential part of this process. Before today, merging these changes from one branch into another could only be done through the command line using Dolt. Today we released a merge button that enables merging directly from DoltHub.

What does merging mean?

The unique selling point of Dolt as a database is its Git-like functionality, which includes clone, fork, push, pull, branch, and merge. According to the Git documentation, merging "incorporates changes from the named commits (since the time their histories diverged from the current branch) into the current branch". This is a very powerful tool for collaboration. A user can open a pull request with suggested changes to a dataset, and once approved those changes can be merged into the desired branch.

Before web merge, clicking on the Merge button in a pull request provided the following directions for how to merge two branches using Dolt (directions for merging a pull request with cross-repository changes are a little different):

dolt clone dolthub/my-repo
cd my-repo
dolt checkout master
dolt merge feature-branch

If the merge produces conflicts, you can follow the directions to resolve them and then push your changes to DoltHub using dolt push origin master. The pull request's status is automatically updated to "merged", and your changes appear in the branch you merged into.
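The manual resolve-and-push step can be sketched as a small shell function. This is a minimal sketch, not the exact directions DoltHub displayed: it assumes dolt merge feature-branch has already run inside the clone and reported conflicts, and that you want to keep one side wholesale for the conflicting table. The function name resolve_and_push and its arguments are hypothetical.

```shell
# Sketch: finish a conflicted merge locally, then push to DoltHub.
# Assumes `dolt merge feature-branch` already ran in this clone and
# reported conflicts. resolve_and_push is a hypothetical helper name.
resolve_and_push() {
  table="$1"     # table reported as conflicting
  strategy="$2"  # "ours" or "theirs"

  # Show the conflicting rows before choosing a side.
  dolt conflicts cat "$table" &&
  # Keep one side for every conflict in the table.
  dolt conflicts resolve "--$strategy" "$table" &&
  # Stage the resolved table and complete the merge with a commit.
  dolt add "$table" &&
  dolt commit -m "Merge feature-branch into master" &&
  # Publish the merged branch; DoltHub then marks the pull request merged.
  dolt push origin master
}

# Example (table name is illustrative):
#   resolve_and_push justices theirs
```

Picking one side for a whole table is the bluntest option; if only some rows should win, inspect the output of dolt conflicts cat and resolve those rows individually before committing.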

As of today you can do all those steps with the click of a button!

How does merge on DoltHub work?

On Monday, Aaron wrote about data collaboration on DoltHub. He showed how to use forks and pull requests to collaborate on making some changes to our US Supreme Court cases repository.

Picking up where he left off, we changed the pull request page a little bit so now Aaron's pull request with some updates to the justices table looks like this (if you couldn't tell, that green button is where the magic happens):

[Image: Pull Request]

Once he receives approval from the repository owner (in this case Tim), Aaron can merge in his changes using the new merge button (but as you can see from the commit log below I beat him to the punch 😇). If there are conflicts, you'll be notified which tables conflict and can get instructions for how to resolve them locally using Dolt. Resolving conflicts on the web is on our longer-term roadmap.

I can tell from the commit log after merging that the commits from Aaron's pull request, as well as a merge commit, are now in the master branch:

[Image: Supreme court commit log]

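A merge does not always produce a merge commit. If the target branch has no new commits since the feature branch was created, a fast-forward merge is possible: the target branch simply moves forward to the tip of the feature branch, and no merge commit is created. Read about the difference between fast-forward merge and true merge here.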

Our roadmap for DoltHub

One of the ways we're making data collaboration using Dolt and DoltHub easier is by reducing the friction for making changes to data, especially for users who may not be as familiar with Git or SQL. We call this larger project "edit on the web". First came forks, where you can clone a dataset to your own domain, and today we released web merge on DoltHub, which is the first feature where we're actually writing to Dolt from DoltHub.

We're very excited about the road ahead and here's a sneak peek of what's coming up next:

Adding and committing changes to a new branch using SQL

Edit: implemented 3/8/21

On September 18, 2020 Justice Ruth Bader Ginsburg passed away. Tim wanted to reflect this sad news in our US Supreme Court repository. To do so, he had to clone the repository, create a branch, run a SQL query to update the appropriate cells for RBG's row in the justices table, add the table, commit the changes, push his branch, and open a pull request. Wouldn't it be awesome if he could simply have run his update query in the DoltHub SQL Console and then added, committed, created a branch, and opened a pull request directly from DoltHub?
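For reference, the command-line workflow described above looks roughly like the sketch below. The function name propose_sql_change is hypothetical, and the repository, branch, and query are passed in by the caller because the actual repository name and the columns of the justices table are not given here. Opening the pull request still happens on DoltHub afterwards.

```shell
# Sketch of the manual workflow: clone, branch, update with SQL, add,
# commit, push. propose_sql_change is a hypothetical helper name.
# The body runs in a subshell so the caller's directory is unchanged.
propose_sql_change() (
  repo="$1"     # e.g. dolthub/my-repo (placeholder)
  branch="$2"   # new branch that will hold the change
  table="$3"    # table touched by the query
  query="$4"    # SQL statement to run
  message="$5"  # commit message

  dolt clone "$repo" &&
  cd "${repo##*/}" &&            # dolt clones into a directory named after the repo
  dolt checkout -b "$branch" &&  # create and switch to the new branch
  dolt sql -q "$query" &&        # apply the change to the working set
  dolt add "$table" &&
  dolt commit -m "$message" &&
  dolt push origin "$branch"     # then open the pull request on DoltHub
)

# Example (the query and its column names are illustrative only):
#   propose_sql_change dolthub/my-repo update-justices justices \
#     "UPDATE justices SET some_column = 'value' WHERE some_key = 'x'" \
#     "Update justices table"
```

That is seven commands before the pull request is even opened, which is the friction the SQL Console workflow is meant to remove.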

Uploading a CSV and generating a commit on a branch

Edit: implemented 11/13/2020

Don't know SQL? No problem. Soon you'll be able to upload data from any CSV without needing to know Git or SQL. For example, take any Kaggle dataset (check out the hottest ones here) and share it on DoltHub so you and others can see the diffs when it changes. You'll get the added bonus of having the option to learn a bit of SQL through our SQL Console or clicking on table cells to generate filter or sort queries.
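Until upload is available on the web, a CSV can be brought in locally with the Dolt CLI. The following is a minimal sketch, assuming it runs inside an existing Dolt repository and that the CSV has a column suitable as a primary key. The function name import_csv_commit and the example file, table, and column names are hypothetical.

```shell
# Sketch: create a table from a CSV and commit it.
# Assumes the current directory is a Dolt repository (cloned or
# created with `dolt init`). import_csv_commit is a hypothetical name.
import_csv_commit() {
  table="$1"  # name of the table to create
  csv="$2"    # path to the CSV file
  pk="$3"     # column to use as the primary key

  # -c creates a new table from the file's header and rows.
  dolt table import -c --pk "$pk" "$table" "$csv" &&
  dolt add "$table" &&
  dolt commit -m "Import $csv into $table"
}

# Example (names are illustrative):
#   import_csv_commit my_table data.csv id
#   dolt push origin master
```

When the source CSV changes later, re-importing with dolt table import -u and running dolt diff before committing shows exactly which rows changed.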

Editing like a spreadsheet

Google Sheets is a popular tool for collaboration for a reason: you can make an edit directly and see who changed what and when. Instead of updating a cell through a SQL query, editing your table like a spreadsheet would allow users who are less familiar with SQL, or who just want to make a quick change to their data, to update a cell directly on DoltHub.

Please contact us or reach out to us on Discord if there's a feature you want sooner rather than later. We're always happy to prioritize features for our users.
