Dolt + Google Sheets

REFERENCEINTEGRATION
11 min read

Dolt, DoltHub, and Hosted Dolt can be used for Spreadsheet Version Control. Many of our customers migrated to Dolt from a really complicated, heavily edited spreadsheet.

I first tested Dolt against Google Sheets in fall of 2020 working from home during the pandemic. I produced this documentation. Dolt has changed a lot since then. We launched an API on DoltHub to make SQL reads and writes. We also launched Hosted Dolt, a managed Dolt instance you connect to over the internet. Integration with Google Sheets has never been better. It's time for a documentation update.

Dolt + Google Sheets

Why Dolt and Google Sheets

There is collaborative editing and history in Google Sheets but as I explain in my Spreadsheet Version Control article, collaborative editing and history is not version control.

Version control allows for branch and merge, the ability to asynchronously collaborate on multiple versions in an organized way. Instead of everyone editing the same Google Sheet, you can create you own copy, make edits, see the differences (ie. diffs) between your copy and another copy, and then merge two copies together. That workflow is true version control. No spreadsheet has those features.

Moreover, by adding DoltHub or Hosted Dolt to the mix, you add a Pull Request workflow. Pull Requests allow for human review of edits. Human review helps ensure data quality of changes to your spreadsheets.

Goals

To show off how Dolt can work with Google sheets, I built a simple demonstration sheet. The demo shows off a basic read, edit on a branch, and pull request workflow as well as some advanced features like buttons. The basic and advanced goals are enumerated below.

Basic

  1. Read a Dolt table from the main branch into a Google Sheet
  2. Write a Dolt table from another Google Sheet to a branch
  3. Review differences in a Google Sheet
  4. Create a pull request on DoltHub or Hosted Dolt from branch to main
  5. Perform a human review of the pull request
  6. Merge the changes into main branch
  7. Read the new merged Dolt table into a Google Sheet

Advanced

These actually turn out to be not so advanced.

  1. Trigger the above functionality using Google Sheet Buttons
  2. Trigger the above functionality using Google Sheet formulas

How it works

Apps Script

Google Apps Script

Google Sheets exposes programmatic functionality via Apps Script, a Javascript-based language.

Apps Script supports Web APIs via the UrlFetchApp and JSON.parse functions. We use these functions to work with a DoltHub database via the DoltHub API.

Apps Script supports MySQL connectivity via JDBC. Dolt is MySQL-compatible. Hosted Dolt exposes a MySQL compatible server over the internet you can connect to with any MySQL client. We use the built-in Apps Script MySQL client to work with a Hosted Dolt database.

Both approaches are standard, out-of-the-box functionality for Apps Script. No special Dolt plug-ins are required.

Set Up

Dolt

For this demo, we created a very simple Dolt database. The database has one table named test. The test table has two columns, id and words. id is an integer and the primary key. Words is a string of max size 255 characters.

Additionally, we create two branches main and branch.

We create this database on both DoltHub and Hosted Dolt so we can demonstrate how to connect to each of those products using Google Sheets. We'll use these products to implement the Pull Request part of our goals.

DoltHub

DoltHub

You can find the DoltHub database named google_sheets here under my personal timsehn DoltHub account.

With DoltHub, you access Dolt via a web API. This has a few limitations. You can only run one write query at a time. Read query results are limited to 1000 rows. That being said, DoltHub feels a lot like GitHub. Non-technical users can manage it. So, if you have non-technical users using your version-controlled Google Sheet, DoltHub may be a better option for you.

To use the write web API I also needed to create a token using our new fine-grained API permissions system.

Hosted Dolt

Hosted Dolt

You can find the DoltHub database named google_sheets here. You will need to be granted permissions to see it. Either email me at tim@dolthub.com or come by our Discord and I'll give you access. Screenshots of the experience can be found below.

Hosted Dolt is a running Online Transaction Protocol (OLTP) database. You connect to it over the internet using any MySQL client. In general, queries to Hosted Dolt are going to return results faster. There is no limit to result set size. However, Hosted Dolt feels like a SQL database. Non-technical users may be a bit intimidated by the user interface.

The Google Sheet

The Google Sheet I created can be found here. It has global edit permissions. Don't delete it!

The Google sheet has three sheets:

  1. test-read: reflects the main branch of the database
  2. test-write: the content you would like to insert on a branch
  3. test-diff: the difference between what is in main and what is on branch

Depending on what other users have done, you'll see different content in each of the sheets.

Apps Script

In the sheet click on Extensions > Apps Script. This will bring up a new table with an Apps Script code editor named "Dolt Connector".

The Hosted Dolt connection code is in HostedDolt.gs. There are three public functions and one helper private function. Private functions in Apps Script end with an _.

We start by defining the Hosted Dolt connection information.

var server   = 'timsehn-google-sheets.dbs.hosted.doltdb.com';
var dbName   = 'google_sheets';
var username = 'kicu458f9bdnaq56';  
var password = 'FN4pY87wFwoDHroNUePBsceMS3pB7AWX';
var port     = 3306; 

Then, the readHostedDolt() function is a simple select * from test followed by a call to the displayResultsHostedDolt() function. Notice how in the connection string you can specify a branch to connect to by appending /<branch>. This is one of the ways to use branches in Dolt.

function readHostedDolt() {
  var branch = 'main'   
  var url    = 'jdbc:mysql://' + server + ':' + port + '/' + dbName + '/' + branch;
  conn       = Jdbc.getConnection(url, username, password);

  var query    = 'SELECT * FROM test';
  var results  = conn.createStatement().executeQuery(query);
  
  var sheetName = 'test-read';
  displayResultsHostedDolt_(sheetName, results)

  conn.close();
}

For the writeHostedDolt() function, we iterate through the data on the sheet and batch up a prepared insert statement. We do this while connected to a branch named branch. At the end of our writes we make sure to call dolt_commit(). The --skip-empty flag allows the commit to be skipped if there is nothing to commit. Without skip-empty, call dolt_commit() will fail if the user tries to write the same data that is already there.

This function could be more robust to some errors like extra columns or a missing header row. I wanted to keep the example as simple as possible.

function writeHostedDolt() {
  var sheetName = 'test-write'
  var spreadsheet = SpreadsheetApp.getActive();
  var sheet = spreadsheet.getSheetByName(sheetName);

  var data = sheet.getDataRange().getValues();

  var branch = 'branch' 
  var url    = "jdbc:mysql://" + server + ":" + port + "/" + dbName + '/' + branch;
  var conn   = Jdbc.getConnection(url, username, password);
  
  // Clear the table so we mirror the sheet
  var query = "delete from test";
  conn.createStatement().execute(query);

  // Prepare a batch query
  query = conn.prepareStatement("insert into test values(?, ?)");

  // ignore header row by starting index at 1
  for (var i = 1; i < data.length; i++) {
    for (var j = 0; j < data[i].length; j++) {
      query.setObject(j+1, data[i][j]);
    }
    query.addBatch();
  }
  
  query.executeBatch();
  query.close();

  var query = "call dolt_commit('--skip-empty', '-am', 'New commit from Google Sheets edits')"
  conn.createStatement().executeQuery(query);

  conn.close();
}

For the showDiffHostedDolt(), we call the dolt_diff() table function and display the results.

function showDiffHostedDolt() {
  var url    = "jdbc:mysql://" + server + ":" + port + "/" + dbName;
  var conn   = Jdbc.getConnection(url, username, password);

  var query    = "SELECT * FROM dolt_diff('main', 'branch', 'test')";
  var results  = conn.createStatement().executeQuery(query);

  var sheetName = 'test-diff';
  displayResultsHostedDolt_(sheetName, results);
  
  conn.close();
}

This helper function takes the results of a query and prints them in a sheet.

function displayResultsHostedDolt_ (sheetName, results) {
  var spreadsheet = SpreadsheetApp.getActive();
  var sheet       = spreadsheet.getSheetByName(sheetName);

  // Start the sheet fresh
  sheet.clearContents();

  var metaData = results.getMetaData();
  var numCols  = metaData.getColumnCount();

  var insert = [];

  // Insert the column names as a header row
  for (var col = 0; col < numCols; col++) {
    insert.push(metaData.getColumnName(col + 1));
  }

  sheet.appendRow(insert);

  while (results.next()) {
    insert = [];
    for (var col = 0; col < numCols; col++) {
      insert.push(results.getString(col + 1));
    }

    sheet.appendRow(insert);
  }
}

The DoltHub connection code is in DoltHub.gs.

The readHostedDolt() function is a call to the DoltHub SQL API with a simple select * from test query. We then call the display results helper function. Because DoltHub API results come back unordered, we then sort by the ID column using the spreadsheet sort function.

function readDoltHub() {
  // This will only scale to 200 rows
  var query = 'select * from test';
  var url = 'https://www.dolthub.com/api/v1alpha1/timsehn/google_sheets/main?q=' + encodeURIComponent(query);

  var response = UrlFetchApp.fetch(url, {'muteHttpExceptions': true});
  data = JSON.parse(response);

  var sheetName = 'test-read';
  displayResultsDoltHub_(sheetName, data);

  // Workaround for https://github.com/dolthub/dolthub-issues/issues/497
  var spreadsheet = SpreadsheetApp.getActive();
  var sheet       = spreadsheet.getSheetByName(sheetName);
  sheet.getRange(2, 1, sheet.getLastRow() - 1, sheet.getLastColumn()).sort(1);
}

For the writeDoltHub() function, we iterate through the data on the sheet and make an individual insert query for each row. Unlike Hosted Dolt, this requires we know whether we are inserting a string or a number so we can quote correctly. The we use the callWriteAPI_() helper function to run each query on DoltHub.

function writeDoltHub() {
  var sheetName = 'test-write';
  var spreadsheet = SpreadsheetApp.getActive();
  var sheet = spreadsheet.getSheetByName(sheetName);

  var data = sheet.getDataRange().getValues();

  var branch = 'branch' ;
  
  // Clear the table so we mirror the sheet
  var query = "delete from test";
  callWriteAPI_(query)

  var data = sheet.getDataRange().getValues();
  // ignore header row by starting index at 1
  for (var i = 1; i < data.length; i++) {
    // Need to know the data shape here because we need to know what to quote
    var id = data[i][0];
    var words = data[i][1];
    query = 'insert into test values (' + id + ',"' + words + '")';
    callWriteAPI_(query);
  }
}

For showDiffDoltHub() we also use the DoltHub SQL API ro run the dolt_diff() table function and display the results.

function showDiffDoltHub() {
  var query    = "SELECT * FROM dolt_diff('main', 'branch', 'test')";
  var url = 'https://www.dolthub.com/api/v1alpha1/timsehn/google_sheets/main?q=' + encodeURIComponent(query);

  var response = UrlFetchApp.fetch(url, {'muteHttpExceptions': true});
  data = JSON.parse(response);

  var sheetName = 'test-diff';
  displayResultsDoltHub_(sheetName, data);
}

This helper function takes the results if a query and prints them in a sheet. Notice with DoltHub API versus Hosted Dolt, you have to take a lot more care in getting the correct order.

function displayResultsDoltHub_(sheetName, results) {
  var spreadsheet = SpreadsheetApp.getActive();
  var sheet       = spreadsheet.getSheetByName(sheetName);

  // Start the sheet fresh
  sheet.clearContents();

  var insert = [];

  // Column Names
  var numCols = data.schema.length
  for (var col = 0; col < numCols; col++) {
    insert.push(data.schema[col].columnName);
  }

  sheet.appendRow(insert);

  // Data
  var numRows = data.rows.length;
  for (var row = 0; row < numRows; row++) {
    insert = []
    // Row data comes back unordered so use the column names which are
    // ordered to put the row data in the right order
    for (var col = 0; col < numCols; col++) {
      var colName  = data.schema[col].columnName;
      var rowValue = data.rows[row];
      var colValue = rowValue[colName];

      insert.push(colValue);
    }

    sheet.appendRow(insert);
  }
}

This helper function abstracts the heavy lifting of writing to the DoltHub API away from the main write function. The DoltHub write API is asynchronous so you have to make a request and then poll an endpoint and wait until the job is marked complete. Only one write can be processed at a time.

function callWriteAPI_(query) {
  var url = 'https://www.dolthub.com/api/v1alpha1/timsehn/google_sheets/write/main/branch?q=' 
    + encodeURIComponent(query);
  var token = 'dhat.v1.a0d6k79cnr88q7pvc6lmjpva7olqff3ioufjb6qnem59kkksdc4g';

  var headers = {
      "authorization": token
  };

  var options = {
    "method": "post",
    "headers": headers
  };

  var response = UrlFetchApp.fetch(url, options);
  var responseJSON = JSON.parse(response);

  var operationName = responseJSON.operation_name;

  options = {
    "headers": headers
  };

  url = 'https://www.dolthub.com/api/v1alpha1/timsehn/google_sheets/write?operationName=' + 
    encodeURIComponent(operationName);

  var isDone = false;

  var maxAttempts = 10;
  var attempts = 1;
  while ( !isDone ) {
    if (attempts > maxAttempts) {
      throw new Error('Could not write to DoltHub');
    }
    response = UrlFetchApp.fetch(url, options);
    responseJSON = JSON.parse(response)
    isDone = responseJSON.done;
    Utilities.sleep(100);
    attempts++;
  }
}

Buttons and Formulas

Once you have Apps Script functions accessing them via buttons of formulas is pretty straight forward. For buttons, you make an image and assign the image an Apps Script function. For formulas, every Apps Script function defined is available via standard formula syntax. In the demo, we could replace the "Show Current Diff" button with =showDiffDoltHub() in cell A1.

The Demo

Time for a demo. We're going to show off the DoltHub version.

Load data from Hosted Dolt

Go to the Google Sheet, navigate to the "test-read" sheet, and click the "Load from DoltHub" button. This button runs the readDoltHub() function.

When I ran it, the database on DoltHub looked like this.

DoltHub Start

After I clicked the button, the Google Sheet matches.

Google Sheet Start

So far so good.

Copy data to test-write sheet and make some changes

Now I copy the data I loaded to the "test-write" sheet and make some changes. This database is really down on Tim's abilities. I'm going to cheer that happy fellow on a bit more.

Google Sheet Edited

After I'm done, I click the "Write to DoltHub" button which calls the writeDoltHub() function.

View the diff

The data will now be on the branch branch on the DoltHub database. I select branch from the branch selector to view the data on DoltHub.

DoltHub Branch Edited

Back on the Google Sheet, I can see the diff by loading it into the "test-diff" sheet using the "Show Current Diff" Button.

Google Sheet Diff

Much more positive. You can see all but the row with ID 3 changed.

Make a Pull Request

Now I go back to the DoltHub database to make a pull request to send to my teammate for review.

I click the "Pull Requests" tab and the "Create Pull Request" button. I select the "base branch" main and the "from branch" branch.

DoltHub Create PR

I click the "Create Pull Request Button" and voila, I have an open Pull Request to send to my teammate for review.

DoltHub PR

My teammate can now review the Pull Request. The most useful tool for Pull Request review is the diff page.

DoltHub PR Diff

This all looks exactly like I intended. I leave a nice PR comment to indicate I reviewed.

DoltHub PR Review

Merge

Now, I want to merge my changes to the main branch so everyone can work off the new copy. I can click the "Merge" button above to have my changes reflected in the main branch. This can take a while for large databases so it kicks off an asynchronous job. But, for this database, it should be very quick.

DoltHub Merge

Once it's done. I can see my new changes in main.

DoltHub End

Reload in Google sheet

Finally, I reload the new data in the Google Sheet.

Google Sheet End

Finally, I confirm main is what I edited and loaded on DoltHub by making sure there is an empty diff.

Google Empty Diff

Conclusion

As you can see Google Sheets and Dolt make a natural pairing for your complicated spreadsheets. Dolt allows Google sheets to be asynchronously edited and human reviewed, much like code in Git and GitHub. Interested? Come ask me for a live demo on our Discord or just play for yourself!

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.