So you want Git SQL?


What does a mashup of Git and SQL look like? We built Dolt, a SQL database you can branch and merge like a Git repository. Is a Git SQL movement forming? Others are building Git + SQL mashups that we find interesting. If this is a movement, we're happy to be part of it.

Git + SQL

Git is an open source distributed version control system that has become the software development standard for source code control. The ability to clone, branch, and merge Git repositories has allowed for truly distributed source code collaboration by millions of software developers worldwide.

SQL, or Structured Query Language, is the worldwide standard for describing and querying data. It was invented in the early 1970s and is used in all manner of databases as the standard language for interfacing with them.

The mashup of these two concepts takes four general forms.

  1. What if we used SQL to explore Git repositories?
  2. What if our SQL database was a file that we could stick in Git?
  3. What if we versioned and shared SQL queries?
  4. What if we changed the SQL database engine to provide Git functionality?

We'll explore each in this article.

Git SQL

What if we used SQL to explore Git repositories?

Gitbase

Tagline: A SQL database interface to Git repositories
Initial Release: November 2016
GitHub: https://github.com/src-d/gitbase

This project was created by src-d, creators of go-mysql-server. We here at DoltHub adopted go-mysql-server and use it as the foundation for Dolt's SQL engine. Unfortunately, src-d ceased operations in 2018.

Gitbase's purpose was to provide a SQL interface to the data in a Git repository. For instance, you could set up Gitbase on top of your Git repository, connect with a standard MySQL client, and run a query like this:

MySQL [(none)]> SELECT commit_hash, commit_author_email, commit_author_name FROM commits LIMIT 2;
+------------------------------------------+---------------------+-----------------------+
| commit_hash                              | commit_author_email | commit_author_name    |
+------------------------------------------+---------------------+-----------------------+
| 003dc36e0067b25333cb5d3a5ccc31fd028a1c83 | user1@test.io       | Santiago M. Mola      |
| 01ace9e4d144aaeb50eb630fed993375609bcf55 | user2@test.io       | Antonio Navarro Perez |
+------------------------------------------+---------------------+-----------------------+
2 rows in set (0.01 sec)

If you're interested in an open source SQL interface written in Golang and are looking for a cool open source project to adopt as a maintainer, check out Gitbase.

MergeStat

Tagline: SQL for the software development lifecycle
Initial Release: July 2020
GitHub: https://github.com/mergestat/mergestat

MergeStat has a mission similar to Gitbase's: add a SQL interface to a Git repository. Where Gitbase implements the MySQL dialect of SQL, MergeStat uses the SQLite dialect and leverages SQLite's virtual table mechanism.

MergeStat implements a number of tables, including commits and blame. The project is cool and active. If you were sad to see Gitbase go defunct, definitely check out MergeStat.
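To make "SQL over Git history" concrete, here is a minimal Python sketch. It is not how MergeStat works: MergeStat exposes Git data through SQLite virtual tables, which Python's built-in sqlite3 module can't define, so this sketch simply copies `git log` output into an ordinary in-memory table. The table and column names mirror the Gitbase query above; `SAMPLE_LOG` is made-up data, and `read_git_log()` is a hypothetical helper that assumes `git` is on your PATH.

```python
import sqlite3
import subprocess

# git's %x1f / %x1e placeholders emit ASCII unit/record separators, which are
# very unlikely to appear in hashes, emails, names, or one-line subjects.
LOG_FORMAT = "%H%x1f%ae%x1f%an%x1f%s%x1e"
FIELD_SEP, RECORD_SEP = "\x1f", "\x1e"


def read_git_log(repo_path: str) -> str:
    """Return raw `git log` output for the repository at repo_path."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", f"--format={LOG_FORMAT}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def load_commits(log_text: str) -> sqlite3.Connection:
    """Parse log_text into an in-memory SQLite table named `commits`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE commits (commit_hash TEXT PRIMARY KEY,"
        " commit_author_email TEXT, commit_author_name TEXT, subject TEXT)"
    )
    rows = [
        record.strip("\n").split(FIELD_SEP)
        for record in log_text.split(RECORD_SEP)
        if record.strip()  # skip the trailing newline-only fragment
    ]
    conn.executemany("INSERT INTO commits VALUES (?, ?, ?, ?)", rows)
    return conn


# Hypothetical sample in the same shape read_git_log() returns.
SAMPLE_LOG = (
    "aaa111\x1fuser1@test.io\x1fAlice\x1fInitial commit\x1e\n"
    "bbb222\x1fuser2@test.io\x1fBob\x1fAdd README\x1e\n"
    "ccc333\x1fuser1@test.io\x1fAlice\x1fFix typo\x1e\n"
)

if __name__ == "__main__":
    conn = load_commits(SAMPLE_LOG)
    query = (
        "SELECT commit_author_name, COUNT(*) AS n FROM commits"
        " GROUP BY commit_author_name ORDER BY n DESC"
    )
    for name, n in conn.execute(query):
        print(name, n)
```

Pointing it at a real checkout is a matter of calling `load_commits(read_git_log("."))`. The design trades freshness for simplicity: a virtual table can read Git objects on demand, while this sketch takes a one-time snapshot.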

What if our SQL database was a file we could stick in Git?

SQLite

Tagline: A small, fast, self-contained, high-reliability, full-featured, SQL database engine
Initial Release: August 2000
GitHub: None, but see https://sqlite.org/src/doc/trunk/README.md

SQLite was not explicitly designed to integrate with Git. However, a SQLite database is a single file that can easily be stored in Git. Putting a SQLite database in Git does not give you granular diffs or merges, but you do get easy distribution through clone. Thus, including a SQLite database in your Git repository and distributing it through GitHub is a common pattern.
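Why no granular diffs? By default, Git treats a file as binary if it finds a NUL byte in roughly its first 8000 bytes (unless `.gitattributes` says otherwise). Every SQLite file starts with the 16-byte header string "SQLite format 3" followed by a NUL, so Git reports changes only as "Binary files differ". The Python sketch below approximates that check; `looks_binary_to_git()` and `make_demo_db()` are hypothetical helpers written for illustration, not Git or SQLite APIs.

```python
import os
import sqlite3
import tempfile

# Git inspects roughly this many leading bytes when guessing "binary or text".
GIT_BINARY_PROBE_BYTES = 8000


def looks_binary_to_git(path: str) -> bool:
    """Approximate Git's default heuristic: a NUL byte near the start means binary."""
    with open(path, "rb") as f:
        return b"\x00" in f.read(GIT_BINARY_PROBE_BYTES)


def make_demo_db(path: str) -> None:
    """Create a tiny SQLite database file with one table and one row."""
    conn = sqlite3.connect(path)
    with conn:  # commits the transaction on success
        conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        conn.execute("INSERT INTO users (name) VALUES ('alice')")
    conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "app.db")
        make_demo_db(db_path)
        with open(db_path, "rb") as f:
            print(f.read(16))  # b'SQLite format 3\x00'
        print(looks_binary_to_git(db_path))  # True
```

A common workaround is a `textconv` diff driver in `.gitattributes` that dumps the database to SQL text before diffing. That makes diffs readable, but Git still merges the file as an opaque binary.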

If you are looking for an easy, lightweight way to store data in a database and share it via Git, SQLite is a good option for you.

gitSQL

Tagline: Source Control for SQL
Initial Release: March 2015
GitHub: https://github.com/gitsql

gitSQL shares its name with this article, so it had to be mentioned! gitSQL is a graphical user interface that connects to your running SQL Server or PostgreSQL database and dumps the content of your database into a file to be stored in Git. The main selling point seems to be cost: it competes with Redgate by being much cheaper and open source.

gitSQL works with schema and data. I'm not certain what level of diff and merge functionality you get from these files. Those features aren't sold very hard, so I suspect the support is minimal. But if you're looking for a way to back up, version, and share your database in Git using a convenient GUI, gitSQL is for you.
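Dump-to-text tools fit Git because a text dump turns one changed row into one changed line. We don't know gitSQL's exact file format, and it targets SQL Server and PostgreSQL, so this Python sketch uses SQLite's `Connection.iterdump()` as a stand-in. The `users` table and the `changed_lines()` helper are hypothetical.

```python
import difflib
import sqlite3


def changed_lines(before: list[str], after: list[str]) -> list[str]:
    """Return only the added/removed lines of a unified diff."""
    diff = difflib.unified_diff(before, after, lineterm="")
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.commit()
before = list(conn.iterdump())  # schema + rows as SQL statements, one per line

conn.execute("UPDATE users SET name = 'robert' WHERE id = 2")
conn.commit()
after = list(conn.iterdump())

for line in changed_lines(before, after):
    print(line)  # one "-" line for bob, one "+" line for robert
```

The same idea works for any database that can export row-per-line text; emitting rows in a stable order, such as by primary key, keeps unrelated lines from shuffling between dumps.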

What if we versioned queries?

DBT

Tagline: Transforming Data. Transforming teams.
Initial Release: July 2016
GitHub: https://github.com/dbt-labs/dbt-core

Data Build Tool, or DBT, takes the idea of versioning SQL queries in Git a few steps further. DBT is an extremely popular tool for writing and versioning data transforms as SQL. Its focus is analytics (Online Analytical Processing, or OLAP) workloads. It lets analytics engineers adopt a reusable, composable format for analytics workflows built from projects and models. These primitives wrap core SQL, so DBT feels familiar.

DBT primitives can be sequenced to build large data pipelines. Moreover, the individual formatted queries can be versioned, say in Git, and reused across many other analytics jobs. For instance, someone writes the Daily Active Users query as a DBT model, and that model is then used in many pipelines across the organization.
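Here is a toy Python sketch of that composition idea. It is not dbt's API: real dbt models are `.sql` files that reference each other with the Jinja `ref()` function, and dbt infers the dependency graph from those references. This sketch declares dependencies by hand, orders the hypothetical models with `graphlib.TopologicalSorter` (Python 3.9+), and materializes each one as a SQLite view. `raw_events` and its rows are made-up sample data.

```python
import sqlite3
from graphlib import TopologicalSorter

# Hypothetical models: name -> (SELECT statement, upstream models it reads).
MODELS = {
    "stg_events": (
        "SELECT user_id, DATE(ts) AS day FROM raw_events",
        set(),
    ),
    "daily_active_users": (
        "SELECT day, COUNT(DISTINCT user_id) AS dau FROM stg_events GROUP BY day",
        {"stg_events"},
    ),
    "peak_dau": (
        "SELECT MAX(dau) AS peak FROM daily_active_users",
        {"daily_active_users"},
    ),
}


def build_models(conn: sqlite3.Connection, models: dict) -> list[str]:
    """Create one view per model, upstream models first; return the build order."""
    graph = {name: deps for name, (_, deps) in models.items()}
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        if name not in models:  # a source table listed as a dependency
            continue
        sql, _ = models[name]
        conn.execute(f"CREATE VIEW {name} AS {sql}")
    return order


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_events (user_id INTEGER, ts TEXT)")
    conn.executemany(
        "INSERT INTO raw_events VALUES (?, ?)",
        [(1, "2024-01-01 09:00:00"), (1, "2024-01-01 17:30:00"),
         (2, "2024-01-01 12:00:00"), (1, "2024-01-02 08:15:00")],
    )
    print(build_models(conn, MODELS))
    # ['stg_events', 'daily_active_users', 'peak_dau']
    print(conn.execute("SELECT * FROM daily_active_users ORDER BY day").fetchall())
    # [('2024-01-01', 2), ('2024-01-02', 1)]
```

Because the Daily Active Users logic lives in one named model, `peak_dau` and any other downstream model reuse it instead of copying the query, and the model definitions are plain text that can be versioned in Git.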

DBT has become extremely popular and is having a bit of a moment. If you're looking to adopt a Git + SQL style approach to your analytics pipeline, DBT is the tool for you.

What if we changed the SQL database engine to provide Git functionality?

Dolt

Tagline: It's Git for Data
Initial Release: August 2019
GitHub: https://github.com/dolthub/dolt

Dolt takes the concept of Git + SQL rather literally. What if you had Git functionality on top of tables instead of files? That's Dolt.

Dolt can operate as Git for Data from the command line, or as Database Version Control when run as a MySQL-compatible server.

Dolt closely mimics the Git command line. Diffs and merges are performed on schema and data, and conflicts are detected cell-wise. In SQL, Git read operations are exposed as system tables and Git write operations as SQL functions. Dolt also lets you store and version queries, much like VersionSQL.
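What does cell-wise conflict detection buy you? Here is a toy three-way merge in Python over tables represented as `{primary_key: {column: value}}` dicts. It only illustrates the idea and is not Dolt's merge algorithm or data model: two branches that edit different cells of the same row merge cleanly, and only edits to the same cell with different values conflict. The helper names and sample rows are hypothetical.

```python
from typing import Any, Optional

Row = dict[str, Any]
Table = dict[int, Row]  # primary key -> {column: value}


def merge_value(base: Any, ours: Any, theirs: Any) -> tuple[Any, bool]:
    """Standard three-way rule; returns (value, conflicted). Keeps ours on conflict."""
    if ours == theirs:
        return ours, False
    if ours == base:
        return theirs, False
    if theirs == base:
        return ours, False
    return ours, True


def merge_tables(base: Table, ours: Table, theirs: Table):
    """Merge two branches of a table; conflicts are (pk, column) pairs.

    A column of None marks a row-level conflict (deleted on one side,
    modified on the other).
    """
    merged: Table = {}
    conflicts: list[tuple[int, Optional[str]]] = []
    for pk in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(pk), ours.get(pk), theirs.get(pk)
        row, row_conflict = merge_value(b, o, t)
        if not row_conflict:
            if row is not None:  # None means the row is deleted after the merge
                merged[pk] = row
            continue
        if o is None or t is None:
            conflicts.append((pk, None))
            if row is not None:
                merged[pk] = row
            continue
        # Both branches changed the row differently: merge cell by cell.
        cells: Row = {}
        for col in sorted(set(o) | set(t)):
            value, cell_conflict = merge_value((b or {}).get(col), o.get(col), t.get(col))
            if cell_conflict:
                conflicts.append((pk, col))
            cells[col] = value
        merged[pk] = cells
    return merged, conflicts


if __name__ == "__main__":
    base = {1: {"name": "alice", "city": "Paris"}, 2: {"name": "bob", "city": "Rome"}}
    ours = {1: {"name": "alice", "city": "Lyon"}, 2: {"name": "bob", "city": "Milan"}}
    theirs = {1: {"name": "alicia", "city": "Paris"}, 2: {"name": "bob", "city": "Turin"}}
    merged, conflicts = merge_tables(base, ours, theirs)
    print(merged[1])  # {'city': 'Lyon', 'name': 'alicia'}
    print(conflicts)  # [(2, 'city')]
```

Row 1 merges cleanly even though both branches touched it, because they changed different cells; a line-based merge of a text dump would flag that row as a conflict. A real system also has to handle schema changes and record conflicts somewhere queryable; this toy just keeps our value and reports the conflicting cell.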

If you're looking to apply Git concepts to your whole SQL database, Dolt is for you. In a lot of ways, Dolt is the logical extreme of the Git SQL movement.

Curious about Dolt or other Git + SQL tools? Come hang out with us on our Discord. We love chatting about this stuff.
