- 11 min read
So you want an AI Database?
Here at DoltHub, we built the world's first version-controlled SQL database: Dolt. What do version control and databases have to do with Artificial Intelligence (AI)? It turns out, a lot. At first, we were skeptical about the AI revolution, but then...
- 3 min read
Adopting go-mysql-server
go-mysql-server is the SQL query execution engine that powers Dolt and DoltHub. Today we are excited to announce that we are adopting the project after its founding company ceased operations. Our fork of the project has over 400 additional commits th...
- 5 min read
April Dataset Spotlight
This blog entry is the first in a new series. Every month we will highlight some interesting datasets on DoltHub. The focus will be on new or updated datasets but sometimes we'll shed fresh light on a classic. For those new to Dolt and DoltHub, Dolt...
- 10 min read
Dolt and DoltHub: Publish Using CSVs
Dolt is a SQL database with Git-style versioning. DoltHub is a place to share Dolt repositories. Dolt is Git for data. DoltHub is GitHub for Dolt. We want to host your public data on DoltHub. We think Dolt and DoltHub provide the best sharing model a...
- 4 min read
Introducing Dolt to SQL sync
Background: While building Dolt and DoltHub, we have had many conversations with our users. They all share an interest in finding better ways to manage data. They recognize that writing code to massage CSV, JSON, and other less well known formats, in...
- 5 min read
Using Dolt to Find Test Regressions
Dolt is Git for data. It's a database that lets you clone, fork, branch, merge and diff. This is a really cool technology that has a lot of uses, but today we're going to focus on just one: using Dolt SQL to find regressions in test results. Dolt SQ...
- 5 min read
Common Vulnerabilities and Exposures in Dolt
TLDR: The NVD is a lot more useful when you can simply clone it and query it. The National Vulnerability Database (NVD) is the authoritative source for the publication of Common Vulnerabilities and Exposures (CVE). The vulnerabilities cataloged in t...
- 13 min read
28 grams of Cannabis Data Sets
Happy 4/20! Today is April 20th, the unofficial holiday of marijuana aficionados the world over. Happy 4/20! Or, as we in the data business like to say, Happy 20%! 4/20 is 1/5 is 20%. Recreational marijuana has been legalized in a dozen US states,...
- 4 min read
F*#%! you (in 4 languages)
Dolt is to DoltHub as Git is to GitHub, except with Dolt, the unit of versioning is SQL tables. Dolt also has Git-like semantics such as pull, branch and merge. By running dolt pull in a Dolt repository, you know you are getting the most up-to-date ...
- 10 min read
How Dolt Types Work
UPDATED FEBRUARY 10, 2021: Updated the final table with the types that have been added to Dolt since the article was first written. When we started on Dolt, our goal was to apply Git's idea of versioning to data. Whereas Git versions files, Dolt ver...
- 11 min read
Coronavirus State Actions Dataset: A Use Case for Pull Requests
As COVID-19 continues to affect the lives of millions of people around the world, having the most recent and accurate information is an increasingly important tool to help combat the disease. We've been tracking COVID-19 cases for a few months in ou...
- 8 min read
Dolt and DoltHub: Become a Publisher
Dolt is a SQL database with Git-style versioning. In Git the unit of versioning is files. In Dolt, the unit of versioning is SQL tables. Dolt will eventually support 100% of the Git command line and 100% of MySQL SQL. Moreover, anything you can do on...
- 8 min read
Data CI with DoltHub Webhooks
Dolt and DoltHub are Git and GitHub for data. The same way that GitHub enables collaboration on source code repositories in Git format, DoltHub enables collaboration on data repositories in Dolt format. A very common workflow on GitHub involves usin...
- 6 min read
Tracking SQL Correctness and Performance Regressions in Dolt
Tracking Dolt's SQL regressions: As part of our journey to make Dolt a great SQL database, we set out to track the correctness of Dolt’s SQL engine against a suite of SQL tests called the sqllogictests. These tests are what we use to measure how clos...
- 14 min read
Dolt for Git Noobs
TL;DR: Dolt is a SQL database with built-in Git versioning, branching, and distribution semantics that makes collaborating on and distributing data effortless. What Git does for files, Dolt does for data. Where Git versions files, allowing for fine-g...
- 8 min read
How Dolt Stores Table Data
Dolt is Git for data. It's a SQL database that lets you clone, branch, diff, merge, and fork your data just like you can with a filesystem tree in Git. This blog post explores one of the fundamental data structures that underlies Dolt's implementation...
- 6 min read
Dolt Use Cases
Dolt is Git for data. Instead of versioning files, Dolt versions tables. DoltHub is a place on the internet to share Dolt repositories. As far as we can tell, Dolt is the only database with branches. How would you use such a thing? One of the hard t...
- 15 min read
Who's at Risk of COVID-19 in the US Congress?
Overview: In this blog post, we discuss an approach for simulating an outbreak of COVID-19 in the US Congress. This is a long technical article about data sets, epidemiology, and simulation. Feel free to jump straight to the results of the simulatio...
- 9 min read
How We Built DoltHub: Front-End Architecture
In the previous article in this series, we took a deep look at the overall system architecture of DoltHub, the online data community powered by the Dolt version-controlled database. In this article, we'll zoom in on the front end and see how the code...
- 5 min read
Testing Dolt using Bats
We adopted Bash Automated Testing System (Bats) to test the Dolt command-line. As of March 10, 2020, we are up to 473 tests, though 55 are skipped because they currently fail. The tests define desired behavior so we're constantly working to get skippe...
- 6 min read
Querying Historical Data with AS OF Queries
Dolt is Git for data. It's a SQL database that lets you branch, merge, and fork your data just like you would a Git repository. In previous blog posts we announced how you can use special system tables to query the history of your database. Today, we...