- 11 min read
So you want an AI Database?
Here at DoltHub, we built the world's first version-controlled SQL database: Dolt. What do version control and databases have to do with Artificial Intelligence (AI)? It turns out, a lot. At first, we were skeptical about the AI revolution, but then...
- 5 min read
Novel Coronavirus Dataset in Dolt: A Case for Branches
Here at DoltHub, we've been working on COVID-19 data since February 5, 2020. First, we started importing Johns Hopkins data and then we worked on assembling the largest open, regularly-updated set of case details from Singapore, Hong Kong and South Ko...
- 4 min read
Scraping a JavaScript-enabled Website in 2020
As part of our effort to track data related to the Novel Coronavirus (COVID-19), we wanted to scrape a JavaScript-enabled website on Coronavirus from Hong Kong. Moreover, you'll notice that the website from Hong Kong uses lazy loading based on scroll...
- 6 min read
Novel Coronavirus Dataset in Dolt: Case Details
On Saturday, February 29, this transpired in our company chat room: [Image: Tim/Brian Google Chat snippet] A project was born. We had time series data for confirmed cases, deaths, and recoveries segmented by location sourced from Johns Hopkins but we did no...
- 6 min read
How We Built DoltHub: Stack and Architecture
In our introductory article for this series, we took a high-level look at the technology stack and architecture behind DoltHub, the online home for Dolt data repositories. In this article, we'll delve a little deeper and discuss how the pieces of the...
- 6 min read
Optimizing Sorted Map Iteration
In this blog post, I want to introduce some core concepts used to implement fast querying of databases. These techniques were implemented in Dolt and produced significant performance improvements. Database internals: The B-Tree is a cor...
- 10 min read
So You Want Git for Data?
Dolt is Git for Data. Learn about the options for versioning data catalogs, data pipeline version tools, and version controlled databases. The Dolt database versions data and schema with full audit history, diffs, and rollbacks.
- 7 min read
Visualizing Temperature Changes Over Time
In the first part of this two-part blog, I covered NOAA's "Global Hourly Surface Data" dataset and how it is modeled in Dolt. Dolt is Git for data, and for this dataset we model a day of observations as a single commit in the commit graph. In this se...
- 6 min read
NOAA Global Hourly Surface Data
The National Oceanic and Atmospheric Administration, NOAA, publishes weather measurements taken from stations around the world. It started in 1901 with a handful of stations, and there are more than 35,000 stations today. Most of these stations provi...
- 7 min read
Announcing Saved Queries
Dolt is Git for data. We built Dolt to help teams collaborate on data sets using the forking, branching, and merging workflows that Git popularized. These workflows are what enable software engineers to collaborate on source code, and they're what en...
- 4 min read
Copyrightable Material
In our previous blog post, we examined some freely available licensing tools for open data from Creative Commons. To briefly recap, a license specifies the terms under which copyrightable material is made available for public access, sharply distinct f...
- 3 min read
Data Licensing
Introduction: Dolt is a data format. DoltHub is a collaboration platform for data stored in the Dolt format. When sharing copyrighted content, the terms of that sharing are governed by a license. In this post we highlight some common licenses attached...
- 8 min read
Novel Coronavirus Dataset in Dolt
Johns Hopkins University Center for Systems Science and Engineering began collecting, tabulating, and publishing Novel Coronavirus (COVID-19) data on January 31, 2020. We started importing this dataset into Dolt on February 5, 2020. This blog will exp...
- 5 min read
How We Built DoltHub: Introduction
Towards the end of last month, we launched a totally reworked and redesigned version of DoltHub, our web application for hosting and collaborating on Dolt repositories. Now that we've had a little while to iron the kinks out, it seems like a good tim...
- 3 min read
Dolt and DoltHub Documentation
Background: We are excited to announce the launch of our documentation site. The goal of Dolt and DoltHub is to enable developers and the data community with radically better data infrastructure. High quality documentation should empower users by all...
- 10 min read
Implementing indexed joins
Happy Valentine's Day from all of us at DoltHub! You are the reason we do what we do! It's you. In honor of the holiday, we want to talk about how much we love making queries faster. We're going to examine how our SQL engine makes a query plan and exp...
- 4 min read
LICENSE.md and README.md in Dolt
Dolt and DoltHub strive to be the best data distribution platform on the internet. Having documentation versioned alongside data, and a standard, easy way to read the documentation online are features we admire in Git and GitHub. Following in Git's ...
- 7 min read
Introducing SQL VIEW Support in Dolt
Dolt is a SQL database with Git-style versioning and distribution. The most recent releases of Dolt introduced support for SQL views that are stored as part of, and versioned along with, a Dolt repository. This provides a great way for data sets to d...
- 8 min read
Dolt and DoltHub: Getting Started
Dolt is a SQL database with Git-style versioning. In Git the unit of versioning is files. In Dolt, the unit of versioning is SQL tables. Dolt will eventually support 100% of the Git command line and 100% of MySQL SQL. Moreover, anything you can do on...
- 5 min read
Mapping Income Inequality using IRS SOI Data
In a previous blog, I showed how the history of a dataset can be queried using the dolt history tables, and in the first part of this two-part blog I covered the IRS SOI data. In this second part I use the IRS SOI data along with doltpy to map out incom...
- 6 min read
IRS Sources Of Income Dataset
Every year the IRS publishes a treasure trove of data. It contains over a hundred different metrics which provide insight into the finances of American taxpayers. Even more compelling, they provide this information at ZIP code granularity, which ca...