- 11 min read
So you want an AI Database?
Here at DoltHub, we built the world's first version-controlled SQL database: Dolt. What do version control and databases have to do with Artificial Intelligence (AI)? It turns out, a lot. At first, we were skeptical about the AI revolution, but then...
- 10 min read
A Guide to Unit Testing React Apollo Components
DoltHub is a place on the internet to share, discover, and collaborate on Dolt databases. It's a Next.js application written in TypeScript, backed by a GraphQL server that calls gRPC services written in Golang. We use Apollo's built-in integration wi...
- 6 min read
Dolt vs MySQL: How it Started, How it's Going
How it Started: For those following along, we've been working on improving Dolt's performance with the goal of making Dolt no more than 2-4 times slower than MySQL. When we set out to measure Dolt's performance we chose Sysbench, a widely used open-s...
- 2 min read
January Dataset Spotlight
It's that time: our January dataset spotlight here at DoltHub. For the new folks, Dolt is a SQL database with Git-like versioning and DoltHub is a place on the internet to share Dolt databases. This monthly feature keeps you updated on Data Bounties ...
- 3 min read
Announcing DoltHub Issues
DoltHub is a place on the internet to share and collaborate on Dolt databases. We built DoltHub because we thought it would be useful to interact with versioned SQL databases in familiar ways. For example, query public data on the web, or clone it d...
- 3 min read
More Hiring
In October, we set out to hire more engineers to work on Dolt and DoltHub. Dolt is a SQL database with Git-like versioning and DoltHub is a place to share Dolt repositories. Since then, we added three engineers: Vinai, Remy, and Max. Welcome to all t...
- 6 min read
Release notes generation for GitHub repos
Introduction: Today we're excited to announce the open sourcing of a tool to automatically generate Markdown-formatted release notes for GitHub repositories. Dolt is using this tool to generate our release notes going forward, and we've also used it ...
- 5 min read
Dolt and Data Science - A Simple Example
Dolt is Git for data, a SQL database with version control. We've been working hard recently on making Dolt a useful tool for Data Science (DS) practitioners and we're hoping to launch some slick integrations soon. But first, we wanted to start off t...
- 6 min read
Managing DoltHub Dependencies
Dolt is Git for data and DoltHub is our web application that houses Dolt repositories. DoltHub consists of three separate React applications: our main Next.js app, as well as two Gatsby apps for our blog and documentation. Our dependency problem: We...
- 4 min read
Performance Benchmarks on Pull Request
Overview: Not long ago we wrote about measuring Dolt's performance against MySQL with the goal of improving Dolt to be no more than 2-4 times slower than MySQL. To work toward this goal, we created a containerized tool that benchmarks supplied versi...
- 4 min read
Hospital Price Transparency $10,000 Database Bounty
On January 1, 2021, a US law was passed requiring hospitals to publish their prices in human- and machine-readable format. We would like to assemble the best open dataset of hospital prices in the US to aid researchers. To this end, we’re launching ou...
- 7 min read
Supporting Larger File Imports on DoltHub
Introduction: Back in November, we announced support for uploading CSV files on DoltHub directly to Dolt repository commits. Since then, we've been quickly iterating on features for upload on the web. We recently released changes to our implementatio...
- 8 min read
Optimizing varint Decoding
Introduction: Dolt stores data in a content-addressable prolly tree in order to get efficient merges and diffs. In designing the table data format, one of our goals was to make table column additions and deletions fast operations. They should not requ...
- 19 min read
Pennsylvania ballot data revisited
Introduction: In November, shortly after the election, we published an analysis of Pennsylvania ballot data provided by the Pennsylvania Department of State. The purpose of the analysis was to determine if there was any truth to claims of irregularit...
- 3 min read
December Dataset Spotlight
We have been running the DoltHub dataset spotlight since May 2020. This is our eighth issue. The intent was to add additional exposure to Dolt datasets published on DoltHub. Publishing this blog monthly has presented some challenges content-wise. In...
- 15 min read
Planning joins to make use of indexes
Introduction: Dolt is Git for Data. It's a SQL database that you can clone, fork, branch, and merge. Dolt's SQL engine is go-mysql-server, and today we're going to discuss how it implements join planning to make a query plan involving multiple tables...
- 4 min read
US Presidential Election $25,000 Database Bounty Update
Last Monday, we released our first data bounty: a chance to earn a share of $25,000 by wrangling US Presidential precinct-level data. This blog will update you on the progress and encourage you to participate. Finally, we'll get a little meta and let you know ...
- 5 min read
Keyless Tables in Dolt
Dolt is a tool built for collaboration and data distribution: it's Git for Data. Git versions files, Dolt versions tables. Today, we're announcing support for keyless tables in Dolt. Strongly typed schemas are the best and worst parts of relational ...
- 5 min read
Bounty Attribution
On Monday we launched Bounties, a product that pays users to gather and clean data. In less than a week, our first data bounty has already shown the power of Dolt as a collaborative data platform. In that time our bounty has received 22 pull requests...
- 5 min read
Introducing Data Bounties
In 2018, we started the company that is now DoltHub to "create a place on the internet to get access to interesting, maintained data". The data ecosystem of today reminds us a lot of the open source ecosystem of the late 1990s and early 2000s. It's ther...
- 7 min read
Earn your share of $25,000 building US Presidential Election Database
Today, we're launching a way to make money building Dolt databases called Bounties. We'll have a follow-on blog post Wednesday explaining the motivations for the Bounties feature. But today, we're going to cut right to the chase and explain how you ...