1. REFERENCE
    10 min read

    So You Want Git for Data?

    Tim Sehn

    People have been asking for a Git and GitHub for data for a while. That thread on Stack Exchange is almost seven years old and is the number three Google search result for "git for data" (for me). What is “Git for data” in practice? Many products…

    Read More
  1. 2 min read

    US Schools Bounty Retrospective

    Our $10,000 US Schools bounty just completed. We assembled a cleaned, free database of US schools. Please use it and let us know what you think. How did we do? This bounty was a little strange because it turned out multiple relatively large public…

    Read More
  2. 8 min read

    So you want Database Version Control?

    Database Version Control is a poorly ranked Google search. For me, it starts with a horizontally scrollable section on code version control: irrelevant. Then, next up is ads for Redgate, Liquibase, and verta.ai. The rest of the non-sponsored…

    Read More
  3. 14 min read

    Managing State with React and Apollo Client

    DoltHub is a Next.js application written in Typescript and backed by a GraphQL server. We use Apollo's built-in integration with React to manage our GraphQL data within our React components. If you want to know more about DoltHub's architecture…

    Read More
  4. 7 min read

    Announcing Janky Hosted DoltHub Databases

    We all know that self-hosting a database is a challenging and daunting task. The hosting environment needs to have all the database's dependencies, must have the proper user/group permissions, and requires a slew of other nuanced, properly configured…

    Read More
  5. 13 min read

    2020 Census

    Having accurate information on where people live within a country is of critical importance to the function of that country's government. The United States Constitution mandates that Congress hold a census every 10 years. It is a labor-intensive and…

    Read More
  6. 16 min read

    How to Fix a Bug in Dolt's SQL Engine

    Dolt is an open-source SQL database that has Git-like functionality, including branch, merge, clone, push and pull. As we attract more and more users with various use cases and ways of integrating Dolt into their existing workflows and systems, it's…

    Read More
  7. 3 min read

    Announcing the $10,000 US Businesses Bounty

    This guest blog post is by Spacelove, a top performer of several of our former bounties and winner of many thousands of dollars in prize money. He agreed to be the coordinator and judge for this new data bounty. Announcing the $10,000 US Businesses…

    Read More
  8. 7 min read

    A nasty bit of undefined timezone behavior in Golang

    Go is a great language. Really, it is! We complain about the rough edges, but on the whole it's been a great choice for us, and we're not sorry we picked it. But just for fun, let's talk about some of those rough edges at length. That was a pot…

    Read More
  9. 5 min read

    Faster than MySQL? An investigation

    Introduction At DoltHub, we are obsessed with Dolt's performance. In the past we've written about Sysbench, our primary benchmarking mechanism for comparing our performance to MySQL's. Every time we run a release of Dolt we run a CI-job that executes…

    Read More
  10. 7 min read

    Unleash the Fuzz

    Dolt is a SQL database with Git-style versioning. With each new version of Dolt, we increase the number of supported SQL features. We're moving toward our goal of being a complete drop-in replacement for MySQL, while adding all of the versioning…

    Read More
  11. 6 min read

    Django and Dolt part II

    Back in June, we wrote about running Django on Dolt. We described our journey from Dolt as "Git for Data" to what we are today: a MySQL compatible relational database that is 99% SQL compliant. To fulfill our vision as a drop-in replacement for MySQL…

    Read More
  12. 8 min read

    A Poorman's Pachyderm: Do we need enterprise ML platforms?

    Developing machine learning infrastructure is a competitive new space. Machine learning operations (MLOps) promises to replace stodgy, manual scripts with automated and reproducible pipelines, unlocking priceless volumes of business value in the…

    Read More
  13. 5 min read

    Dolt and Images

    Dolt is a MySQL-compatible database that uniquely supports version control operations such as branch, merge, and diff. Over the past couple of months, our customers have been asking us to support the loading and versioning of files, particularly…

    Read More
  14. 4 min read

    Menus bounty retrospective

    Dolt is a MySQL-compatible database with branch, merge and diff. DoltHub is a place on the internet to host, share and query Dolt databases. This blog is a retrospective on the bounty that launched on July 14th and wrapped on August 4th. Part of the…

    Read More
  15. 2 min read

    Announcing $10,000 US Schools Bounty

    It's time for another data bounty. Today, we're launching a $10,000 bounty to collect basic identifying data about schools in the United States. We seeded the database with California K through 12 Schools. Only 49 states (and the District of Columbia…

    Read More
  16. 7 min read

    Open Source Patterns in Python

    About Dolt Web services make heavy use of versioning and open source sharing as a daily part of development. Production deployment is increasingly automated by AWS and other cloud services. Sandboxing, testing, and offline debuggability increase the…

    Read More
  17. 4 min read

    Generational Garbage Collection

    Dolt is a MySQL-compatible database that uniquely supports version control operations such as branch, merge, and diff. Aaron Son has blogged about how Dolt stores data in a prolly tree in order to support these operations efficiently. However…

    Read More
  18. 4 min read

    Learn SQL Using Practice Dolt Databases

    Dolt is a MySQL compatible database with Git-like versioning semantics. DoltHub is a place on the internet to share Dolt databases. Databases are notoriously unforgiving. We all know a person who accidentally ran a bad query in production and broke a…

    Read More
  19. 8 min read

    Indexing Keyless Tables

    Introduction Dolt is a new relational database with Git style versioning and offline debuggability. To be used as an application database, Dolt must be a drop in replacement for other SQL databases. We started with MySQL, the most frequently used SQL…

    Read More
  20. 13 min read

    The Long, Dark Rewrite of the Soul

    Great art is never truly finished, only abandoned. And great, living software is the same way. It's never done, but always in a state of constant flux between additions and rewrites. Software engineers are a curmudgeonly sort, and sometimes we like…

    Read More
  21. 5 min read

    Edit on the Web Redux

    DoltHub is a place on the internet to share, discover, and collaborate on Dolt databases. About a year ago we set off on a journey to make adding and editing data on DoltHub possible. Our goal was to ease collaboration on data using Dolt and DoltHub…

    Read More
  22. 7 min read

    Is Dolt a Blockchain?

    From Dolt's inception, we shied away from blockchain hype. Dolt looks more like a traditional SQL database with Git-style features than something that could power a digital coin. Dolt is not targeting traditional blockchain use cases. But over the…

    Read More
  23. 13 min read

    Benchmarking SQL Reads on EBS and S3

    Recently, we conducted some quick and dirty read latency smell tests comparing reads against Amazon's EBS volumes to reads against Amazon's S3. We ran these to test our hypothesis that a proposed infrastructure project would be a worthwhile…

    Read More
  24. 5 min read

    How the menus bounty broke DoltHub

    Dolt is a MySQL-compatible database with branch, merge and diff. DoltHub is a place on the internet to host, share and query Dolt databases. If you’re just hearing about the menus bounty, here is the menus launch blog. If you're new to data bounties, I…

    Read More
  25. 5 min read

    Dolt as an Immutable Database

    Dolt is Git for data. DoltHub is a place on the internet to share Dolt databases. In a recent discussion with a potential customer, the customer thought we did not make a big enough deal of Dolt being immutable at a commit. If you provide the commit…

    Read More
  26. 5 min read

    An Overview of DoltHub Infrastructure

    DoltHub is our web application for working with, sharing and collaborating on Dolt databases. We host dolt remotes and run bounties on DoltHub, among other things. Sometimes when talking with customers or candidates, we get questions about how…

    Read More
  27. 15 min read

    Merging Branches with Foreign Keys

    Dolt is a SQL database with Git-style versioning. With each new version of Dolt, we increase the number of supported SQL features. We're moving toward our goal of being a complete drop-in replacement for MySQL, while adding all of the versioning…

    Read More
  28. 6 min read

    Dolt without DoltHub: Other Dolt Remotes

    Dolt is a SQL database with Git-style versioning, and like git, when you want to make your data available to others you need a place to push it to so others can clone / pull it. In both Git and Dolt this is called a remote. A remote is simply a…

    Read More
  29. 8 min read

    Dolt and Ecto/Elixir

    Dolt is a SQL database with Git-style versioning. A couple of months ago, our team was introduced to an engineering team that wanted to use Elixir with Dolt. Elixir is a "dynamic, functional language", based off the Erlang VM. We thought it was…

    Read More
  30. 10 min read

    Improving Diffs on DoltHub

    DoltHub is a place on the internet to share, discover, and collaborate on Dolt databases. Diffing between different versions of data is a big part of what makes Dolt unique as a database, and we revamped our diff page on DoltHub to better show off…

    Read More
  31. 4 min read

    Wrapping up collaborative PDAP data bounty

    Dolt is a MySQL database that can branch, diff and merge. Every 6 weeks we launch a data bounty and pay contributors to build unique, public databases. Payment is based on the percent of cell edits a participant makes. This Wednesday, July 7, we…

    Read More
  32. 4 min read

    Announcing the $10,000 Menus Data Bounty

    If you aren’t yet familiar with Dolt and DoltHub, Dolt is a MySQL database that can branch, diff and merge. DoltHub hosts those databases on the web. We are currently running data bounties, a string of collaborative database projects in which…

    Read More
  33. 3 min read

    Manage by Blog Post

    Any frequent visitor to DoltHub knows we publish a lot of blog posts. How many? Three per week: Monday, Wednesday, and Friday. Everyone at the company writes them. Why do we do that? This blog post will answer why. A blog post about writing blog…

    Read More
  34. 4 min read

    Stripping Features to Improve Write Performance

    When we began working on Dolt in 2018 we leveraged Noms, an open source project that gave us efficient diff and merge capabilities. The company that had built Noms, Attic Labs, pitched it to developers as a decentralized application database. Noms is…

    Read More
  35. 2 min read

    Hitting 99% Correctness for our SQL Database

    Dolt is a SQL database with Git-style versioning. One of our biggest priorities is ensuring that Dolt is a drop-in replacement for any MySQL database. That means any query that can be run on a MySQL database must run correctly on Dolt as well. To…

    Read More
  36. 6 min read

    The Search for Dolt Adopters and the Ideal Customer Profile

    Hi, I'm New Here So for those of you who are new here too, and I hope there are many of you, Dolt is a database with Git-like features. In Marvel terms, we took your standard base and gave it superpowers. You can branch, merge, and collaborate just…

    Read More
  37. 9 min read

    Versioning Google Sheets With Dolt and GitHub Actions

    Introduction We will learn how to version control Google Sheets data using Dolt and GitHub Actions today. This is intended for teams who might benefit from a Pull-Request process for managing Google Sheets data changes, complete with strong typing…

    Read More
  38. 7 min read

    Copying all of MySQL's dumbest decisions

    Dolt is Git for data, a SQL database that you can fork, clone, branch, and merge. Those are the features you won't find in any other SQL database. But what about the features you can find in any other database? What if you need those to work in Dolt…

    Read More
  39. 9 min read

    Change Data Capture With Kedro and Dolt

    Introduction We are pleased to introduce a Kedro-Dolt plugin, a collaboration between Quantum Black Labs and DoltHub designed to expand the data versioning abilities of data scientists and engineers. You will find this useful if you are interested in…

    Read More
  40. 12 min read

    Better Data with Great Expectations + Dolt

    Background An explosion of data driven products and business processes is creating an urgent need for best practices to ensure data reaching end users is high quality. This data could be in the form of machine learning models, combining upstream data…

    Read More
  41. 7 min read

    Dolt is a Database

    Aaron, Brian and I founded DoltHub as Liquidata in 2018. Our mission was to add liquidity to the data market. How could we get more data shared? Our main hypothesis was adding branch/merge functionality as in source code to data would facilitate…

    Read More
  42. 7 min read

    Django and Dolt

    DoltHub was started in 2018 to create a place on the internet to access interesting, maintained data. That vision drove us to build Dolt, a versioned, syncable data format. It's "Git for Data". This year we launched data bounties as the logical…

    Read More
  43. 6 min read

    DoltHub's Office Return

    DoltHub has an office again. We're here on 2nd street in downtown Santa Monica. Whether or not to return to the office seems to be a topic of interest in the technology community. This blog entry is our take on remote vs in office work developed for…

    Read More
  44. 10 min read

    Upleveling Flyte’s Data Lineage Using Dolt

    Introduction Dolt and Flyte joined forces to build two data integrations. Dolt is a SQL database that supports Git Versioning. Flyte is a workflow orchestrator for creating and evolving machine learning processes and mission-critical data. When…

    Read More
  45. 5 min read

    Hospital Price Transparency V2 Bounty Retrospective

    Last Wednesday, May 27, we completed our fifth data bounty and first V2 bounty. The focus of the bounty was US hospital prices. We had run a bounty for hospital prices that ended March 1. We loved the results and wanted to see what we could do with a…

    Read More
  46. 3 min read

    Data Bounty for Police Data Accessibility Project

    PDAP are early Dolt adopters. Their open data mission aligns with DoltHub's and we're sponsoring a bounty to help get their project off the ground. This post goes through the mission, the bounty and ways you can contribute. The problem PDAP has a…

    Read More
  47. 10 min read

    Improving DoltHub's Web Query Performance

    One of my favorite things about DoltHub is that users can navigate to any public database it hosts and run real time SQL queries against it. Where other open data sites only provide documentation about the data with a link to download ZIP or CSV…

    Read More
  48. 3 min read

    Services and Support For Dolt

    Background Dolt is a SQL database with Git-like version control features. We recently published a handful of case studies illustrating how companies are making use of Dolt in production settings. Of the companies that are using Dolt in production…

    Read More
  49. 8 min read

    Transactions in a Database with Branches

    Dolt is Git for data, a SQL database that you can fork, clone, branch, and merge. Today we're excited to announce the alpha release of SQL transactions! This blog will cover how transactions work in Dolt and how this differs from more traditional…

    Read More
  50. 8 min read

    Dolt Powered Bartender

    I've always enjoyed working on hardware projects. Some past projects include a dual analog controller that worked with my iPhone, a "Make it Rain" machine that threw out a real dollar bill every time you swiped on your phone, and a pixel art painting…

    Read More
  51. 3 min read

    Brian goes CRM shopping

    After spending 12 years at my previous company in multiple sales and sales leadership roles, I decided it was time for a change when Tim approached me about being the first full time sales person at his new database company. Who’s Tim? The CEO…

    Read More
  52. 10 min read

    Dolt and Fuzz Testing

    Dolt is a SQL database with Git-style versioning. With each new version of Dolt, we increase the number of supported SQL features, moving toward our goal of being a complete drop-in replacement for MySQL, while adding all of the versioning features…

    Read More
  53. 4 min read

    Dolt Case Studies

    Background Dolt is a SQL database with Git-like version control features for both data and schema. This makes Dolt useful in a wide variety of applications while possessing a novel set of features. This blog post zeros in on some specific examples of…

    Read More
  54. 3 min read

    DoltHub got a makeover

    DoltHub is a place on the internet to host and collaborate on Dolt databases. TLDR: DoltHub provides a SQL first interface for interacting with your Dolt database. DoltHub got a makeover! This post will detail the background, motivation and results…

    Read More
  55. 2 min read

    April Dataset Spotlight

    It's that time for our April dataset spotlight here at DoltHub. For new folks, Dolt is a SQL database with git-like versioning and DoltHub is a place on the internet to share Dolt databases. This monthly feature keeps you updated on Data Bounties and…

    Read More
  56. 11 min read

    Using Apollo Client to Manage GraphQL Data in our Next.js Application

    DoltHub is a place on the internet to share, discover, and collaborate on Dolt databases. We have a series about how we built DoltHub, which goes deeper into our system and front-end architecture. If you're curious about our architecture or the…

    Read More
  57. 4 min read

    Dolt Now Supports Check Constraints

    Dolt is an SQL database with Git versioning. We have come a long way since initially committing to 100% MySQL compatibility, and today we introduce our latest step in that journey: check constraints. What Are Check Constraints? Check constraints are…

    Read More
  58. 3 min read

    Hospital Price Transparency Bounty Revisited

    Today, I’m thrilled to announce a second round of the Hospital Price Transparency Bounty. Our first pass was very elucidating; we learned a lot about the “shape” of the data and we’re excited to let our Discord lovelies take a second crack at it. We…

    Read More
  59. 13 min read

    Data Version Control and Dolt Reproducibility

    Introduction Dolt and DVC are often compared because of DVC's name, Data Version Control. Dolt also does "data version control". So what's the difference? Well, DVC is a version controlled machine learning workflow manager and Dolt is a…

    Read More
  60. 6 min read

    JSON Support in Dolt

    Dolt is a tool built for collaboration and data distribution; it's Git for Data. Git versions files, Dolt versions tables. We've built Dolt to support the MySQL query dialect, with the goal of becoming a drop-in replacement for MySQL. Currently, we…

    Read More
  61. 3 min read

    Making Dolt Compatible With SQL Editors

    Dolt is a SQL database with Git-style versioning. We've been working hard to make Dolt fully compatible with MySQL. An important test for compatibility is support for MySQL editors. These editors provide user interfaces for you to inspect data and…

    Read More
  62. 16 min read

    Introducing Dolt + Metaflow

    Note: this blog post appeared previously on Medium and represents a collaboration with the Metaflow maintainers at Netflix. Background This post details how to use Metaflow with Dolt. Metaflow is a framework for defining data science and data…

    Read More
  63. 3 min read

    Dolt Ditched the Contributor License Agreement

    Dolt is an open source database that supports Git-style versioning. DoltHub is a place on the internet to share Dolt databases. Dolt is collectively our team's first major open source endeavor. We're learning as we go. We recently ditched our…

    Read More
  64. 2 min read

    March Dataset Spotlight

    It's that time. Our March dataset spotlight here at DoltHub. For new folks, Dolt is a SQL database with git-like versioning and DoltHub is a place on the internet to share Dolt databases. This monthly feature keeps you updated on Data Bounties and…

    Read More
  65. 2 min read

    Private Repositories are Free

    Dolt is a SQL database with Git-style versioning. DoltHub is Dolt's GitHub. A few weeks ago, we made private repositories free up to 1GB and announced it on Twitter. This blog explains why we made private repositories free up to a Gigabyte. Sorry it…

    Read More
  66. 6 min read

    Data Bounties and Open Data

    "Open Data" is quickly attaining the hype and ambiguity of previous tech crazes like "Big Data" and "Block Chain". The motivation behind Open Data is easy to understand: data is one of the most valuable but closely guarded resources in the tech…

    Read More
  67. 8 min read

    Common Table Expressions (WITH)

    Introduction Dolt is Git for data. It's a SQL database that you can clone, fork, branch, merge, push and pull like a Git repository. We're committed to supporting 100% of the functionality offered by Git and 100% of the functionality offered by MySQL…

    Read More
  68. 7 min read

    Dolt with Popular DataFrame Tools

    Dolt is a version-controlled SQL database. For data science (DS) workflows, specifically, Dolt uses data versioning primitives to implement unique flavors of reproducibility. DataFrames are a common interface for exploring CSV, Parquet and other…

    Read More
  69. 4 min read

    Assembling a Grand Catalog—A Data Bounty Retrospective

    Should you use DoltHub Bounties for your data-wrangling needs? Our bounty partners wanted to assemble a “master” catalog of all the college courses taught in the United States. For them, it was an easy riddle. To recap, a partner approached us with a…

    Read More
  70. 7 min read

    Recent Improvements to Join Planning in Dolt

    Dolt is a SQL database that supports Git-like features, including branch, diff, merge, clone, push and pull. Dolt's SQL functionality is built on top of a SQL engine written in Golang. We've previously blogged about our first steps in optimizing…

    Read More
  71. 10 min read

    Merging and Resolving Conflicts Programmatically with SQL

    In the first part of this two part blog we covered concurrent connection handling within . We learned about session state, how to commit changes, and how to persist those changes across sessions. Today we'll talk about the explicit actions a client…

    Read More
  72. 18 min read

    dolt sql-server Concurrency

    Update 2021-08-02: Dolt now supports SQL transactions with , , and the other MySQL transaction primitives, so it's now safe to run the SQL server out of the box with multiple readers and writers. The techniques described in this blog post are still…

    Read More
  73. 2 min read

    Introducing the Logo-2k+Extended $10k Data Bounty

    Today, we’re launching our 4th bounty. Our previous three on the Presidential Election, Hospital Pricing, and College Course Catalogs have been eminent successes, if I do say so myself. Bounties are a really great way to assemble open datasets. Super…

    Read More
  74. 5 min read

    Introducing Stored Procedures

    Dolt is a SQL database with Git-style versioning. With each new version of Dolt, we increase the number of supported SQL features, moving toward our goal of being a complete drop-in replacement for MySQL, while adding all of the versioning features…

    Read More
  75. 11 min read

    Dolt Use Cases in the Wild

    About a year ago, we wrote a blog post about how we thought people might use Dolt, the SQL database you can fork, clone, branch, merge, push and pull just like a git repository. At the time we wrote it, we didn't have any paying customers, and we…

    Read More
  76. 4 min read

    Edit Dolt on the Web Using SQL

    DoltHub is a place on the internet to share, discover, and collaborate on Dolt databases. Last week we released a new feature on our edit on the web roadmap: edit data using the SQL Console. About a year ago when we released SQL queries on the web we…

    Read More
  77. 3 min read

    Hospital Price Transparency Bounty Review

    We finished our second data bounty Monday, March 1. The target of the bounty was hospital prices. The results surpassed our expectations. We built a database of 1,400 of approximately 6,000 US hospitals' chargemasters, representing over 72.7M prices…

    Read More
  78. 2 min read

    February Dataset Spotlight

    It's that time. Our February dataset spotlight here at DoltHub. For new folks, Dolt is a SQL database with git-like versioning and DoltHub is a place on the internet to share Dolt databases. This monthly feature keeps you updated on Data Bounties and…

    Read More
  79. 7 min read

    Implementing window functions in go-mysql-server

    Dolt is Git for Data, the first SQL database you can clone, fork, branch and merge. Its SQL engine is go-mysql-server. Our goal is to be a 100% compatible, drop-in replacement for MySQL, but we have a ways to go. Today we're excited to announce the…

    Read More
  80. 7 min read

    OpenElections Follow Up

    As some of you are aware, we finished our first data bounty on Feb. 14 to collect US Presidential Precinct results for 2016 and 2020. On Feb. 15, we published a bounty review. The bounty review gained some distribution on HackerNews after one of the…

    Read More
  81. 4 min read

    National Course Catalog $10,000 Database Bounty

    It's time for another data bounty! We completed the US Presidential Election Precinct Results bounty and we have a week or so left in the Hospital Price Transparency bounty. For the next bounty, we want to build a database of US College Course…

    Read More
  82. 3 min read

    Mypy and Doltpy

    Dolt Dolt is an SQL-database with Git-versioning. The goal of Doltpy, in concert with Dolt, is to solve reproducibility and versioning problems for data and machine learning engineers using Python. Mypy Mypy was created by Guido van Rossum, the…

    Read More
  83. 3 min read

    Dolt CLI in SQL - Update

    Dolt is a SQL database with Git-style versioning. In a previous post we discussed the need to introduce Dolt CLI functions in SQL. We believe that version control is something that can be native to your SQL workflow. This allows for possibilities…

    Read More
  84. 4 min read

    US Presidential Election $25,000 Database Bounty Review

    On December 14, we launched our first data bounty to earn a share of $25,000 by wrangling US Presidential Precinct-level data. The bounty ended yesterday. How did it go? This blog entry will answer that question. Dolt is a SQL database with Git-style…

    Read More
  85. 7 min read

    Doltpy 2.0

    Background Earlier in the week we talked about Dolt's "API surface area." To recap, Dolt is a relational database with version control features. Dolt has a SQL query interface implementing the MySQL dialect, as well as a command line interface (CLI…

    Read More
  86. 6 min read

    Introducing Type Changes

    Dolt is a SQL database with Git-style versioning. With each new version of Dolt, we increase the number of supported SQL features, moving toward our goal of being a complete drop-in replacement for MySQL, while adding all of the versioning features…

    Read More
  87. 9 min read

    Part I: Dolt API Surface Area

    Background When DoltHub was founded it was called Liquidata. The goal was to bring liquidity to the data market. The founders realized that the pipes were broken: sending around CSV, JSON, and other formats was broken. The requirement to translate to…

    Read More
  88. 13 min read

    A Guide to Unit Testing React Apollo Components

    DoltHub is a place on the internet to share, discover, and collaborate on Dolt databases. It's a Next.js application written in Typescript, backed by a GraphQL server that calls gRPC services written in Golang. We use Apollo's built-in integration…

    Read More
  89. 7 min read

    Dolt vs MySQL: How it Started, How it's Going

    How it Started For those following along, we've been working on improving Dolt's performance with the goal of making Dolt no more than 2-4 times slower than MySQL. When we set out to measure Dolt's performance we chose Sysbench, a widely used open…

    Read More
  90. 2 min read

    January Dataset Spotlight

    It's that time. Our January dataset spotlight here at DoltHub. For the new folks, Dolt is a SQL database with git-like versioning and DoltHub is a place on the internet to share Dolt databases. This monthly feature keeps you updated on Data Bounties…

    Read More
  91. 3 min read

    Announcing DoltHub Issues

    DoltHub is a place on the internet to share and collaborate on Dolt databases. We built DoltHub because we thought it would be useful to interact with versioned SQL databases in familiar ways. For example, query public data on the web, or clone it…

    Read More
  92. 3 min read

    More Hiring

    In October, we set out to hire more engineers to work on Dolt and DoltHub. Dolt is a SQL database with Git-like versioning and DoltHub is a place to share Dolt repositories. Since then, we added three engineers: Vinai, Remy, and Max. Welcome to all…

    Read More
  93. 7 min read

    Release notes generation for GitHub repos

    Introduction Today we're excited to announce the open sourcing of a tool to automatically generate markdown formatted release notes for GitHub repositories. Dolt is using this tool to generate our release notes going forward, and we've also used it…

    Read More
  94. 6 min read

    Dolt and Data Science - A Simple Example

    Dolt is Git for data, a SQL database with version control. We've been working hard recently on making Dolt a useful tool for Data Science (DS) practitioners and we're hoping to launch some slick integrations soon. But first, we wanted to start off…

    Read More
  95. 6 min read

    Managing DoltHub Dependencies

    Dolt is Git for data and DoltHub is our web application that houses Dolt repositories. DoltHub consists of three separate React applications: our main Next.js app, as well as two Gatsby apps for our blog and documentation. Our dependency problem We…

    Read More
  96. 4 min read

    Performance Benchmarks on Pull Request

    Overview Not long ago we wrote about measuring Dolt's performance against MySQL with the goal of improving Dolt to be no more than 2-4 times slower than MySQL. To work toward this goal, we created a containerized tool that benchmarks supplied…

    Read More
  97. 4 min read

    Hospital Price Transparency $10,000 Database Bounty

    On January 1, 2021, a US law was passed requiring hospitals to publish their prices in human and machine readable format. We would like to assemble the best open dataset of hospital prices in the US to aid researchers. To this end, we’re launching…

    Read More
  98. 8 min read

    Supporting Larger File Imports on DoltHub

    Introduction Back in November, we announced support for uploading CSV files on DoltHub directly to Dolt repository commits. Since then, we've been quickly iterating on features for upload on the web. We recently released changes to our implementation…

    Read More
  99. 10 min read

    Optimizing varint Decoding

    Introduction Dolt stores data in a content addressable prolly tree in order to get efficient merges and diffs. In designing the table data format one of our goals was to make table column additions and deletions fast operations. They should not…

    Read More
  100. 23 min read

    Pennsylvania ballot data revisited

    Introduction In November, shortly after the election, we published an analysis of Pennsylvania ballot data provided by the Pennsylvania Department of State. The purpose of the analysis was to determine if there was any truth to claims of…

    Read More
  101. 3 min read

    December Dataset Spotlight

    We have been running the DoltHub dataset spotlight since May 2020. This is our eighth issue. The intent was to add additional exposure to Dolt datasets published on DoltHub. Publishing this blog monthly has presented some challenges content-wise. In…

    Read More
  102. 18 min read

    Planning joins to make use of indexes

    Introduction Dolt is Git for Data. It's a SQL database that you can clone, fork, branch, and merge. Dolt's SQL engine is go-mysql-server, and today we're going to discuss how it implements join planning to make a query plan involving multiple tables…

    Read More
  103. 4 min read

    US Presidential Election $25,000 Database Bounty Update

    Last Monday, we released our first data bounty to earn a share of $25,000 by wrangling US Presidential Precinct-level data. This blog will update you on the progress and encourage you to participate. Finally, we'll get a little meta and let you know…

    Read More
  104. 5 min read

    Keyless Tables in Dolt

    Dolt is a tool built for collaboration and data distribution; it's Git for Data. Git versions files, Dolt versions tables. Today, we're announcing support for keyless tables in Dolt. Strongly typed schemas are the best and worst parts of relational…

    Read More
  105. 5 min read

    Bounty Attribution

    On Monday we launched Bounties, a product that pays users to gather and clean data. In less than a week, our first data bounty has already shown the power of Dolt as a collaborative data platform. In that time our bounty has received 22 pull requests…

    Read More
  106. 5 min read

    Introducing Data Bounties

    In 2018, we started the company that is now DoltHub to "create a place on the internet to get access to interesting, maintained data". The data ecosystem of today reminds us a lot of the open source ecosystem of the late 1990s and early 2000s. It's there…

    Read More
  107. 9 min read

    Earn your share of $25,000 building US Presidential Election Database

    Today, we're launching a way to make money building Dolt databases called Bounties. We'll have a follow-on blog post Wednesday explaining the motivations for the Bounties feature. But today, we're going to cut right to the chase and explain how you…

    Read More
  108. 6 min read

    Archiving Presidential Tweets Using Dolt

    Background This is a guest blog post by a member of the DoltHub community, detailing how they went about accumulating presidential tweets in Dolt. We are grateful to our community members for showing us ways of using Dolt we didn’t think of, and also…

    Read More
  109. 6 min read

    Introducing Dolt CLI in SQL

    Dolt is Git for data, a SQL database with version control tooling. While Dolt is nearing full MySQL compatibility, its current command line interface (CLI) functionality hasn't been accessible in SQL. That means that you can't currently run…

    Read More
  110. 6 min read

    Getting a Mascot for Dolt

    Dolt is Git for Data. It's a SQL database that you can branch, merge, clone, fork, push and pull, just like files in Git. Today we're going to be talking about our quest to get Dolt's branding right, and our first attempt to find a mascot that…

    Read More
  111. 7 min read

    Database Performance: Dolt vs MySQL

    Dolt is a version controlled SQL database. Dolt's query interface is SQL, and it has Git-like version control features. Adding version control features to a SQL database has performance trade-offs when comparing Dolt with traditional databases like…

    Read More
  112. 4 min read

    November Dataset Spotlight

    Every month we highlight some interesting datasets on DoltHub. The focus is on new or updated datasets but sometimes we shed fresh light on a classic. For those new to Dolt and DoltHub, Dolt is Git for data. Git versions files. Dolt versions SQL…

    Read More
  113. 7 min read

    Filter-Branch in Dolt

    Dolt is a tool built for collaboration and data distribution, a SQL database you can branch, merge, diff, clone, fork, push and pull. Today, we're announcing support for filter-branch in Dolt. "Customer focus" is a mantra for our company. In August…

    Read More
  114. 14 min read

    Continuous Deployment with GitHub Actions: An Example

    GitHub Actions FTW Not too long ago we endeavored to migrate Dolt's continuous integration pipeline from Jenkins to GitHub Actions. I wrote a blog about that process and complimented GitHub Actions on making the migration process intuitive and easy…

    Read More
  115. 4 min read

    Dolt Supports Prepared Statements

    Dolt is a SQL database that supports Git-like functionality, including branch, merge, clone, push and pull. Dolt targets compatibility with MySQL as an existing SQL dialect and wire protocol. We built Dolt on top of an excellent open-source…

    Read More
  116. 7 min read

    Version Controlled Databases: Defining a Category

    "Database version control" and "version controlled database" are not the same thing. Version controlling your database refers to the practice of storing schema and schema modifications in a traditional source control system like Git. "Version…

    Read More
  117. 6 min read

    Don't Panic

    When Tim, Aaron, and I started working on this problem in August 2018 we immediately began playing with Noms. It was an open source project that gave us a lot of the things Aaron and I had been talking about in order to deliver the features we felt…

    Read More
  118. 6 min read

    Uploading Files to DoltHub

    Dolt is Git for data and DoltHub is our web application that houses Dolt repositories. A few weeks ago I wrote about merging pull requests on DoltHub and our roadmap for "edit on the web". We're working on reducing friction for collaborating on data…

    Read More
  119. 7 min read

    A REST Service for Versioning DataFrames

    We originally built Dolt because we thought that existing data distribution formats were broken. In particular, we believed that consumers of data should not have to parse various formats (CSV, JSON, etc.), write ingestion logic and decide on update…

    Read More
  120. 13 min read

    Debunking an election fraud claim using open data and Dolt

    After four years of incredibly rancorous discourse about whether the US President was illegitimately elected with the help of foreign interference, it should surprise no one that the 2020 presidential election is mired in similar claims of…

    Read More
  121. 4 min read

    Supporting AUTO_INCREMENT

    Dolt is a database built for collaboration and data distribution. It's "Git for Data," a SQL database you can branch, merge, diff, clone, fork, push and pull. We intend to become a fully MySQL compatible database. Today, we're announcing support for…

    Read More
  122. 9 min read

    Doltpy: Dolt in Python

    Dolt is a SQL database with Git-like version control features. It presents a familiar SQL interface while exposing Git-like primitives for versioning tables and their data. Doltpy is a Python API for interacting with Dolt in Python. This post details…

    Read More
  123. 3 min read

    October Dataset Spotlight

    Every month we highlight some interesting datasets on DoltHub. The focus is on new or updated datasets but sometimes we shed fresh light on a classic. For those new to Dolt and DoltHub, Dolt is Git for data. Git versions files. Dolt versions SQL…

    Read More
  124. 12 min read

    Pushing down filters to make queries faster

    Dolt is Git for Data, a SQL database you can branch, merge, clone, fork, sync, push and pull. Today we're excited to announce the release of a new optimization in the query planner: pushing down filters! What's a pushdown? Pushdown is a query…

    Read More
  125. 9 min read

    Asynchronous Sorting in Go

    When we began working on Dolt we made the decision to build on top of Noms. Noms stores data in a content addressable DAG, and has countless applications. It was a great starting point for us to build Dolt, and it let us hit the ground running…

    Read More
  126. 6 min read

    Testing Login using Cypress

    Dolt is Git for data and DoltHub is our web application that houses Dolt repositories. We use Cypress.io as our end-to-end testing solution for DoltHub. To learn more about our journey with Cypress, check out our previous blogs: Why we chose Cypress…

    Read More
  127. 5 min read

    We are Hiring

    Dolt is a SQL database with Git-style versioning. DoltHub is a place on the internet to share Dolt databases. It takes a strong technical team to build a database from the storage engine up even when you get a head start from open source projects like…

    Read More
  128. 4 min read

    Garbage Collection in Dolt

    Dolt is a tool built for collaboration and data distribution. It's "Git for Data," a SQL database you can branch, merge, diff, clone, fork, push and pull. Today, we're announcing support for garbage collection in Dolt. To manage on-disk storage for…

    Read More
  129. 4 min read

    Using Dolt with Deepnote

    Dolt is Git for data, a SQL database with Git-style versioning. DoltHub is a place on the internet to store and share Dolt databases. Python is the language of data science. As such, we created Doltpy, a Python interface to Dolt. We continue to…

    Read More
  130. 7 min read

    Benchmarking Dolt with Sysbench

    At DoltHub we are building Dolt, a relational database with Git-like version control features. Naturally we are interested in measuring the performance profile of Dolt, and we would also like to make it easy for contributors to assess the effect of…

    Read More
  131. 5 min read

    Announcing Merge on DoltHub

    Dolt is Git for data and DoltHub is our web application that houses Dolt repositories. Our goal is to make it easier to collaborate on data, and pull requests, or proposed changes to a repository, are an essential part of this process. Before today…

    Read More
  132. 6 min read

    Data Collaboration on DoltHub

    Dolt is a SQL database with Git-like functionality, including clone, fork, push, pull, branch, and merge. DoltHub is a place on the internet for hosting, publishing, sharing and collaborating on Dolt databases. A few weeks ago, we announced support…

    Read More
  133. 9 min read

    Announcing triggers

    Dolt is Git for Data, a SQL database you can branch, merge, clone, fork, sync, push and pull. Today we're excited to announce a major new SQL feature: support for triggers! What's a trigger? Triggers are basically little SQL code snippets that you…

    Read More
  134. 3 min read

    September Dataset Spotlight

    Every month we highlight some interesting datasets on DoltHub. The focus is on new or updated datasets but sometimes we shed fresh light on a classic. For those new to Dolt and DoltHub, Dolt is Git for data. Git versions files. Dolt versions SQL…

    Read More
  135. 5 min read

    Uncovering MySQL's Gotchas

    Dolt is Git for data. Git versions files, Dolt versions tables. Dolt comes with a SQL engine built in, which lets you run SQL queries against any version of the data you've committed. Our goal is to become fully SQL compliant and compatible with…

    Read More
  136. 3 min read

    Liquidata Inc. is now DoltHub Inc.

    Today, we are changing our company name to DoltHub Inc. It's the end of the Liquidata era. We will be changing our emails and all our company branding to reflect the name change. Liquidata Origin Story We started Liquidata a little over two years…

    Read More
  137. 6 min read

    Pruning 90% of Dolt's SQL server code

    Dolt is Git for data. Git versions files, Dolt versions tables. Dolt comes with a SQL engine built in, which lets you run SQL queries against any version of your data you've committed. Dolt's SQL engine is go-mysql-server, which we forked and then…

    Read More
  138. 7 min read

    Oracle Support in SQL Sync

    Dolt is a relational database with Git-like version control features. In particular the underlying data storage format is a commit graph, and each commit represents the complete state (schema and data) of the database at a point in time. Doltpy, our…

    Read More
  139. 6 min read

    Introducing Forks

    Today, DoltHub released forks. It is the same system that GitHub uses for collaboration on over 100 million repositories contributed to by their 40+ million users. For the first time there is a general platform for data collaboration, and we hope it…

    Read More
  140. 3 min read

    Using DoltHub for Decentralized Database Collaboration

    Dolt is a SQL database with Git-style versioning. DoltHub is a place on the internet to share Dolt databases. We recently adopted Discord as a low friction way to interact with our customers. It's been a really positive experiment. Our community is…

    Read More
  141. 4 min read

    Tags and Data Releases in Dolt

    Dolt is a SQL database with Git-like functionality. It allows you to branch, merge, diff and clone data sets, by combining the data structures and algorithms of a relational database with a distributed version control system. DoltHub is a place on…

    Read More
  142. 8 min read

    Introducing Column Defaults

    Dolt is a SQL database with Git-style versioning. With each new version of Dolt, we increase the number of supported SQL features, moving toward our goal of being a complete drop-in replacement for MySQL, while adding all of the versioning features…

    Read More
  143. 8 min read

    Dolt Implementation Notes — Push And Pull On a Merkle DAG

    Dolt is a SQL database with Git-like functionality, including branch, merge, diff, and push and pull to remotes. This is a post in a series of posts about the internal workings of some of the core algorithms that underlie Dolt's implementation. The…

    Read More
  144. 5 min read

    Dolt SQL Server MySQL Client Support

    Dolt is a SQL database with Git-style versioning. Dolt ships with a MySQL compatible server that you can start on a repository using . Once started, you can then connect to the running server using standard MySQL clients. We now support C, Python…

    Read More
  145. 3 min read

    Dolt as a Data Management Service

    Dolt is a version controlled SQL database. What that looks like in practice is a SQL engine sitting on top of a commit graph like storage format. Dolt SQL is a superset of MySQL that provides access to the database at every point in the commit graph…

    Read More
  146. 4 min read

    August Dataset Spotlight

    Every month we highlight some interesting datasets on DoltHub. The focus is on new or updated datasets but sometimes we shed fresh light on a classic. For those new to Dolt and DoltHub, Dolt is Git for data. Git versions files. Dolt versions SQL…

    Read More
  147. 6 min read

    Dr. Discord, or: How we Learned to Stop Worrying and Love Public Chat

    Executive summary We are a small startup team building a new database tool called Dolt, which is Git for Data. This is the story of how we chose to use Discord for our open source project. You can join our server now! The Fermi Paradox of open source…

    Read More
  148. 5 min read

    SQL Sync for Schema with SQL Alchemy

    Dolt is a version controlled SQL database. It behaves like a traditional relational database in that it offers a SQL interface for data and schema management, but the underlying data structure is a commit graph inspired by Git. One natural use-case…

    Read More
  149. 5 min read

    Announcing DoltHub SQL API

    Dolt is Git for data, a relational database built to create, publish and consume datasets. DoltHub hosts a growing collection of public open datasets stored as Dolt databases. DoltHub allows you to explore data through its SQL query interface. We're…

    Read More
  150. 13 min read

    FBI Crime Data and the Future of Data Distribution

    Dolt is Git for data and DoltHub hosts a growing collection of public open datasets. Recently, we created dolthub/fbi-nibrs reflecting the FBI's National Incident Based Reporting System (NIBRS) crime data. Law enforcement agencies from around the…

    Read More
  151. 4 min read

    Open Source Cypress Testing Suite

    Dolt is Git for data and DoltHub is our web application that hosts Dolt repositories. At the beginning of the year we redesigned DoltHub and decided to try out Cypress as our end-to-end testing solution (similar to how we use Bats tests for Dolt…

    Read More
  152. 6 min read

    Collaborative GPT-3 Dataset

    Dolt is Git for data. Recently, we've been thinking a lot about what could be Dolt's Linux. A reader of that blog had a suggestion, an open GPT-3 dataset. Dolt really shines as a collaborative database where many users are making distributed edits…

    Read More
  153. 8 min read

    Testing DoltHub Using Cypress

    Dolt is Git for data and DoltHub is our web application that houses Dolt repositories. At the beginning of Dolt, we adopted Bash Automated Testing System (Bats) for end-to-end testing of the Dolt command-line (check out our blog about Bats here…

    Read More
  154. 9 min read

    Data Integrity for Open Data

    Open Data Validation Recently an article made the rounds at our company about "data integrity" checks. The article advocates that in the absence of perfect code that never corrupts data, it's wise to have "data integrity checks" that ensure data…

    Read More
  155. 9 min read

    Implementing subqueries in go-mysql-server

    Dolt is Git for data. Git versions files. Dolt versions SQL tables. Dolt's SQL engine is go-mysql-server, which is an open source project that we adopted a few months ago. Today we're excited to announce better support for subqueries in the engine…

    Read More
  156. 3 min read

    July Dataset Spotlight

    Every month we highlight some interesting datasets on DoltHub. The focus is on new or updated datasets but sometimes we shed fresh light on a classic. For those new to Dolt and DoltHub, Dolt is Git for data. Git versions files. Dolt versions SQL…

    Read More
  157. 6 min read

    The Anatomy of Open Data Projects

    A core motivation for building DoltHub was to empower organizations to collaboratively create and maintain high quality data assets that they could collectively depend on. This is very much analogous to GitHub. Analogies are powerful ways to…

    Read More
  158. 5 min read

    Scraping LinkedIn

    On June 13th, 2016 Microsoft acquired LinkedIn for $26.2 billion due to its ability to successfully monetize the resumes of its users. They have proven the value of a resume database and sell premium services that let recruiters search this database…

    Read More
  159. 11 min read

    Data Dependencies Using DoltHub, an Example

    Introduction In the past we have blogged about the IRS Sources of Income (SOI) data that we harvested and published as a Dolt database. We presented a compelling visualization that was relatively straightforward to create using that database. It was…

    Read More
  160. 6 min read

    Being a Startup in COVID-19 Times

    Today, we're taking a break from our regularly scheduled Dolt and DoltHub content to talk about our experience as a ten person startup in Los Angeles over the past few months as we've all dealt with this pandemic. In the beginning... I can't say we…

    Read More
  161. 9 min read

    In Search of Dolt's Linux...

    Dolt is a SQL database with Git-style versioning. DoltHub is a place on the internet to share Dolt databases. In this blog post we discuss our search for Dolt's Linux. Git Git was built to manage the Linux open source project. Lore has it that Linus…

    Read More
  162. 2 min read

    Announcing Username and Password Login

    DoltHub is a web application for hosting and collaborating on Dolt repositories. Until now, DoltHub has only supported creating accounts and signing in with third-party providers - currently Google and GitHub. We're excited to announce that DoltHub…

    Read More
  163. 8 min read

    Cell-level Three-way Merge in Dolt

    Dolt is a SQL database with Git-like functionality. It supports version control primitives including commit, branch, merge, clone, push and pull. This is the fourth post in a series exploring how Dolt stores table data and implements these version…

    Read More
  164. 6 min read

    Data Dependencies Using DoltHub

    A core motivation for the DoltHub team is a belief that obtaining and distributing data should be seamless and robust. Correctness and power combined with simplicity make for positive user experiences. We want users to think in terms of queries on…

    Read More
  165. 5 min read

    Introducing Foreign Keys

    Dolt is a SQL database with Git-style versioning. With each new version of Dolt, we increase the number of supported SQL features, moving toward our goal of being a complete drop-in replacement for MySQL, while adding all of the versioning features…

    Read More
  166. 9 min read

    Migrating from Jenkins to GitHub Actions

    Dolt is a SQL database with Git-style versioning. DoltHub is the place on the internet to share Dolt databases. For both Dolt and DoltHub, we've always used Jenkins for our continuous integration pipeline but have recently migrated our Dolt…

    Read More
  167. 7 min read

    Open Elections data on DoltHub

    DoltHub is a collaboration platform for data stored in Dolt, a relational database and data storage format with Git-like version control features for structured data. The vision of Dolt and DoltHub together is empowering decentralized communities to…

    Read More
  168. 8 min read

    Diffing Queries in Dolt

    Dolt is a SQL database built to wrangle datasets. Its tables are versioned, queryable, and shareable. We've recreated Git's functionality in a relational database so you can collaborate on data in the same ways you collaborate on code. One of Dolt's…

    Read More
  169. 4 min read

    June Dataset Spotlight

    Every month we highlight some interesting datasets on DoltHub. The focus is on new or updated datasets but sometimes we shed fresh light on a classic. For those new to Dolt and DoltHub, Dolt is Git for data. Git versions files. Dolt versions SQL…

    Read More
  170. 7 min read

    How DoltHub Integrates Metered Billing with Stripe

    Dolt is a SQL database with Git-style versioning. DoltHub is a place on the internet to share Dolt databases. Dolt is and always will be an open source tool and DoltHub hosts all public repositories for free. Users interested in hosting private…

    Read More
  171. 2 min read

    Announcing GitHub Login

    Dolt is a SQL database with Git-style versioning. DoltHub is a place on the internet to share Dolt databases. As you can tell from our product names and descriptions, we are inspired by Git and GitHub. We want to bring the same open collaboration…

    Read More
  172. 9 min read

    Efficient Diff on Prolly-Trees

    Dolt is a SQL database with Git-like functionality, including branch, merge, diff, clone, push and pull. This is the third post in a series of blog posts that explore the underlying data structures that are used for table storage and core algorithms in…

    Read More
  173. 11 min read

    Harnessing our SQL engine tests to run on Dolt

    Introduction Dolt is Git for Data, and its built-in SQL engine is an open source project we recently adopted, go-mysql-server. The engine is a general-purpose SQL execution engine that lets integrators read or write to their custom data source with…

    Read More
  174. 12 min read

    Introducing Cell History Inspection on DoltHub

    Dolt and DoltHub are Git and GitHub for data. Having a versioned database makes collaborating on data more fluid and reliable in the same way that Git improves source code collaboration for software engineers. Using both Git and GitHub, engineers are…

    Read More
  175. 6 min read

    Doltpy 1.0.0

    Background Dolt is a SQL database that stores data in a commit graph, and offers a Git-like interface for management. It offers a command-line interface (CLI) for managing database-level considerations such as how and where to start a…

    Read More
  176. 5 min read

    Learn SQL with Real Data using Dolt

    Dolt is a SQL database with Git-style versioning. DoltHub is a place on the internet to share Dolt databases. We think these tools can help people learn and perfect their SQL skills like no other database. This blog explains how. Get started quickly…

    Read More
  177. 3 min read

    May Dataset Spotlight

    Every month we highlight some interesting datasets on DoltHub. The focus is on new or updated datasets but sometimes we shed fresh light on a classic. For those new to Dolt and DoltHub, Dolt is Git for data. Git versions files. Dolt versions SQL…

    Read More
  178. 6 min read

    Delivering Declarative Data to DoltHub with GraphQL

    DoltHub is GitHub for data. As you might imagine, the data-fetching needs on the front end of such an application are intense. In the previous article in this series, we saw how working directly with our gRPC API was making our front-end team rather…

    Read More
  179. 8 min read

    Extending SQL Sync to Postgres

    Background Dolt is Git for data. It is a relational database that implements a storage layout similar to a commit graph, allowing users to clone, branch, and merge structured data. We believe the ability to clone and pull a remote dataset, and…

    Read More
  180. 7 min read

    Introducing Secondary Indexes

    Dolt is a SQL database with Git-style versioning. We're constantly adding new and exciting SQL features, and secondary indexes are one of them! This blog goes over what they are, why they're useful, and how they're implemented in Dolt. What are…

    Read More
  181. 7 min read

    Dolt as an Application Server

    A question we have been asked numerous times is, "Can Dolt be used as an application server?" This has driven a lot of conversations internally about the use cases of a versioned database server, and led to some very technical discussions about…

    Read More
  182. 6 min read

    Distribute Data with Dolt, not APIs

    Application Programming Interfaces (APIs) are the dominant mode of distributing data on the internet. Twitter debates in the data science community about Comma Separated Value (CSV) files vs APIs have flared up lately. We think both of these options…

    Read More
  183. 9 min read

    How GraphQL Saved Us from the gRPC Dumpster Fire We Created

    DoltHub is the online data community powered by Dolt, the version-controlled SQL database. In the previous article in this series, we took an overview of DoltHub's front-end architecture. In this article, we'll take a look at the pit of sadness our…

    Read More
  184. 7 min read

    The Dolt Commit Graph and Structural Sharing

    Dolt is a SQL database that provides Git-like functionality, including clone, push, pull, branch, and merge. This post is part of a series exploring how Dolt stores table data. In our previous post, How Dolt Stores Table Data, we explored a unique…

    Read More
  185. 11 min read

    Using Dolt to Manage Train/Test Splits

    Twitter is wonderful sometimes. We don't know Aaron. He finds us on Twitter, asks a great question, makes us think, and prompts a blog post. How can you use Dolt to manage train/test splits for your Machine Learning models? Dolt is a SQL database…

    Read More
  186. 9 min read

    Using Dolt with the JetBrains DataGrip SQL Workbench

    Dolt has been rapidly expanding its capabilities as a SQL server recently. We've done a lot of work to get the command to be a stable peer to the built-in SQL shell, with all the same capabilities. In the last month we've expanded the SQL server…

    Read More
  187. 6 min read

    Joining Multiple Repositories with SQL Queries

    In our blogs we have shown over and over again how easy it is to clone data from DoltHub and immediately start querying it with SQL. We are constantly working on improving our data catalog. As we do, there emerge more occasions where you can derive…

    Read More
  188. 3 min read

    Adopting go-mysql-server

    go-mysql-server is the SQL query execution engine that powers Dolt and DoltHub. Today we are excited to announce that we are adopting the project after its founding company ceased operations. Our fork of the project has over 400 additional…

    Read More
  189. 6 min read

    April Dataset Spotlight

    This blog entry is the first in a new series. Every month we will highlight some interesting datasets on DoltHub. The focus will be on new or updated datasets but sometimes we'll shed fresh light on a classic. For those new to Dolt and DoltHub, Dolt…

    Read More
  190. 12 min read

    Dolt and DoltHub: Publish Using CSVs

    Dolt is a SQL database with Git-style versioning. DoltHub is a place to share Dolt repositories. Dolt is Git for data. DoltHub is GitHub for Dolt. We want to host your public data on DoltHub. We think Dolt and DoltHub provide the best sharing model…

    Read More
  191. 4 min read

    Introducing Dolt to SQL sync

    Background While building Dolt and DoltHub, we have had many conversations with our users. They all share an interest in finding better ways to manage data. They recognize that writing code to massage CSV, JSON, and other less well known formats…

    Read More
  192. 6 min read

    Using Dolt to Find Test Regressions

    Dolt is Git for data. It's a database that lets you clone, fork, branch, merge and diff. This is a really cool technology that has a lot of uses, but today we're going to focus on just one: using Dolt SQL to find regressions in test results…

    Read More
  193. 6 min read

    Common Vulnerabilities and Exposures in Dolt

    TLDR: The NVD is a lot more useful when you can simply clone it and query it. The National Vulnerability Database (NVD) is the authoritative source for the publication of Common Vulnerabilities and Exposures (CVE). The vulnerabilities cataloged in…

    Read More
  194. 17 min read

    28 grams of Cannabis Data Sets

    Happy 4/20! Today is April 20th, the unofficial holiday of marijuana aficionados the world over. Happy 4/20! Or, as we in the data business like to say, Happy 20%! Recreational marijuana has been legalized in a dozen US states, medical use is…

    Read More
  195. 4 min read

    F*#%! you (in 4 languages)

    Dolt is to DoltHub as Git is to GitHub - except with Dolt, the unit of versioning is SQL tables. Dolt also has Git-like semantics such as pull, branch and merge. By running in a Dolt repository, you know you are getting the most up-to-date data. Not…

    Read More
  196. 15 min read

    How Dolt Types Work

    UPDATED FEBRUARY 10, 2021: Updated the final table with the types that have been added to Dolt since the article was first written. When we started on Dolt, our goal was to apply Git's idea of versioning to data. Whereas Git versions files, Dolt…

    Read More
  197. 15 min read

    Coronavirus State Actions Dataset: A Use Case for Pull Requests

    As COVID-19 continues to affect the lives of millions of people around the world, having the most recent and accurate information is an increasingly important tool to help combat the disease. We've been tracking COVID-19 cases for a few months in our…

    Read More
  198. 10 min read

    Dolt and DoltHub: Become a Publisher

    Dolt is a SQL database with Git-style versioning. In Git the unit of versioning is files. In Dolt, the unit of versioning is SQL tables. Dolt will eventually support 100% of the Git command line and 100% of MySQL SQL. Moreover, anything you can do on…

    Read More
  199. 10 min read

    Data CI with DoltHub Webhooks

    Dolt and DoltHub are Git and GitHub for data. The same way that GitHub enables collaboration on source code repositories in Git format, DoltHub enables collaboration on data repositories in Dolt format. A very common workflow on GitHub involves using…

    Read More
  200. 6 min read

    Tracking SQL Correctness and Performance Regressions in Dolt

    Tracking Dolt's SQL regressions As part of our journey to make Dolt a great SQL database, we set out to track the correctness of Dolt’s SQL engine against a suite of SQL tests called the . These tests are what we use to measure how closely Dolt's SQL…

    Read More
  201. 16 min read

    Dolt for Git Noobs

    TL;DR Dolt is a SQL database with built-in Git versioning, branching, and distribution semantics that makes collaborating on and distributing data effortless. What Git does for files, Dolt does for data. Where Git versions files, allowing for fine…

    Read More
  202. 9 min read

    How Dolt Stores Table Data

    Dolt is Git for data. It's a SQL database that lets you clone, branch, diff, merge, and fork your data just like you can with a filesystem tree in Git. This blog post explores one of the fundamental data structures that underlie Dolt's implementation…

    Read More
  203. 6 min read

    Dolt Use Cases

    Dolt is Git for data. Instead of versioning files, Dolt versions tables. DoltHub is a place on the internet to share Dolt repositories. As far as we can tell, Dolt is the only database with branches. How would you use such a thing? One of the hard…

    Read More
  204. 18 min read

    Who's at Risk of COVID-19 in the US Congress?

    Overview In this blog post, we discuss an approach for simulating an outbreak of COVID-19 in the US Congress. This is a long technical article about data sets, epidemiology, and simulation. Feel free to jump straight to the results of the…

    Read More
  205. 10 min read

    How We Built DoltHub: Front-End Architecture

    In the previous article in this series, we took a deep look at the overall system architecture of DoltHub, the online data community powered by the Dolt version-controlled database. In this article, we'll zoom in on the front end and see how the code…

    Read More
  206. 6 min read

    Testing Dolt using Bats

    We adopted Bash Automated Testing System (Bats) to test the Dolt command-line. As of March 10, 2020 we are up to 473 tests, though 55 are skipped because they currently fail. The tests define desired behavior so we're constantly working to get…

    Read More
  207. 7 min read

    Querying Historical Data with AS OF Queries

    Dolt is Git for data. It's a SQL database that lets you branch, merge, and fork your data just like you would a Git repository. In previous blog posts we announced how you can use special system tables to query the history of your database. Today…

    Read More
  208. 5 min read

    Novel Coronavirus Dataset in Dolt: A Case for Branches

    Here at DoltHub, we've been working on COVID-19 data since February 5, 2020. First, we started importing Johns Hopkins data and then we worked on assembling the largest open, regularly-updated set of case details from Singapore, Hong Kong and South…

    Read More
  209. 4 min read

    Scraping a JavaScript-enabled Website in 2020

    As part of our effort to track data related to the Novel Coronavirus (COVID-19), we wanted to scrape a JavaScript-enabled website on Coronavirus from Hong Kong. Moreover, you'll notice that the website from Hong Kong uses lazy loading based on scroll…

    Read More
  210. 6 min read

    Novel Coronavirus Dataset in Dolt: Case Details

    On Saturday, February 29, this transpired in our company chat room: A project was born. We had time series data for confirmed cases, deaths, and recoveries segmented by location sourced from Johns Hopkins but we did not have individual case data. We…

    Read More
  211. 7 min read

    How We Built DoltHub: Stack and Architecture

    In our introductory article for this series, we took a high-level look at the technology stack and architecture behind DoltHub, the online home for Dolt data repositories. In this article, we'll delve a little deeper and discuss how the pieces of the…

    Read More
  212. 7 min read

    Optimizing Sorted Map Iteration

    In this blog post I want to give an introduction to some core concepts used to implement fast querying of databases. These techniques were implemented in Dolt and produced significant performance improvements. Database internals The B-Tree is a core…

    Read More
  213. 10 min read

    So You Want Git for Data?

    People have been asking for a Git and GitHub for data for a while. That thread on Stack Exchange is almost seven years old and is the number three Google search result for "git for data" (for me). What is “Git for data” in practice? Many products…

    Read More
  214. 8 min read

    Visualizing Temperature Changes Over Time

    In the first part of this two-part blog I covered NOAA's "Global Hourly Surface Data" dataset and how it is modeled in Dolt. Dolt is Git for data, and for this dataset we model a day of observations as a single commit in the commit graph. In this…

    Read More
  215. 6 min read

    NOAA Global Hourly Surface Data

    The National Oceanic and Atmospheric Administration, NOAA, publishes weather measurements taken from stations around the world. It started in 1901 with a handful of stations, and there are more than 35,000 stations today. Most of these stations…

    Read More
  216. 9 min read

    Announcing Saved Queries

    Dolt is Git for data. We built Dolt to help teams collaborate on data sets using the forking, branching, and merging workflows that Git popularized. These workflows are what enable software engineers to collaborate on source code, and they're…

    Read More
  217. 4 min read

    Copyrightable Material

    In our previous blog post we examined some freely available licensing tools for open data from Creative Commons. To briefly recap, a license specifies the terms under which copyrightable material is made available for public access, sharply distinct…

    Read More
  218. 3 min read

    Data Licensing

    Introduction Dolt is a data format. DoltHub is a collaboration platform for data stored in the Dolt format. When sharing copyrighted content the terms of that sharing are governed by a license. In this post we highlight some common licenses attached…

    Read More
  219. 10 min read

    Novel Coronavirus Dataset in Dolt

    Johns Hopkins University Center for Systems Science and Engineering began collecting, tabulating, and publishing Novel Coronavirus (COVID-19) data on January 31, 2020. We started importing this dataset into Dolt on February 5, 2020. This blog will…

    Read More
  220. 5 min read

    How We Built DoltHub: Introduction

    Towards the end of last month, we launched a totally reworked and redesigned version of DoltHub, our web application for hosting and collaborating on Dolt repositories. Now that we've had a little while to iron the kinks out, it seems like a good…

    Read More
  221. 3 min read

    Dolt and DoltHub Documentation

    Background We are excited to announce the launch of our documentation site. The goal of Dolt and DoltHub is to enable developers and the data community with radically better data infrastructure. High quality documentation should empower users by…

    Read More
  222. 13 min read

    Implementing indexed joins

    Happy Valentine's Day from all of us at DoltHub! You are the reason we do what we do! In honor of the holiday, we want to talk about how much we love making queries faster. We're going to examine how our SQL engine makes a query plan and explain…

    Read More
  223. 5 min read

    LICENSE.md and README.md in Dolt

    Dolt and DoltHub strive to be the best data distribution platform on the internet. Having documentation versioned alongside data, and a standard, easy way to read the documentation online are features we admire in Git and GitHub. Following in Git's…

    Read More
  224. 8 min read

    Introducing SQL VIEW Support in Dolt

    Dolt is a SQL database with Git-style versioning and distribution. The most recent releases of Dolt introduced support for SQL views that are stored as part of, and versioned along with, a Dolt repository. This provides a great way for data sets to…

    Read More
  225. 9 min read

    Dolt and DoltHub: Getting Started

    Dolt is a SQL database with Git-style versioning. In Git the unit of versioning is files. In Dolt, the unit of versioning is SQL tables. Dolt will eventually support 100% of the Git command line and 100% of MySQL SQL. Moreover, anything you can do on…

    Read More
  226. 6 min read

    Mapping Income Inequality using IRS SOI Data

    In a previous blog I showed how the history of a dataset can be queried using the dolt history tables, and in the first part of this 2 part blog I covered the IRS SOI data. In this second part I use the IRS SOI data along with doltpy to map out…

    Read More
  227. 7 min read

    IRS Sources Of Income Dataset

    Every year the IRS publishes a treasure trove of data. It contains over a hundred different metrics which provide insight into the finances of American taxpayers. Even more compelling is they provide this information at ZIP code granularity, which…

    Read More
  228. 2 min read

    Querying DoltHub Repositories with SQL

    Since its launch in 2008, GitHub has catalyzed the open source software world and accelerated the culture of software collaboration. Source control was an old idea at that point, but GitHub offered a centralized place to discover and collaborate on…

    Read More
  229. 9 min read

    Access to Everything Through SQL

    When we started developing Dolt our vision was to deliver git functionality for data. Where git versions files, Dolt versions tables. We implemented table based diff and conflict logic and shipped the initial version. As we started to use Dolt to…

    Read More
  230. 5 min read

    DoltHub Redesign

    Redesigning DoltHub Dolt is a database and a data format. DoltHub is a way of hosting and collaborating on Dolt databases. We decided to redesign DoltHub to make it more user friendly. We are excited to announce that we have released the results of…

    Read More
  231. 7 min read

    Getting to one 9 of SQL correctness in Dolt

    A few months ago we finally settled on a good way to measure the correctness of Dolt's SQL engine: the sqllogictest package, first developed for SQLite and since used as a benchmark for lots of other database implementations. SQLite hit upon the…

    Read More
  232. 6 min read

    The History of Data Exchange

    IBM and General Electric invented the first databases in the early 1960s. It was only by the early 1970s that enough data had accumulated in databases that the need to transfer data between databases emerged. Enter the Comma Separated Values (CSV…

    Read More
  233. 6 min read

    Maintained Wikipedia ngrams dataset in Dolt

    Wikipedia is the largest and most popular general reference work on the internet, making it a powerful tool for predictive language modeling. Wikipedia releases a dump of all its articles and pages twice a month, and we created a dataset of ngrams…

    Read More
  234. 6 min read

    2 billion primes in a Dolt table

    Since releasing Dolt, we have often been asked how it scales. How many rows and how many gigs can you get into a Dolt dataset before things start breaking badly? Answering this question in practice is kind of difficult, simply because it's…

    Read More
  235. 2 min read

    No Food, One Problem. Have Food, Many Problems.

    I have been a huge Econtalk fan for over ten years. On his podcast with Sebastian Junger, Russ Roberts brought up what he called a Chinese proverb. No food, one problem. Have food, many problems. The wisdom of this saying really resonated with me. We…

    Read More
  236. 4 min read

    ImageNet in Dolt

    ImageNet is a dataset maintained by the Stanford Vision Lab. It seems to have fallen into disrepair. The links to download the image labels are broken. We have managed to procure all four released versions of the labeled images and import them into…

    Read More
  237. 8 min read

    Tracking Data Changes with Dolt Blame

    Ever look at some data and wonder where a particular value came from, how long it's been there, or what the reason for changing it was? This is important information, but current data storage formats don't track or expose it—certainly not in a…

    Read More
  238. 3 min read

    Dolt: A Database with Branches

    As we discussed in the Where Is the Data Catalog? blog post, Dolt is a database designed for internet-scale collaboration. There are databases with differences, history, rollback, and audit logging. We think the Git semantics of Dolt provide these…

    Read More
  239. 6 min read

    Testing Dolt's SQL Engine

    When we first started writing Dolt, we weren’t thinking about SQL functionality. We just knew we wanted a way to package data sets to make them easy to share, collaborate and merge -- to do for data what git did for source code. But as we demoed…

    Read More
  240. 5 min read

    WordNet in Dolt

    The Princeton WordNet database is on DoltHub. This blog entry will be about how it got there and how to use it. WordNet is distributed natively from Princeton as a compilable custom database. You can also download the database files only but they are…

    Read More
  241. 4 min read

    Dolt: A Simple Example

    When Dolt and DoltHub first went into private beta, we were surprised that the Iris dataset was the dataset people first tried to put in Dolt. If you are looking for that dataset, we have uploaded it to DoltHub. In this article, we're going to show…

    Read More
  242. 4 min read

    Where Is The Data Catalog?

    Why is there no place on the internet to get useful, maintained data? This question has puzzled me since 2013. We can rent a server. We can rent a database. Why can't we rent the data in the database? Something like that would be extremely useful. It…

    Read More