December Dataset Spotlight

We have been running the DoltHub dataset spotlight since May 2020, and this is our eighth issue. The intent was to give additional exposure to Dolt datasets published on DoltHub.

Publishing this blog monthly has presented some challenges content-wise. In May, we were spending a lot of company resources tracking down open data on the internet, turning it into Dolt databases, and publishing it on DoltHub. It made sense to call attention to that work monthly. Over the summer, we started to focus more on making Dolt a credible database and less on the data distribution use case that open datasets show off. The content in these monthly dataset spotlights suffered. For a few issues, our users picked up the slack and we published good content. But in other months, putting together this blog was a slog.

We think a new day is here! In December, we launched a new feature on DoltHub called data bounties. Bounties allow DoltHub users to pay other DoltHub users to source and clean data. For now, DoltHub is acting on the buy side of the bounties. We think it's a clever way to bring open data to DoltHub. We're going to expand this monthly spotlight to include all active bounties. We think these are going to be the coolest datasets on DoltHub, and users can get paid to contribute to them.


US Presidential Election Precinct Results

Link: dolthub/us-president-precinct-results
Bounty: $25,000
End Date: February 14, 2021

The goal of this bounty is to create a database of precinct-level US presidential election results from 2016 and 2020. The bounty started December 14, 2020 and runs to February 14, 2021. You can see the latest update here. There is plenty of work left to do.

Popular Datasets

The five most viewed DoltHub datasets for the month of December:

  1. dolthub/us-president-precinct-results
  2. alexis-evelyn/presidential-tweets
  3. dolthub/corona-virus
  4. dolthub/nfl-play-by-play
  5. dolthub/fbi-nibrs


Check out all the latest DoltHub datasets on the Discover page. We chose to cover one this month.

Pennsylvania Mail-in Ballots

Link: dolthub/pa_mail_ballots_2020
Contributor: dolthub
First Published: November 6, 2020

In November 2020, we grabbed the Pennsylvania mail-in ballot data from the state's website in an attempt to debunk some election fraud claims. The state promptly put the data behind a login wall, and DoltHub became the only source of the original data. The state has since restored access to the data, and there is a new version. At the request of a user, we updated the dataset to the new version. Here is the diff.


We have high hopes that data bounties will provide interesting monthly content for the DoltHub dataset spotlight. We will have another bounty starting in the next couple of weeks. Come say hello on our Discord and be a part of our data community.


