Every month we highlight some interesting datasets on DoltHub. The focus is on new or updated datasets but sometimes we shed fresh light on a classic.
For those new to Dolt and DoltHub, Dolt is Git for data. Git versions files. Dolt versions SQL tables. DoltHub is a place on the internet to share Dolt repositories.
We think the way we share data with each other is broken and we think Dolt is the fix. Whenever you see a link to a CSV, JSON, or XML file, you should think of Dolt. Whenever you see an API but want all the data, not just a few entries, you should think of Dolt. We are working hard to move data shared in these formats to Dolt. This series of blogs will update you on our progress.
NFL Pick'em Data
First Published: September 2, 2019
Shout out to our first publisher! This user put NFL Pick'em data into a Dolt database for the 2019-2020 NFL season and has returned for the 2020-2021 season. We're glad to see him back. This dataset contains NFL games and their scores, with a bent toward picking winners for gambling purposes.
USDA All Foods
First Published: September 30, 2019
The US Department of Agriculture maintains a list of food products and various nutritional information about them. The database is divided into raw foods like broccoli and hamburger and branded foods like Doritos. It's a good database for anyone looking into food or food products. This dataset was recently requested on Reddit so we thought we'd point it out here.
Contributor: dolthub/Open Elections Partnership
First Published: June 2020
This is the first dataset where we are working with the data producer to build a version of the dataset in Dolt format. Here's a blog article about the data. We're going to keep building on this work over the next few months so stay tuned.
NOAA Weather Data
First Published: March 2, 2020
This dataset is an interesting one. You can read about it in this cool blog article complete with visualizations. What's interesting about it is the way that it is modeled in Dolt. This dataset contains the current climate information at a station on HEAD and uses the commit log to handle historic information. We're not sure if that's the best way to model time series information in Dolt but it was cool to try.
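To make that modeling choice concrete, here is a toy sketch in plain Python. It does not use Dolt or any Dolt API; the Commit class, the station IDs, and the temperature values are all made up for illustration. The idea it mimics is that the newest commit holds only the current reading per station, and older readings are recovered by walking the log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Hypothetical stand-in for a commit: a message plus a full table snapshot."""
    message: str
    snapshot: dict  # station id -> current temperature (made-up schema)


def head(log):
    """Return the newest snapshot, i.e. what a reader would see on HEAD."""
    return log[-1].snapshot


def history(log, station):
    """Walk the log oldest-first and collect one station's past readings."""
    return [(c.message, c.snapshot[station]) for c in log if station in c.snapshot]


# Two made-up daily commits, each overwriting the previous day's readings.
log = [
    Commit("2020-03-01 observations", {"KSEA": 8.5, "KLAX": 17.0}),
    Commit("2020-03-02 observations", {"KSEA": 9.1, "KLAX": 16.2}),
]

current = head(log)
ksea_history = history(log, "KSEA")
print(current["KSEA"])
print(ksea_history)
```

The table stays small because it only ever holds one row per station, but any question about the past means reading through commits rather than running a single query over a date column. That trade-off is why we are unsure it is the best fit for time series.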
USPS Crosswalk Data
First Published: July 13, 2020
The USPS Crosswalk data is a tool provided by the United States Postal Service for mapping from ZIP codes to other kinds of geographic entities such as counties. It allows users of the data to change the spatial resolution of an existing dataset. We find this kind of data exciting precisely because it is not specifically interesting, but is rather generically useful. For an example of what we mean by that, we took a historical analysis of the IRS Sources of Income by ZIP code data, and transformed it to the county level. You can read about how we used USPS Crosswalk data to do that transformation here.
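For readers who have not used a crosswalk before, here is a rough Python sketch of what changing spatial resolution can look like. It is not the code from our analysis. It assumes each crosswalk row gives the share of a ZIP code that falls in a county, and the ZIP codes, county codes, ratios, and counts below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical crosswalk rows: (zip, county, share of the ZIP in that county).
# The shares for a given ZIP are assumed to sum to 1.
crosswalk = [
    ("98001", "53033", 1.0),
    ("98002", "53033", 0.75),
    ("98002", "53053", 0.25),
]

# Hypothetical ZIP-level values, e.g. a count of tax returns per ZIP code.
returns_by_zip = {"98001": 100, "98002": 200}


def to_county(values_by_zip, crosswalk_rows):
    """Split each ZIP's value across counties by share, then sum per county."""
    totals = defaultdict(float)
    for zip_code, county, ratio in crosswalk_rows:
        if zip_code in values_by_zip:
            totals[county] += values_by_zip[zip_code] * ratio
    return dict(totals)


returns_by_county = to_county(returns_by_zip, crosswalk)
print(returns_by_county)
```

Because the shares for each ZIP sum to 1 in this sketch, the county totals add up to the same overall total as the ZIP-level input. A ZIP that straddles two counties is simply split between them.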
That's it for this month. As you can see, most of the datasets are published by us. For Dolt and DoltHub to continue to exist, we need a community of data publishers to emerge. Please help us build a community by publishing. We published a blog on how to publish with SQL and another on how to publish CSVs.
That said, if you want data in Dolt format but don't have the time or expertise to import and maintain it, send us a note. We're happy to be an open data provider for your projects.