Every month we highlight some interesting datasets on DoltHub. The focus is on new or updated datasets but sometimes we shed fresh light on a classic.
For those new to Dolt and DoltHub, Dolt is Git for data. Git versions files. Dolt versions SQL tables. DoltHub is a place on the internet to share Dolt repositories.
We think the way we share data with each other is broken and we think Dolt is the fix. Whenever you see a link to a CSV, JSON, or XML file, you should think of Dolt. Whenever you see an API but want all the data, not just a few entries, you should think of Dolt. We are working hard to move data shared in these formats to Dolt. This series of blogs will update you on our progress.
NBA Player Statistics
First Published: May 9, 2020
We scraped the NBA API to get all season and career statistics for every NBA player who ever played. It took us a few days, but you can get the data with Dolt in a few minutes. We even have a branch that counts James Harden's disallowed dunk. Sports data is often the entry point for aspiring data scientists. We are committed to providing a great collection of sports statistics on DoltHub.
First Published: May 19, 2020
Following the sports data theme above, we also imported all NFL play-by-play data since 2000. The NFL recently shut down the API that served this data, so feel free to use Dolt instead. This data can be used to predict plays or visualize play-calling tendencies.
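As a rough illustration of what "play-calling tendencies" could look like, here is a minimal sketch in Python that computes the share of each play type per down. The rows are made-up placeholders, and the column names `down` and `play_type` are assumptions for illustration; check the repository for its actual schema before adapting this.

```python
from collections import Counter, defaultdict

# Placeholder rows, NOT real NFL data. The column names (down, play_type)
# are assumptions and may not match the repository's actual schema.
plays = [
    {"down": 1, "play_type": "run"},
    {"down": 1, "play_type": "pass"},
    {"down": 1, "play_type": "run"},
    {"down": 3, "play_type": "pass"},
    {"down": 3, "play_type": "pass"},
]


def play_type_share(rows):
    """Return {down: {play_type: fraction of plays on that down}}."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["down"]][row["play_type"]] += 1
    return {
        down: {ptype: n / sum(c.values()) for ptype, n in c.items()}
        for down, c in counts.items()
    }


tendencies = play_type_share(plays)
for down in sorted(tendencies):
    print(down, tendencies[down])
```

The same grouping could be done in SQL directly against the Dolt tables; the Python version is only meant to show the shape of the calculation.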
Casos COVID Mexico
First Published: April 30, 2020
Our first foreign language repository! This repository exposes COVID-19 case data in Mexico. The documentation is in Spanish but the language of data is universal. Thanks to jccpmx for being an early adopter.
Corona Virus Stimulus
First Published: May 13, 2020
This dataset lists US companies that received COVID-19 stimulus funds from the US federal government. Which companies received stimulus funds? Do you think the money was well allocated? Join the data with our stock market dataset and see if there is any correlation to stock price over the past few weeks.
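One way to approach the join-and-correlate idea is sketched below in Python. The ticker symbols and numbers are invented placeholders, not values from either dataset, and keying both sides by ticker is an assumption about how the two datasets would be matched.

```python
from math import sqrt

# Placeholder values, NOT real data. Keying by ticker symbol is an
# assumption about how the two datasets could be joined.
stimulus_funds = {"AAA": 10.0, "BBB": 20.0, "CCC": 30.0}
price_change = {"AAA": 1.0, "BBB": 2.0, "CCC": 3.0, "DDD": 5.0}


def inner_join(left, right):
    """Pair up values for keys present in both mappings."""
    return [(left[k], right[k]) for k in sorted(left.keys() & right.keys())]


def pearson(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)


pairs = inner_join(stimulus_funds, price_change)
r = pearson(pairs)
print(f"{len(pairs)} matched companies, correlation = {r:.3f}")
```

The placeholder numbers are deliberately linear, so the correlation they produce says nothing about the real data. With the actual tables, the join itself would more naturally be a SQL `JOIN` in Dolt, leaving only the correlation step for Python.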
IRS Statement of Income
First Published: January 22, 2020
We wrote a really popular blog post using this dataset back in February. This dataset contains summary income tax data from 2011 to 2017. The data is cleaned and the columns are well named. It is much easier to work with than pulling the data yourself.
That's it for this month. As you can see, most of the datasets are published by us. For Dolt and DoltHub to continue to exist, we need a community of data publishers to emerge. Please help us build a community by publishing. We published a blog on how to publish with SQL and another on how to publish CSVs.
That said, if you want data in Dolt format but don't have the time or expertise to import and maintain it, send us a note. We're happy to be an open data provider for your projects.