October Dataset Spotlight

Every month we highlight some interesting datasets on DoltHub. The focus is on new or updated datasets, but sometimes we shed fresh light on a classic.

For those new to Dolt and DoltHub, Dolt is Git for data. Git versions files. Dolt versions SQL tables. DoltHub is a place on the internet to share Dolt repositories.

We think the way we share data with each other is broken, and we think Dolt is the fix. Whenever you see a link to a CSV, JSON, or XML file, you should think of Dolt. Whenever you see an API but want all the data, not just a few entries, you should think of Dolt. We are working hard to move data shared in these formats to Dolt. This series of blogs will update you on our progress.

NBA Player Statistics

Link: dolthub/nba-players
Contributor: dolthub
First Published: May 5, 2020

The LA Lakers won the NBA Bubble championship. As LA residents, we at DoltHub congratulate our team. We also updated the NBA player statistics dataset with the NBA Bubble statistics. Thanks as well to user jacob, who added a draft history table using our new fork and pull request feature. Distributed data collaboration is coming; we can feel it.

Supreme Court Cases

Link: dolthub/us-supreme-court-cases
Contributor: dolthub
First Published: April 23, 2020

The Supreme Court has been in the news a lot lately. We have a Supreme Court case transcript dataset, complete with data on the justices. We made one pull request to record Ruth Bader Ginsburg's passing and another to add Amy Coney Barrett. The diff and pull request workflow is on full display in this dataset.

Math

Link: bblank/math
Contributor: bblank
First Published: October 11, 2020

What is it about prime numbers? First, Zach put the first two billion primes in a Dolt repository. Then bblank showed up with bigger ambitions, a "Math database for storing results of complex calculations", and did the exact same thing. We'll see what bblank comes up with next.

End SARS

Link: emekaboris/EndSars
Contributor: emekaboris
First Published: October 17, 2020

End SARS is a movement to stop police brutality in Nigeria. SARS stands for "Special Anti-Robbery Squad", a particularly notorious arm of the Nigerian police. This dataset contains all the tweets with the #EndSARS hashtag. We're glad to see Dolt being used to help social movements all across the globe.

Cloud Native Computing Foundation

Link: cncf/landscape
Contributor: cncf
First Published: September 24, 2020

The Cloud Native Computing Foundation published a dataset of interesting cloud technologies. It is a good starting point if you want a survey of cloud technologies, including whether each one is open source and how to contact its owners.


That's it for this month. For Dolt and DoltHub to continue to exist, we need a community of data publishers to emerge. Help us build that community by publishing your own data. We have written one blog post on how to publish with SQL and another on how to publish CSVs.

That said, if you want data in Dolt format but don't have the time or expertise to import and maintain it, send us a note or chat with us on Discord. We're happy to be an open data provider for your projects.