April Dataset Spotlight

It's time for our April dataset spotlight here at DoltHub. For new folks, Dolt is a SQL database with Git-like versioning, and DoltHub is a place on the internet to share Dolt databases. This monthly feature keeps you updated on Data Bounties and popular Dolt databases.


We are excited to continue updating you about our progress on Data Bounties. We have one active data bounty, and we completed one bounty in April.


Link: dolthub/logo-2k-extended
Bounty: $10,000
End Date: April 9, 2021

We wanted to build a database of corporate logos to use in machine learning applications. There has been some clamouring in the ML community for more open datasets. This is our attempt to help.

The logo bounty was a success. The bounty participants collected 3.4M logo URLs. The top contestant earned over $6,000.

Hospital Price Transparency V2

Link: dolthub/hospital-price-transparency-v2
Bounty: $10,000
End Date: May 26, 2021

Our first hospital bounty was inspiring, collecting over 72M prices. This time we corrected some schema errors, imported the old data, and asked bounty participants to have at it. Participants can either add new hospitals or go through the ones we already have and update the code descriptions. We changed the schema so that code descriptions are now stored per hospital, rather than assuming each code has the same description across all hospitals. We think we got the schema right this time.
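To make the schema change concrete, here is a minimal sketch of what per-hospital code descriptions could look like. It runs against an in-memory SQLite database rather than Dolt, and the table names, column names, and rows are all made up for illustration; the actual hospital-price-transparency-v2 schema may differ.

```python
import sqlite3

# Hypothetical tables and columns, for illustration only.
# The real hospital-price-transparency-v2 schema may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE code_descriptions (
    hospital_id TEXT NOT NULL,
    code        TEXT NOT NULL,
    description TEXT,
    PRIMARY KEY (hospital_id, code)  -- one description per hospital per code
);
CREATE TABLE prices (
    hospital_id TEXT NOT NULL,
    code        TEXT NOT NULL,
    price       REAL NOT NULL,
    PRIMARY KEY (hospital_id, code)
);
""")

# Made-up rows: the same code carries a different description at each hospital.
conn.executemany(
    "INSERT INTO code_descriptions VALUES (?, ?, ?)",
    [
        ("hospital_a", "X100", "Example procedure, short form"),
        ("hospital_b", "X100", "Example procedure with extras"),
    ],
)
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [("hospital_a", "X100", 120.0), ("hospital_b", "X100", 150.0)],
)

# Join on both hospital and code so each price gets its own hospital's wording.
rows = conn.execute("""
SELECT p.hospital_id, p.code, d.description, p.price
FROM prices AS p
JOIN code_descriptions AS d
  ON d.hospital_id = p.hospital_id AND d.code = p.code
ORDER BY p.hospital_id
""").fetchall()

for row in rows:
    print(row)
```

Keying descriptions on the hospital and the code together means a hospital's own wording for a code is preserved instead of being overwritten by another hospital's.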

Popular Datasets

The five most viewed DoltHub datasets for the month of April:

  1. dolthub/hospital-price-transparency-v2
  2. pdap/datasets
  3. dolthub/nfl-play-by-play
  4. dolthub/hospital-price-transparency
  5. dolthub/logo-2k-extended

The Police Data Accessibility Project (PDAP) has been very active on DoltHub and on our Discord. They are collecting data from approximately 18,000 police organizations in the US. It's a very cool initiative. Expect to see more from them in this space.


That's it for this month. Interested in participating in data bounties? Come say hello on our Discord and be a part of our data community.


