National Course Catalog $10,000 Database Bounty


It's time for another data bounty! We completed the US Presidential Election Precinct Results bounty and we have a week or so left in the Hospital Price Transparency bounty.

For the next bounty, we want to build a database of US college course catalogs. The prize for this data is $10,000. The bounty will run until March 18.

This bounty is a little different

Unlike previous bounties, the data generated by this bounty will become private once the bounty ends. This is a new one for us. People with whom we have a previous business relationship came to us wanting to use bounties to collect a private dataset. They are fronting the prize money for the bounty. We mulled it over, discussed a few options, and decided to give it a try in this form.

The dataset will be public and open for the duration of the bounty. We thought this would encourage the most participation. Once the bounty ends, this repository will become private. We will delete all the public forks on DoltHub. If we find a copy of the data posted on DoltHub after the bounty ends, we’ll take it down.

We understand this is different in spirit from previous bounties. If you choose not to participate, we understand. We’ll have a new open data bounty in the next couple of weeks.

We believe that the more bounty money available, the better. People trying to make bounties a legitimate side hustle benefit even if the data is not open. If it works, we’ll do more. If it doesn’t, we’ll rethink the model. This is all new to us and we’re willing to try things that may not work.

The Database

The database consists of 4 tables: colleges, subjects, courses, and sections. The README does a really good job of explaining the tables, so we won’t repeat that here. It's a fairly complicated schema, so feel free to hit us up on Discord if you have any questions.

Contributing

I wanted to insert an example from Duke University's course catalog. Here is the process I followed to do that.

I started out by googling “Duke University Course Catalog” and after following a few links, I found myself here. This URL is what I ended up populating in the catalog_url field for Duke University in the colleges table. I picked a term (2021 Spring Term) and a subject (Computer Science) and hit search. The first result is “Introduction to Computer Science | COMPSCI 101L”. Below are links to 4 Dolt queries that show how I populated the first listed section of this course, “001-LEC (6673)”, into the bounty database.

I open sourced the script to help. You can look at the script here.
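
To make the shape of those four inserts concrete, here is a minimal sketch, not the actual bounty schema or the script mentioned above. Only the four table names and the catalog_url field come from this post; every other column name is a hypothetical stand-in (the README defines the real columns), the catalog URL is a placeholder, and Python's built-in sqlite3 stands in for Dolt so the example runs without a server.

```python
import sqlite3

# Hypothetical, simplified schema. Only the table names and
# colleges.catalog_url appear in the post; all other columns are
# assumptions made for illustration. See the bounty README for the real ones.
SCHEMA = """
CREATE TABLE colleges (name TEXT PRIMARY KEY, catalog_url TEXT);
CREATE TABLE subjects (college TEXT, code TEXT, name TEXT,
                       PRIMARY KEY (college, code));
CREATE TABLE courses  (college TEXT, subject TEXT, number TEXT, title TEXT,
                       PRIMARY KEY (college, subject, number));
CREATE TABLE sections (college TEXT, subject TEXT, number TEXT,
                       term TEXT, section TEXT,
                       PRIMARY KEY (college, subject, number, term, section));
"""


def load_example(conn, catalog_url):
    """Insert one college, subject, course, and section, parent rows first."""
    college = "Duke University"
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO colleges VALUES (?, ?)", (college, catalog_url))
    conn.execute(
        "INSERT INTO subjects VALUES (?, ?, ?)",
        (college, "COMPSCI", "Computer Science"),
    )
    conn.execute(
        "INSERT INTO courses VALUES (?, ?, ?, ?)",
        (college, "COMPSCI", "101L", "Introduction to Computer Science"),
    )
    conn.execute(
        "INSERT INTO sections VALUES (?, ?, ?, ?, ?)",
        (college, "COMPSCI", "101L", "2021 Spring Term", "001-LEC (6673)"),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
# Placeholder URL: substitute the catalog page you actually found.
load_example(conn, "https://example.edu/catalog")

# Join the section back to its course to check that the keys line up.
row = conn.execute(
    """
    SELECT c.title, s.term, s.section
    FROM sections AS s
    JOIN courses AS c
      ON c.college = s.college
     AND c.subject = s.subject
     AND c.number = s.number
    """
).fetchone()
print(row)
```

Inserting in the order college, subject, course, section means each row's parent already exists when it is added, which is the order the walkthrough above follows.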

Conclusion

Bounties are a great way to earn a little extra money by data wrangling. Fork the bounty repository today and start contributing. When you have some data committed, open a pull request.

This bounty is a little different. The data will not be open once the contest completes. We will be running a bounty approximately once per month. Come by our Discord if you have any questions.
