Announcing $10,000 US Schools Bounty

2 min read

It's time for another data bounty. Today, we're launching a $10,000 bounty to collect basic identifying data about schools in the United States. We seeded the database with California K through 12 Schools. Only 49 states (and the District of Columbia) left to go!

Our last data bounty ended early because we busted up Dolt pretty bad. The search space was really large and we confounded things by requiring one pull request per restaurant. Dolt got pretty ornery with tens of thousands of commits. This time we're going for a smaller search space.

Schools is a generally useful dataset that many users asked for in the past. The data is available from vendors but no consolidated, open database exists. The database will include public and private kindergarten through 12 schools as well as post secondary schools.

The Schema

Just one table in this bounty.

us-schools $ dolt schema show
schools @ working
CREATE TABLE `schools` (
  `name` varchar(255) NOT NULL,
  `city` varchar(255) NOT NULL,
  `state` varchar(255) NOT NULL,
  `address` varchar(255),
  `zip` int,
  `website` varchar(255),
  `category` varchar(255),
  `public_private` varchar(255),
  `district` varchar(255),
  `lat` float,
  `lon` float,
  PRIMARY KEY (`name`,`city`,`state`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

Most of the columns should be self explanatory. category is Elementary, Middle, High, Post Secondary, or K-12. Leave it NULL if you don't know. We're willing to entertain new categories if need be. Feel free to propose one in your PR.

state is the two letter state abbreviation.

The minimum required information for an accepted PR is name, city, state, and address.

Example

In order to seed the database, I started with my local elementary school, Roosevelt Elementary, in Santa Monica, CA. I manually collected the information from the school's website and ran inserts and updates on DoltHub.

Queries on DoltHub was a great way to get started but to accumulate the big edits, you need to script. I found the California Department of Education Website list, downloaded the data I needed as Excel, saved it to CSV, and then wrote a simple Perl script to parse it into the form I needed. I outputted the results to a CSV file and used dolt table import to get that into Dolt. You can see the results in this diff.

Conclusion

We're going back to basics with this bounty. Grab as much school data as you can. Come discuss on our Discord if you have any questions.

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt