The Police Data Accessibility Project (PDAP) is an early Dolt adopter. Their open data mission aligns with DoltHub's, and we're sponsoring a bounty to help get their project off the ground. This post covers the mission, the bounty, and ways you can contribute.
PDAP has a problem to solve, and they articulate it better than I could (the following is quoted from the PDAP homepage):
"Local law enforcement data is hidden in the corners of the internet, obfuscated by bureaucracy, and only accessible via low quality user experiences. It's difficult for data scientists, journalists, and ordinary citizens to access, consolidate, and use the data. The simple act of collecting the data in one place creates an unprecedented starting point for full-scale analysis of our criminal justice system.
Our goals are to ease data collection, standardize formats from disparate sources, store the data to archival standards, and facilitate open source software analytics.
We speculate that the key audience is data scientists and journalists. They do the analysis, and are the critical channel for information to flow from police organizations to policymakers and the general public. The platform will also benefit broader swathes of the population, such as academics, government oversight bodies, elected officials, and the law enforcement agencies themselves."
Dolt has been open source from the start, and will stay that way. Dolt databases offer the right tooling for just about anybody to contribute to PDAP's project. The PDAP datasets database is hosted for free on the web for the public to query, clone, or fork and contribute. With forks and branches, multiple contributors can make revisions to the same database. We are excited to sponsor this bounty for The Police Data Accessibility Project, as we share a vision of an open source data world where anyone can contribute to or analyze robust and public databases.
PDAP would like to collect URLs, and some metadata about each URL, for all 15,219 police agencies in the United States (scraped from Wikipedia). Later on, these URLs will be scraped for their data, which will also be imported. But as a prerequisite to that next step, PDAP is building a catalog of all the URLs that should be scraped.
This bounty is slightly different from the ones we've run in the past in that participants will have the opportunity to enter data manually into Dolt, rather than having to write complex web scrapers. It is also the first bounty that the DoltHub team will administer within another organization on DoltHub (our bounty admin tools are internal for the time being).
The table of interest is datasets, and only three columns are required for entry, one of which is agency_id. Refer to the bounty page for more information on the required fields and acceptance criteria.
To understand what exactly constitutes a dataset URL, refer to PDAP’s Examples and Best Practices. In short, datasets.url is a link to a police, court, or third-party dataset that can be directly tied to an agency.
How to contribute
- Install the Dolt CLI if you plan to use workflow #2 below.
- Create a DoltHub account if you haven’t already.
- Fork pdap/datasets to your DoltHub account.
PDAP has graciously prepared two workflows for users to contribute:
The first workflow has users manually collect data in a spreadsheet. The spreadsheet autogenerates INSERT statements that can be run in the SQL Editor of the participant's fork.
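To give a feel for what the spreadsheet produces, here is a minimal Python sketch that turns hand-collected rows into INSERT statements. It is not PDAP's spreadsheet logic. The helper names (sql_quote, build_insert) and the example values are hypothetical, and only the url and agency_id columns mentioned in this post are used; the full list of required columns is on the bounty page.

```python
from typing import Mapping


def sql_quote(value: str) -> str:
    """Quote a string as a MySQL-style literal, escaping backslashes and single quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "''")
    return f"'{escaped}'"


def build_insert(table: str, row: Mapping[str, str]) -> str:
    """Build one INSERT statement from a column-name -> value mapping."""
    columns = ", ".join(f"`{name}`" for name in row)
    values = ", ".join(sql_quote(value) for value in row.values())
    return f"INSERT INTO `{table}` ({columns}) VALUES ({values});"


# Hypothetical example rows; the agency ids and URLs are made up for illustration.
rows = [
    {"url": "https://example.com/police/incidents.csv", "agency_id": "hypothetical-agency-1"},
    {"url": "https://example.com/sheriff's-office/arrests", "agency_id": "hypothetical-agency-2"},
]

statements = [build_insert("datasets", row) for row in rows]
for statement in statements:
    print(statement)
```

Each printed statement could then be pasted into the SQL Editor. The quoting step matters because URLs and agency names can contain apostrophes.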
The second workflow requires participants to install Dolt locally, clone their fork, export a CSV, and add their changes to it. That CSV is then re-imported, with the id column autogenerated.
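The CSV-editing step of this workflow can be sketched in Python as follows. This is an illustration under stated assumptions, not PDAP's tooling: the column set (id, url, agency_id), the append_rows helper, and the sample values are hypothetical, and using a random UUID as the generated id is an assumption, since this post does not say how ids are produced. The export and re-import themselves happen through Dolt and are not shown here.

```python
import csv
import io
import uuid

# Stand-in for the CSV exported from the fork; the column set is an assumption.
exported_csv = (
    "id,url,agency_id\n"
    "abc123,https://example.com/existing,hypothetical-agency-1\n"
)

# New rows collected by hand; `id` is deliberately left out.
new_rows = [
    {"url": "https://example.com/new-dataset", "agency_id": "hypothetical-agency-2"},
]


def append_rows(csv_text: str, rows: list) -> str:
    """Return csv_text with rows appended, generating an id for each new row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or [])
    existing = list(reader)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(existing)
    for row in rows:
        # Assumption: a random UUID hex string is an acceptable id value.
        writer.writerow({**row, "id": uuid.uuid4().hex})
    return out.getvalue()


updated_csv = append_rows(exported_csv, new_rows)
print(updated_csv)
```

Keeping the original header order from the export means the updated file lines up with the table's columns when it is imported again.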
Head over to DoltHub and start contributing! There is $5,000 up for grabs, so go get what's yours! We hope to see submissions from new and old contributors, Dolt and/or MySQL newbies, and those passionate about PDAP's mission. Join us on Discord in the #data-bounties channel if you have any questions or want to say hi.