We finished our second data bounty Monday, March 1. The target of the bounty was hospital prices. The results surpassed our expectations.
We built a database of the chargemasters of 1,400 of the approximately 6,000 US hospitals, representing over 72.7M prices, all in the same SQL schema. The bounty cost $10,000 and ran for six weeks. Bounties are the cheapest, fastest way to wrangle a dataset into existence.
Dolt is a SQL database with Git-style versioning. It's the first SQL database you can branch and merge. DoltHub is a place on the internet to share and collaborate on Dolt databases. Without both, data bounties would not be possible. Bounty participants submit their proposed changes back to the repository with Pull Requests, just like code changes on GitHub. This makes it possible for over a dozen people to sanely collaborate on the same dataset at once. Dolt and DoltHub radically reduce the cost of collaborating on databases.
The resulting database is 19.4GB. That's a lot of prices. To get a copy of the database, install Dolt and run:
dolt clone dolthub/hospital-price-transparency
You'll have a full copy of the database and its history.
The database covers 1,400 hospitals, 3,287,818 CPT or HCPCS codes, and 72,724,852 prices. We accepted 79 Pull Requests (PRs) from 14 bounty participants. Since there are just over 6,000 US hospitals, this means we gathered price data for about 23% of them in six weeks, at a cost of about $7 per hospital. The top participant earned $5,318 by contributing 158,223,996 cells.
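For anyone who wants to check the back-of-the-envelope math, here is a minimal Python sketch that recomputes the coverage and unit costs from the figures quoted in this post. The variable names are ours, and the 6,000 total is the rounded "just over 6,000" figure, so treat the results as approximations.

```python
# Figures quoted in the post. US_HOSPITALS is the rounded "just over 6,000".
BOUNTY_COST_USD = 10_000
HOSPITALS_COVERED = 1_400
US_HOSPITALS = 6_000
PRICES = 72_724_852

# Share of US hospitals represented in the database.
coverage = HOSPITALS_COVERED / US_HOSPITALS

# Bounty dollars spent per hospital and per 1,000 prices collected.
cost_per_hospital = BOUNTY_COST_USD / HOSPITALS_COVERED
cost_per_thousand_prices = BOUNTY_COST_USD / PRICES * 1_000

# Average number of prices collected per hospital.
prices_per_hospital = PRICES / HOSPITALS_COVERED

print(f"coverage: {coverage:.1%}")
print(f"cost per hospital: ${cost_per_hospital:.2f}")
print(f"cost per 1,000 prices: ${cost_per_thousand_prices:.2f}")
print(f"prices per hospital: {prices_per_hospital:,.0f}")
```

The coverage and per-hospital cost round to the 23% and $7 quoted above.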
We received more participation than expected, so we rounded the minimum bounty payment up to $50 to make sure every participant got something. We think this is a good model to consider moving forward. Lots of checks to mail!
We can confidently say that this is the best open US Hospital Price database available. We can't wait to see what people do with it.
How can this database be used?
Since we just finished this bounty Monday, we haven't had much time to analyze the data. We'll publish some follow-up blog posts with analysis over the next few weeks. This database can be used to answer questions like:
- What hospitals are complying with the federal chargemaster regulation?
- What is the cheapest/most expensive hospital?
- Which insurer gets the best deals?
- How do prices vary across procedures within a hospital? i.e., are hospitals uniformly expensive for all procedures, or expensive for some procedures and not others?
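As a rough illustration of the last question, here is a minimal Python sketch that compares each hospital's price for a code against the average price for that code. It runs against an in-memory SQLite table rather than the real database. The `prices` table, its column names, the hospital names, and every price below are made up for the example; check the actual schema after cloning before adapting the query.

```python
import sqlite3

# Hypothetical, simplified schema and toy rows; NOT the real database contents.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices (hospital TEXT, code TEXT, payer TEXT, price REAL)"
)
toy_rows = [
    ("Hospital A", "70450", "CASH", 500.0),
    ("Hospital A", "99213", "CASH", 300.0),
    ("Hospital B", "70450", "CASH", 1500.0),
    ("Hospital B", "99213", "CASH", 100.0),
]
conn.executemany("INSERT INTO prices VALUES (?, ?, ?, ?)", toy_rows)

# For every (hospital, code) pair, divide the hospital's price by the
# average price for that code across all hospitals. A ratio above 1 means
# the hospital is more expensive than average for that procedure.
query = """
SELECT p.hospital, p.code, p.price / avg_by_code.avg_price AS ratio
FROM prices AS p
JOIN (
    SELECT code, AVG(price) AS avg_price
    FROM prices
    GROUP BY code
) AS avg_by_code ON avg_by_code.code = p.code
ORDER BY p.hospital, p.code
"""
ratios = conn.execute(query).fetchall()

for hospital, code, ratio in ratios:
    print(f"{hospital} {code}: {ratio:.2f}x the average")
```

In this toy data, each hospital is cheaper than average for one code and more expensive for the other, which is the "expensive for some procedures and not others" pattern the question asks about.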
DoltHub Bounties are the fastest, cheapest way to build databases from open data.
Our plan is to run at least one bounty per month for the rest of the year across a number of distinct data disciplines. We're running a US Course Catalog Bounty right now. We'll be launching one more over the next week or so. Hang out in our Discord to keep up to date. Start wrangling data as your new side hustle.