US Hospital Price Transparency Bounty Retrospective
A year ago we launched DoltHub data bounties. If you're unfamiliar, we use Dolt to create open databases that don't exist anywhere else on the internet. We use Dolt's data lineage capability to figure out who contributed what, then we pay our contributors proportional to how much data they add.
In 2019, the Trump administration forced hospitals to make their price lists public under penalty of a nominal fine.
Law and disorder
But the rule lets hospitals publish their chargemasters under a list of requirements that are difficult to enforce.
- Chargemasters (price lists, or "CDMs") must be public, but that doesn't mean they have to be easy to access. Many CDMs are buried several clicks deep or mixed in with other similar links.
- CDMs must be machine-readable, but in practice, many of them take effort to read. And, naturally, every hospital has its own format, making comparison and bulk collection tricky.
- CDMs must contain common codes, but often don't
This is ignoring the fact that many hospitals simply don't publish their CDMs because the cost of noncompliance is so low. We'll look into just how many of our scraped hospitals are actually compliant.
The largest open price table ever, probably
We seeded our hospital table with a list of hospitals from 2020 pulled from UNC.
Of these ~6000 hospitals, our participants got the chargemasters of ~1800 of them, a 30% coverage rate.
And from these 1800 hospitals we collected just under 300 million prices, an average of 166k per hospital. This explains why shopping is so hard. With so many charge codes, prices, and payers, it's almost impossible for the average patient to know what to look for. We'll address this in some upcoming analysis blogs.
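The headline figures can be checked with a few lines of arithmetic. This is only a sketch using the rounded numbers quoted in this post (about 6,000 seeded hospitals, about 1,800 collected, just under 300 million prices), so the results are approximate and the variable names are ours, not columns in the database.

```python
# Rounded figures quoted in this post; exact counts in the database will differ.
seeded_hospitals = 6_000
collected_hospitals = 1_800
total_prices = 300_000_000  # "just under 300 million"

# Share of the seeded hospitals whose chargemasters were collected.
coverage = collected_hospitals / seeded_hospitals

# Mean number of prices per collected hospital.
avg_prices_per_hospital = total_prices / collected_hospitals

print(f"coverage: {coverage:.0%}")
print(f"average prices per hospital: {avg_prices_per_hospital:,.0f}")
```

With these rounded inputs the coverage comes out to 30% and the average lands just under 167k, in line with the figures above.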
Paying per hospital worked
We generally pay per table cell, but that doesn't work for every bounty. In this case, some hospitals provide more than 100 times as many cells as others, even though scraping them takes the same amount of effort. Having millions of prices from a handful of hospitals doesn't serve our mission. We want a dragnet so that we can compare prices between hospitals.
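To see why the payment unit matters, here is a minimal sketch comparing a per-cell split with a per-hospital split of the same prize pool. Everything in it is hypothetical: the contributor names, the cell counts, the pool size, and the `split_pool` helper are made up for illustration and are not the actual bounty scoreboard logic.

```python
from collections import defaultdict

def split_pool(pool, contributions, weight):
    """Split `pool` among contributors in proportion to the summed weight of their hospitals."""
    totals = defaultdict(float)
    for contributor, cells in contributions:
        totals[contributor] += weight(cells)
    grand_total = sum(totals.values())
    return {name: pool * share / grand_total for name, share in totals.items()}

# Hypothetical data: one (contributor, cell_count) entry per scraped hospital.
contributions = (
    [("alice", 1_000_000)] * 2   # two hospitals with very large chargemasters
    + [("bob", 10_000)] * 20     # twenty hospitals with small chargemasters
)
pool = 10_000  # hypothetical prize pool in dollars

# Per-cell: each hospital counts for as many cells as it contains.
per_cell = split_pool(pool, contributions, weight=lambda cells: cells)

# Per-hospital: each hospital counts once, regardless of size.
per_hospital = split_pool(pool, contributions, weight=lambda cells: 1)

print("per cell:", {k: round(v) for k, v in per_cell.items()})
print("per hospital:", {k: round(v) for k, v in per_hospital.items()})
```

Under the per-cell rule, the contributor with two huge chargemasters takes most of the pool. Under the per-hospital rule, the contributor who covered twenty hospitals does, which is the behavior a dragnet needs.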
With 1800 hospitals, we nearly doubled the number we captured in our previous attempts and tripled the number of prices.
The bounty lost steam when the only hospitals left were the singlets, each with its own unique table schema. It can take 30 minutes or more to import a single CDM like that, which makes them unprofitable to work on.
A weakness of our dataset comes from the arcane billing procedures that hospitals have. There are internal billing codes, generic codes, and prices for each payer and each code.
- Despite "common payer identifier" being a CMS requirement, many codes were unique to the hospitals
- No standardized coding exists for the payers, making hospital-to-hospital comparisons hard
In some upcoming analysis blogs, we'll see what we can do to normalize the data a little and add some precision.
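As a rough idea of what payer normalization could look like, here is a minimal sketch that maps free-text payer labels to a canonical name using a small alias table. This is an assumption about one possible approach, not the method we will necessarily use: the `PAYER_ALIASES` table and the `normalize_payer` function are hypothetical, and a real alias list would need to be far longer and built from the data.

```python
import re

# Hypothetical alias table: canonical payer name -> substrings that identify it.
PAYER_ALIASES = {
    "aetna": ["aetna"],
    "blue cross blue shield": ["bcbs", "blue cross", "blue shield"],
    "unitedhealthcare": ["uhc", "unitedhealth", "united health"],
}

def normalize_payer(raw):
    """Map a free-text payer label to a canonical name, or None if unrecognized."""
    # Lowercase, turn runs of punctuation into single spaces, trim the ends.
    cleaned = re.sub(r"[^a-z0-9]+", " ", raw.lower()).strip()
    for canonical, aliases in PAYER_ALIASES.items():
        if any(alias in cleaned for alias in aliases):
            return canonical
    return None

# Example labels of the kind that might appear in different chargemasters.
for label in ["BCBS of NC", "United Healthcare - PPO", "AETNA HMO", "Self Pay"]:
    print(label, "->", normalize_payer(label))
```

Substring matching is crude and will produce false positives on short aliases, so unrecognized labels are returned as `None` rather than guessed at. That keeps the unmatched payers visible for manual review.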
Nine participants contributed here, and once again @abmyii earned the top prize for an incredibly diligent and thorough effort.
Stay tuned for more
We have more coming where we take a closer look at the hospital price data. In the meantime, if you have questions or want to participate in our current bounty (we're collecting data from local jails), let us know. We're active on Discord and ready to answer all your questions.
If you want to investigate the data for yourself, you can. Visit the database and click the clone button to download Dolt and make a local copy.