Hospital Price Transparency V2 Bounty Retrospective

5 min read

Last Wednesday, May 27, we completed our fifth data bounty and first V2 bounty. The focus of the bounty was US hospital prices. We had run a bounty for hospital prices that ended March 1. We loved the results and wanted to see what we could do with a second round.

The bounty was a success. We built a database of 100M prices across 3.3M procedures. This is up from 72M prices in V1. This time, we have hospital specific descriptions for each procedure, correcting a major flaw in V1. There were no new hospitals added so this bounty was reinserting data from V1 in the correct schema. The more complicated schema and repeat theme seems to have limited participation but we still got a good database to start with.

Thanks to everyone who participated. Data Bounties continue to inspire us. The dream of a world full of collaboratively built databases is near at hand.

Why V2?

We weren't happy with the schema we chose for V1. Specifically, we wanted a description per procedure and more schemaed data about each price.

In V1, we assumed each hospital would have a relatively similar description of a procedure for each CPT or HCPCS code. This was a bad assumption. When a price is wildly different for a certain code, is the code describing the same procedure? In V1, there was no way to tell.

hospital-price-transparency $ dolt sql -q "select * from cpt_hcpcs where code = '00450'"
+-------+----------------------------+------------------+
| code  | short_description          | long_description |
+-------+----------------------------+------------------+
| 00450 | ANESTH SURGERY OF SHOULDER | NULL             |
+-------+----------------------------+------------------+
hospital-price-transparency $ dolt sql -q "select * from prices where code = '00450' order by price desc"
+-------+------------+---------------------------------------------------+----------+
| code  | npi_number | payer                                             | price    |
+-------+------------+---------------------------------------------------+----------+
| 00450 | 1255572228 | CASH                                              | 14091.75 |
| 00450 | 1255572228 | Amerigroup                                        | 14091.75 |
| 00450 | 1194766543 | BEACON HEALTH OPTIONS, INC. - (POS)               | 3803.19  |
| 00450 | 1184654923 | CASH                                              | 3554.57  |
| 00450 | 1184654923 | Healthsmart                                       | 3554.57  |
| 00450 | 1184654923 | Other HPs                                         | 3554.57  |
| 00450 | 1184654923 | BeechStreet                                       | 3554.57  |
| 00450 | 1992736599 | CASH                                              | 2734.50  |
| 00450 | 1992736599 | Healthsmart                                       | 2734.50  |
| 00450 | 1134186315 | SYMETRA LIFE INS. CO., BELLEVUE, WA - (Indemnity) | 1390.20  |
| 00450 | 1063411767 | TEXAS MEDICAID/HEALTHCARE SERVICES - (Medicaid)   | 1214.93  |
| 00450 | 1194766543 | CASH                                              | 1115.62  |
| 00450 | 1609297506 | CASH                                              | 1050.00  |
| 00450 | 1194766543 | LUCENT HEALTH SOLUTIONS - (PPO)                   | 952.70   |
| 00450 | 1134186315 | CASH                                              | 695.10   |
| 00450 | 1063411767 | CASH                                              | 483.19   |
| 00450 | 1194766543 | FIRST CHOICE HEALTH - (PPO)                       | 315.00   |
| 00450 | 1033112230 | CASH                                              | 122.00   |
| 00450 | 1326115569 | CASH                                              | 20.50    |
| 00450 | 1093879439 | CASH                                              | 20.50    |
| 00450 | 1346590882 | CASH                                              | 20.50    |
| 00450 | 1437262763 | CASH                                              | 20.50    |
| 00450 | 1821151366 | CASH                                              | 20.50    |
| 00450 | 1821298159 | CASH                                              | 20.50    |
| 00450 | 1073568978 | CASH                                              | 17.50    |
| 00450 | 1518164037 | CASH                                              | 17.00    |
+-------+------------+---------------------------------------------------+----------+

How can the same procedure cost $14,000 at one hospital and $17 at another. Are we talking about the same procedure?

In V2, we can answer this question because the procedures table has the NPI number as an additional key.

hospital-price-transparency-v2 $ dolt sql -q "select * from procedures where code = '00450' and (short_description is not NULL or long_description is not NULL)"
+-------+------------+-------------------------+------------------------------+
| code  | npi_number | short_description       | long_description             |
+-------+------------+-------------------------+------------------------------+
| 00450 | 1073568978 | ORIF CLAVICLE           | NULL                         |
| 00450 | 1639399793 | ANESTHESIA UNIT X 15MIN | NULL                         |
| 00450 | 1043627474 | NULL                    | ANESTH,SURG CLAVICLE/SCAPULA |
| 00450 | 1821035940 | NULL                    | ANESTH,SURG CLAVICLE/SCAPULA |
+-------+------------+-------------------------+------------------------------+

Looks like this code can be used for the surgery or anesthesia for the surgery. This pattern happens everywhere and makes the data hard to compare across hospitals. For V2, if you notice wildly different prices, you can ask for each hospitals description of the procedure. Unfortunately, in V2, we only got four hospitals' prices for this procedure. We definitely got more hospital's data in V1 but the data is of lower quality.

We added more metadata about each price, specifically IP_OP (inpatient/outpatient), caveat, and mode. This allows for additional query capability. For instance, querying the max price for a specified procedure as required by law in the posted hospital charge masters.

hospital-price-transparency-v2 $ dolt sql -q "select * from prices where code = '00450' and mode='MAXIMUM'"
+-------+------------+-------------+------------+--------+---------+--------+
| code  | npi_number | payer       | IP_OP      | caveat | mode    | price  |
+-------+------------+-------------+------------+--------+---------+--------+
| 00450 | 1043627474 | UNSPECIFIED | INPATIENT  | MISC   | MAXIMUM | 103.00 |
| 00450 | 1043627474 | UNSPECIFIED | OUTPATIENT | MISC   | MAXIMUM | 103.00 |
| 00450 | 1821035940 | UNSPECIFIED | INPATIENT  | MISC   | MAXIMUM | 103.00 |
| 00450 | 1821035940 | UNSPECIFIED | OUTPATIENT | MISC   | MAXIMUM | 103.00 |
+-------+------------+-------------+------------+--------+---------+--------+

How did we do?

This bounty was a little more difficult to participate in. As a result, we got less data but it is better data. In V1 we got 72M prices. In V2 we ended up with over 100M but that includes the 72M prices in V1. No new hospitals were added for V2. We think a couple factors contributed to this.

We had some Dolt scale issues during the bounty. The database ended up at 165GB. At this scale, some writes and inserts were experiencing a deadlock that we ended up finding and fixing mid-bounty. We continue to make improvements to Dolt because of real life use cases like bounties. We're getting better but Dolt performance definitely limited participation in this bounty.

Also, a more complicated schema may have hindered participation. We're testing the limits of what prize earning bounty hunters are willing to undertake with each bounty. We've always been surprised to the upside. But V2's more complicated schema required much more parsing and data manipulation to fit into Dolt. We ended up having the same top prize earner from the first bounty take the lion's share this bounty by reinserting their previously scraped data in this new, better schema.

What's next?

We have learned a lot about US hospital prices over these past two bounties. We will happily run more bounties on this dataset to get more data if there is interest. We need to get people analyzing the data to start a feedback loop about what they would like to see. Are more hospitals needed? Should we focus on a specific region? Should we trim the data to only prices that are required by regulation to achieve better data quality? All these questions would be framed by a serious analytical effort. We are here to support an analytical effort on this data!

One thing we know we need to do, that we attempted prior to running this bounty but failed to finish, was to standardize payers. Right now the payers table mirrors what is in the hospital charge masters. This makes it hard to answer a question like "which insurer gets the best rates on average?" With some post processing, a clever analyst could group insurers by text distance and come up with a reasonable estimate. But, we think that should be handled in the database itself.

Conclusion

Bounties continue to inspire us. Bounties show off the power of a database designed for collaboration. We built two pretty cool hospital price databases for people to play with.

If you'd like to participate in bounties, we are running a bounty in partnership with the Police Data Accessibility Project right now. This one requires a lot less scraping expertise so it is more accessible to beginners. If you want to talk to us about bounties or Dolt, come join us on our Discord.

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt