Open Source Hospital Price Transparency


This work was done with the help of Dr. Jaan Altosaar at the One Fact Foundation. You can reach him at jaan@onefact.org.

We just built a data bank of hospital price lists: files that are supposed to contain all of a hospital's prices. To our knowledge, nothing like it exists anywhere else. It's free to use under a Creative Commons license. You can jump straight to the database, look at our compliance analysis on GitHub or Google Colab, or read on for more context.

Background

Since 2021, hospitals have been required by law to post a machine-readable price list covering all the procedures they offer. The idea was to create a set of files with the same basic information: generic codes that label each procedure, and prices broken down by who pays. In theory, this collection of files would let you price shop. Nothing like it existed before, because until 2021 hospitals could legally keep their prices secret.

In the two years since, disclosure of these price lists has been hit and miss. Some hospitals posted partial price lists; others posted none at all. (They were probably counting on not getting caught.) In 2021, two hospitals were fined over $1M combined for refusing to post these files; since the penalty, both have made a U-turn and published their prices. The fines may have been meant to send a message to other hospitals that it was time to get serious.

In 2022, are hospitals actually publishing their price lists? To find out, we'd have to scour the web pages of over 7,000 hospitals in the US, then figure out what fraction meet the compliance standards.

So that’s what we did.

We're just about done building an open database of hospital price lists, sourced through a weekly data bounty funded by One Fact and our own marketing budget. At the moment, it's the only open database of its kind.

One Fact plans to feed these files into its artificial intelligence pipeline to figure out how much hospitals charge for different procedures, making price shopping easier through its platform, Payless.Health. The One Fact Foundation is a 501(c)(3) nonprofit.

But we can also use these URLs to quickly check how many hospitals are actually in compliance, and to reproduce (and improve on) published research.

Calculated transparency scores for price lists. Many were mostly compliant, but a large number either didn't contain the right information or weren't machine-readable.

Above are the calculated "transparency scores" for a sample of the ~500 price lists in our database. We can use these scores to point to hospitals that are out of compliance.

To skip straight to the code, click here to open a Google Colab notebook with our results.

Checking hospital price list compliance

The law requires the file to:

  1. exist, and be available on the hospital's website
  2. be machine-readable (i.e., not PDFs)
  3. follow a naming convention: EIN_hospitalname_standardcharges[.xlsx|.json|.csv]
  4. contain some basic information (cash prices, gross charges, insurer prices, minimum and maximum negotiated rates among insurers, etc.)
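Requirements 2 and 3 can be checked from a file's URL alone, before downloading anything. Below is a minimal sketch, assuming the naming convention above is followed literally with a 9-digit EIN (optionally written as XX-XXXXXXX). The regex, the function name `check_file_name`, and the example.org URLs are our own illustrative assumptions, not the regulation's exact wording or code from our notebook. It uses the same +1/-1 scoring as the checks later in this post.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical pattern for EIN_hospitalname_standardcharges[.xlsx|.json|.csv].
# Assumes a 9-digit EIN, optionally written as XX-XXXXXXX.
NAME_PATTERN = re.compile(
    r"^\d{2}-?\d{7}_[^/]+_standardcharges\.(xlsx|json|csv)$",
    re.IGNORECASE,
)

# Extensions treated as machine-readable in this sketch (PDFs are not).
MACHINE_READABLE = {".xlsx", ".json", ".csv"}


def check_file_name(url: str) -> dict:
    """Score a price-list URL on requirements 2 and 3 (+1 pass, -1 fail)."""
    # Drop the scheme, host and query string; keep only the file name.
    name = PurePosixPath(urlparse(url).path).name
    suffix = PurePosixPath(name).suffix.lower()
    return {
        "machine_readable": 1 if suffix in MACHINE_READABLE else -1,
        "naming_convention": 1 if NAME_PATTERN.match(name) else -1,
    }


print(check_file_name("https://example.org/123456789_palos-community-hospital_standardcharges.csv"))
# {'machine_readable': 1, 'naming_convention': 1}
```

A PDF chargemaster fails both checks, while a spreadsheet named something like cdm.xlsx passes the machine-readable check but fails the naming one. Requirements 1 and 4 still need the file itself.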


Screenshot of a compliant hospital price list that contains all the relevant information (underlined in red). Hospital: Palos Community Hospital, CMS Num 140062, chargemaster here for reference.

Example of a noncompliant price list. Hospital: Vantage Point of Northwest Arkansas, CMS Num 44004, chargemaster here for reference.

A JAMA study looked at compliance, and when we asked, its authors generously provided their data. However:

  • they didn't share the URLs they used (citing the fact that some of the URLs may have changed since the study). Problem: the work isn't reproducible. This is especially concerning because, for more than 300 of the hospitals they marked "compliant", we don't have any record of a price list in our database (at least, not yet).
  • they took a limited view of compliance (a hospital was marked "compliant" if its file had any privately negotiated rates, otherwise noncompliant). Problem: price lists can be compliant or noncompliant along more than one dimension. In particular, they must be machine-readable, which was not one of the study's criteria.
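To make the second point concrete, here is a toy comparison. The `PriceList` fields and both labeling functions are hypothetical simplifications for illustration: `single_criterion_label` mirrors our reading of the study's rule (compliant if any negotiated rates are present), while `per_dimension_labels` scores each requirement separately.

```python
from dataclasses import dataclass


@dataclass
class PriceList:
    """Hypothetical, simplified summary of one hospital's price list."""
    has_negotiated_rates: bool
    machine_readable: bool
    has_cash_prices: bool


def single_criterion_label(p: PriceList) -> bool:
    """One yes/no label: compliant if any negotiated rates are posted."""
    return p.has_negotiated_rates


def per_dimension_labels(p: PriceList) -> dict:
    """Score each requirement separately (+1 compliant, -1 not)."""
    return {
        "negotiated_rates": 1 if p.has_negotiated_rates else -1,
        "machine_readable": 1 if p.machine_readable else -1,
        "cash_prices": 1 if p.has_cash_prices else -1,
    }


# A PDF that lists negotiated rates but no cash prices.
pdf_list = PriceList(has_negotiated_rates=True, machine_readable=False, has_cash_prices=False)
print(single_criterion_label(pdf_list))  # True
print(per_dimension_labels(pdf_list))
# {'negotiated_rates': 1, 'machine_readable': -1, 'cash_prices': -1}
```

Under a single criterion, this PDF counts as compliant; scored per dimension, it fails two of the three.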

Not only can we do better than this, but you can help too, because we made our work into a Google Colab notebook that anyone can run.

Our strategy was to read in the header information from each file. For each compliance dimension, if the header contains the right magic strings, we label the price list compliant on that dimension. Adding up the dimensions gives us a score we can use to grade hospitals.

For example,

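# Assumed helper (the notebook's own isin may differ): case-insensitive substring match.
def isin(chunk: str, *keywords: str) -> bool:
    return any(keyword in chunk.lower() for keyword in keywords)

def generic_code_check(chunk: str) -> int:
    """Return 1 if the chunk mentions a generic code type (DRG, HCPCS, CPT, CMG), else -1."""
    if isin(chunk, 'drg', 'hcpcs', 'cpt', 'cmg'):
        return 1
    return -1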

If any of the strings "drg", "hcpcs", "cpt", or "cmg" appear in the chunk of text, the price list very likely itemizes procedures with generic codes such as HCPCS and CPT codes, so we mark it compliant on that dimension. That's it! It's basically poor man's machine learning.
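The check above assumes we already have a chunk of text from the top of each file. One minimal way to get it, assuming the price list is a text file (CSV or JSON) on local disk: read the first few kilobytes and lowercase them. The function name `read_header_chunk` and the 4,096-byte default are our own illustrative choices; Excel files would need a spreadsheet reader instead.

```python
from pathlib import Path


def read_header_chunk(path: str, n_bytes: int = 4096) -> str:
    """Return the first n_bytes of a text price list, lowercased."""
    # Read raw bytes so a multi-gigabyte file is never loaded in full.
    with Path(path).open("rb") as f:
        raw = f.read(n_bytes)
    # Files vary in encoding (and the cut may split a character),
    # so replace undecodable bytes instead of raising.
    return raw.decode("utf-8", errors="replace").lower()
```

Passing the result to `generic_code_check` then gives that file's score on the generic-code dimension. Reading only the top of the file keeps the check fast, at the cost of missing columns that appear further down.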

As an exercise, write your own functions to try to improve the accuracy of the compliance score total.
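As a starting point for that exercise, here is one way to extend the same +1/-1 idea to several dimensions at once and sum them into a total. This is a sketch: the dimension names and keyword lists in `CHECKS` are our own guesses at common column headers, not a tested vocabulary or the exact checks in our notebook, and `isin` is redefined so the block runs on its own.

```python
# Hypothetical keyword lists per compliance dimension; tune these against real files.
CHECKS = {
    "generic_codes": ("drg", "hcpcs", "cpt", "cmg"),
    "gross_charges": ("gross charge", "chargemaster price"),
    "cash_prices": ("cash price", "self pay", "self-pay"),
    "payer_rates": ("negotiated", "payer", "insurer"),
    "min_rate": ("minimum",),
    "max_rate": ("maximum",),
}


def isin(chunk: str, *keywords: str) -> bool:
    """True if any keyword occurs in the chunk (case-insensitive)."""
    chunk = chunk.lower()
    return any(keyword in chunk for keyword in keywords)


def compliance_scores(chunk: str) -> dict:
    """Score one header chunk on every dimension in CHECKS (+1 or -1 each)."""
    return {name: 1 if isin(chunk, *keywords) else -1 for name, keywords in CHECKS.items()}


def total_score(chunk: str) -> int:
    """Sum of per-dimension scores, from -len(CHECKS) to +len(CHECKS)."""
    return sum(compliance_scores(chunk).values())


header = ("Description,CPT/HCPCS,Gross Charge,Discounted Cash Price,"
          "Aetna Negotiated Rate,De-identified Minimum,De-identified Maximum")
print(total_score(header))  # 6
```

Substring matching is crude (short keywords can match inside unrelated words), so each keyword list is a natural place to start improving accuracy.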

This simple analysis more or less reproduces the results of Turquoise Health’s in-depth report and PatientAdvocate's Price Transparency Report.

Want to get involved?

If you want to help track down these hospital URLs, join the data bounty. You can learn more on our Discord, which is where most of our discussion takes place.

For feedback on this article or for help using the Payless Health price transparency database, please reach out to the article authors Alec (alec@dolthub.com) or Jaan (jaan@onefact.org).

Mistakes: an earlier version of this article confused when the price transparency executive order was written (2019) with when it went into effect (Jan. 1, 2021).
