I analyzed 1835 hospital price lists so you didn't have to. But if you're a data scientist, you can do it too!
This article was written in part for python-centric data scientists and programmers, and comes with a follow-along notebook where we clean, analyze, and chart the real-world data that this article is based on.
The notebook uses python, polars, and altair to do the cleanup, analysis, and charting. I used a number of cool tricks and techniques to do the data wrangling and look forward to your feedback!
Surgical imprecision - hospitals hate price transparency
In 1990 a baby was born in Northside Hospital, in Atlanta. That was me. That was I. That was the author of this article.
I haven't thought about much Northside since then. Not until today, that is, when I discovered that they were the first hospital to get slapped with a $900k fine from the Center of Medicare and Medicaid Services. Northside just wasn't interested in complying with the CMS's price transparency law, which requires all hospitals to post their prices online in a single, machine-readable file.
As an alternative to posting their price list, Northside offered patients a phone number to call for a tailored price estimate.
Despite this being a flagrant violation of the transparency law, they'll probably appeal the fine.
Strong medicine, or sugar pill?
Northside might be an egregious violator of the price transparency law, but they did bring me safely into existence, so I can't hold too much of a grudge.
Nor are they alone. Some have estimated that still only half of hospitals are posting their prices, which somewhat agrees with the 33% or so that we collected in one of our data bounties.
As one of the only organizations that publishes this data openly, we naturally want to see what you can find in it. We've published a few popular blogs already, which are in equal parts journalism and python tutorial. If you're interested in that, check them out.
But in writing these articles, a problem I repeatedly ran into was that after cleaning, a lot of the data seemed to just vanish. Why?
Time for an X-ray: what's actually inside these price lists?
It turns out that the price lists are incredibly unwieldy to use for a number of reasons that I mention later. I blame in equal parts both the hospitals and the lawmakers.
- The hospitals that comply often publish the least information they can get away with
- The CMS law appears to be written by non-technical people and leaves room for hospitals to do (1)
The CMS requires hospitals to post a cash price, insurer prices, and a generic (i.e. hospital-agnostic) code for each procedure when applicable. Of the ~33% of hospitals that we found to have published price lists:
- 14% don't have a single generic code in their entire price list
- 21% of hospitals don't have a single price besides the MSRP, making them useless, since no one actually pays that price (not even the uninsured)
- 31% don't have a single cash price
Scoring the chargemasters
I came up with some criteria for separating the excellent price lists, such as
from ones like
which contain almost no usable information (only "MSRP" prices, low-or-no generic coding, etc.)
I came up with a simple rubric that calculates the total amount of information and junk in each price list, and graded them on a scale of 1-10. I checked for:
- how many rows had CPT codes
- how many rows had prices that were astronomically high or zero (in other words, junk)
- whether the price lists were adequately filled out (small penalty for having fewer than 5000 prices)
- whether inpatient/outpatient was indicated on the price
- how many different payers the hospital specified prices for
45% of hospitals -- those that publish their price lists, have scores 6/10 or less, which indicate some kind of important flaw in the data published.
For the exact methodology, check out our notebook where I use some cool tricks in the Polars library to wrangle this 300M-row dataset.
Benefit of the doubt, or doubt the benefit?
It's easy to pooh-pooh imperfect laws, and to be fair, this one is better than nothing at all. But the way the current CMS law is set up still makes comparing prices harder than it needs to be. Put yourself in the position of a fictional patient:
It's been years of ice hockey, tennis, and stumbling home from Finnegan's Pub. Now, with your X-ray luridly illuminated by a fluorescent panel, your doctor is circling a shadowy area just beneath your femur with the back of his pen. Bone on bone, he tells you. It's time for a knee replacement. He'll write you a referral.
So what's it going to cost you? You'll have to first find the price list on your hospital's website which, even if it exists, won't necessarily be easy.
Then you'll have to Google what code you're looking for. If it's a knee replacement, you'll find that it's CPT code 27486. Or maybe 27487. You're not really sure, and neither am I. Perhaps it depends on the surgeon. You have to hope your hospital uses CPT codes and not, say, DRG coding, or something else entirely, otherwise you'll need to look up those codes too.
From here, you'll have to hope that the hospital in question posts prices for your insurer. Since only 2/3 have any insurer prices at all, this isn't a guarantee. And then you'll have to actually find your insurer in the list, which may not be easy since some hospitals have up to 100 different prices for each plan: are you
AETNA OPEN ACCESS MANAGED PLUS NET or
AETNA AETNA HEALTHFUND AETNA CHOICE POS II NET? Or perhaps
AETNA HSA OPEN ACCESS POS II NET? I have no idea and neither do you probably.
Once you've figured that out, you've ballparked the price of your knee replacement from one hospital. Then you repeat for the other hospitals in your insurer's network.
This is the current best-case scenario. Most probably won't ever get to the chargemaster.
Dead On Arrival: What the CMS law got wrong
If we're going to do this at all, we should do it right. It takes minimal effort to make the following changes:
- Centralize all the files. The hospitals post their price lists on both their webpages and on the CMS website on a single page. This makes it easy to compare hospitals and see which ones have complied with the law.
- Require a standard format. Pick JSON or CSV in some standard encoding. We don't need hospitals publishing XML files and Excel spreadsheets.
- Do away with MIN/MAX price-reporting requirements. The CMS requires hospitals to publish a minimum and maximum price for each procedure. This doesn't add to the value of a machine-readable file, since calculating the min/max per payer is as simple as doing a groupby/aggregation. This, to me, was a tell that the lawmakers who made this were not thinking carefully about the details of the requirements.
- Force hospitals to submit price lists conforming to a reasonable, universal schema. Hospitals are willing to spend $1M just to avoid publishing these things. They have the cash to tidy up their CSVs. A reasonable, non-ambiguous data schema would include:
unique procedure ID,
hospital billing code,
- Force hospitals to a standard template for naming insurers/payers. It's impossible to compare hospital prices when each one uses its own cryptic names for insurers and their plans.
These are just a few of the changes that would make the machine-readable files more valuable to researchers like me.
While we wait for the CMS fines to increase (again) and for more hospitals to start posting their price lists, we still think that there's plenty to be discovered in our database, which is the largest of its kind. Click it, check it out, even fork it (you can do that with Dolt) and let us know what you think.
As usual, a special thanks to those of you who helped us build our $15,000 hospital price database. Creating distributed databases is our thing and we run contests like this all the time, literally. If you're interested in joining our latest bounty or just talking to us, you can hop on Discord to learn more.