This article is equal parts data repository, code repository, and notebook. The linked GitHub repo gives you the tools you need to reproduce this analysis or create your own.
Last month I wrote about how insurance companies had published a trillion negotiated rates, prices that reflect how much insurance companies pay hospitals, clinics, surgeons and physicians for their services.
The data, republished monthly, is spread out over hundreds of thousands of files (MRFs) on different websites, and weighs in at around 500TB. That's more than the Library of Congress, the entire LibGen catalog, and the Netflix 1080p catalog, combined.
Image from the previous post, "A trillion prices."
Getting the data into one place is a major engineering challenge in its own right, which is why access to the collated data is so expensive. At least one broker is asking $400,000 a year to query it. On top of that, once you have the data, you have to figure out whether the providers even offer the services and, if so, whether the listed prices are right.
I want to show one way that you can use this data as it is, for free, using some simple tools, provided that you understand what's in it. Maybe I can save at least one of you half a million dollars.
In fact, the story of trying to figure out what this data means is interleaved with my first crack at using it.
For example, while trying to pin down the true price of a C-section, I learned that almost all of these negotiated prices are never actually paid.
Fictitious prices inflate the data by 10-100x
If you need a C-section, do you plan on visiting your neurologist, ophthalmologist, or physical therapist? Because they might have a rate for one.
It started when I called Columbia's Neurological Institute of New York to find out why they had one of the most expensive C-section rates in our sample. I pointed out that their provider number was linked to a rate of $17,100 for the procedure, almost 10 times the median rate of $1,766. But they were just as confused as I was.
The same thing happened with the other providers I called. No one seemed to offer these procedures at these prices, so who were the rates for?
As it turns out, insurers pool hospitals, clinics, surgeons and physicians into large groups that share a contract, and everyone in a group shares its prices. For example, a group might include just one obstetrician who performs C-sections, but every other provider in that group will still have a C-section rate in their contract.
Each of these rates gets added to the MRFs, inflating their size and making it that much harder to figure out who is actually getting paid what.
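To see why pooling inflates the files, here is a toy model (a hypothetical `GroupContract` type with made-up provider names and rates, not the MRF schema itself): each group rate is repeated once per provider, so a group of N providers with M billing codes yields N × M rows, even if only one provider ever performs a given procedure.

```python
from dataclasses import dataclass


@dataclass
class GroupContract:
    """Hypothetical, simplified contract shared by every provider in a group."""
    npis: list[str]            # every provider pooled into the contract
    rates: dict[str, float]    # billing code -> negotiated rate (illustrative)


def explode(contract: GroupContract) -> list[tuple[str, str, float]]:
    """Repeat each rate once per provider, the way a pooled contract fans out."""
    return [
        (npi, code, rate)
        for npi in contract.npis
        for code, rate in contract.rates.items()
    ]


# One obstetrician plus nine other specialists share one contract (made-up values).
contract = GroupContract(
    npis=[f"provider-{i}" for i in range(10)],
    rates={"59510": 2000.0, "99213": 95.0},
)
rows = explode(contract)
c_section_rows = [row for row in rows if row[1] == "59510"]
print(len(rows))            # 20: 10 providers x 2 codes
print(len(c_section_rows))  # 10 C-section rates, though only one provider performs it
```

The blow-up is multiplicative, which is how a handful of real contracts can turn into a "10-100x" inflation of rows.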
Open question: how do we link providers to the services they actually provide? If we can do that, we can dramatically shrink the size of this database.
The true price of a C-section? Some answers, but more questions
Note: anyone who tries to handle these files runs into the same problem: loading JSON files that routinely exceed 100GB. We now have a tool that will efficiently parse and filter those files by billing code and NPI number. It can be run on any machine, can be used to "one-shot" collect a set of negotiated rates, and will even put them neatly into linked tables. The linked GitHub repo will allow you to reproduce this analysis in an hour or so.
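The filtering step can be sketched roughly as follows. This is not the linked tool's code; it is a minimal illustration that assumes the field names of the CMS Transparency in Coverage in-network schema (`in_network`, `billing_code`, `negotiated_rates`, `provider_groups`, `provider_references`, `npi`, `negotiated_prices`, `negotiated_rate`), which you should check against the schema version of your file. The `filter_rates` function and sample values are hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator, List, Optional, Set


def filter_rates(
    in_network: Iterable[dict],
    billing_codes: Set[str],
    npis: Set[int],
    provider_refs: Optional[Dict[int, List[dict]]] = None,
) -> Iterator[dict]:
    """Yield one flat record per matching (NPI, billing code, negotiated price)."""
    provider_refs = provider_refs or {}
    for item in in_network:
        code = item.get("billing_code")
        if code not in billing_codes:
            continue  # skip every service we don't care about
        for rate in item.get("negotiated_rates", []):
            # Provider groups may be inline or referenced by id elsewhere in the file.
            groups = list(rate.get("provider_groups", []))
            for ref in rate.get("provider_references", []):
                groups.extend(provider_refs.get(ref, []))
            matched = {n for g in groups for n in g.get("npi", []) if n in npis}
            if not matched:
                continue
            for price in rate.get("negotiated_prices", []):
                for npi in sorted(matched):
                    yield {
                        "npi": npi,
                        "billing_code": code,
                        "negotiated_rate": price.get("negotiated_rate"),
                        "billing_class": price.get("billing_class"),
                    }


# Tiny made-up sample in the assumed shape; real 100GB files need a streaming parser.
sample = json.loads("""
{
  "provider_references": [
    {"provider_group_id": 7, "provider_groups": [{"npi": [1111111111, 2222222222]}]}
  ],
  "in_network": [
    {"billing_code": "59510", "negotiated_rates": [
      {"provider_references": [7],
       "negotiated_prices": [{"negotiated_rate": 2000.0, "billing_class": "professional"}]}
    ]},
    {"billing_code": "99213", "negotiated_rates": [
      {"provider_groups": [{"npi": [1111111111]}],
       "negotiated_prices": [{"negotiated_rate": 95.0, "billing_class": "professional"}]}
    ]}
  ]
}
""")
refs = {r["provider_group_id"]: r["provider_groups"] for r in sample["provider_references"]}
rows = list(filter_rates(sample["in_network"], {"59510"}, {1111111111}, refs))
print(rows)
```

Because `filter_rates` accepts any iterable, it could in principle be fed items from a streaming JSON parser (such as the third-party ijson package) instead of `json.loads`; the top-level provider references would then need to be collected in a separate pass first.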
But even if we can't do a perfect job, we can do an acceptable one by shrinking the search space to just the providers we're interested in. One way to do that was suggested to me by Jaan Altosaar at Payless.health: CMS publishes a complete list of NPI records, including whether each entity is an organization and what kind of organization it is (its taxonomy).
When we restrict ourselves to OB/GYN organizations and hospitals, we get a cross-section of the market and a sense of what United HealthCare actually pays for a C-section.
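The restriction can be sketched as a filter over the CMS NPI records. This assumes the column names of the NPPES dissemination CSV (`NPI`, `Entity Type Code` where "2" means organization, and `Healthcare Provider Taxonomy Code_1` through `_15`) and two NUCC taxonomy codes (207V00000X for Obstetrics & Gynecology, 282N00000X for General Acute Care Hospital); verify both against the current files before relying on them. The function name and sample rows are hypothetical.

```python
import csv
import io
from typing import Iterable, Set

# Assumed NUCC taxonomy codes; check against the current NUCC code set.
OB_GYN = "207V00000X"               # Obstetrics & Gynecology
ACUTE_CARE_HOSPITAL = "282N00000X"  # General Acute Care Hospital
WANTED = {OB_GYN, ACUTE_CARE_HOSPITAL}


def organization_npis(rows: Iterable[dict], wanted: Set[str]) -> Set[int]:
    """Return NPIs of organizations (entity type 2) listing any wanted taxonomy."""
    keep = set()
    for row in rows:
        if row.get("Entity Type Code") != "2":
            continue  # individuals are excluded
        taxonomies = {
            row.get(f"Healthcare Provider Taxonomy Code_{i}", "") for i in range(1, 16)
        }
        if taxonomies & wanted:
            keep.add(int(row["NPI"]))
    return keep


# Made-up rows in the assumed layout.
sample = io.StringIO(
    "NPI,Entity Type Code,Healthcare Provider Taxonomy Code_1,Healthcare Provider Taxonomy Code_2\n"
    "1111111111,2,207V00000X,\n"            # OB/GYN organization: kept
    "2222222222,1,207V00000X,\n"            # individual OB/GYN: excluded
    "3333333333,2,2084N0400X,\n"            # some other organization type: excluded
    "4444444444,2,261QM1300X,282N00000X\n"  # hospital taxonomy listed second: kept
)
result = sorted(organization_npis(csv.DictReader(sample), WANTED))
print(result)
```

The resulting NPI set is what you would then match against the provider groups in the MRFs, which throws away the rates attached to neurologists, ophthalmologists and everyone else who happens to share a contract.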
Rates for C-sections (CPT code 59510) from over 10,000 providers, OB/GYN organizations and hospitals only. Median price ~$1,766. For clarity, the chart excludes prices above $7,000.
If you want to do your own analysis, the database to reproduce this chart is on DoltHub, and the notebook used to make it can be found here.
Even ignoring extreme outliers, prices range from as low as $500 to as high as $5,000, a tenfold difference. It's not entirely clear where this variation comes from: this billing code should bundle most pre- and postpartum care, so every provider is pricing roughly the same package. Can prices really vary that much?
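The summary numbers above can be reproduced with a few lines of standard-library Python. This is a sketch with made-up rates, not the notebook's code, and it assumes the median is taken over all rates while the $7,000 cap only applies to what is plotted. It also shows why the median is the better headline number: a single $17,100 outlier barely moves it.

```python
import statistics
from typing import Dict, List


def summarize(rates: List[float], plot_cap: float = 7000.0) -> Dict[str, float]:
    """Median over all rates, plus the spread of the rates kept for plotting."""
    shown = [r for r in rates if r <= plot_cap]  # mirrors "excludes prices > $7,000"
    return {
        "median": statistics.median(rates),
        "min_shown": min(shown),
        "max_shown": max(shown),
        "fold_difference": max(shown) / min(shown),
        "excluded": len(rates) - len(shown),
    }


# Made-up rates for illustration only; use the DoltHub data for real numbers.
rates = [500.0, 1200.0, 1700.0, 1800.0, 2600.0, 5000.0, 17100.0]
result = summarize(rates)
print(result)
# median 1800.0, fold_difference 10.0 ($5,000 / $500), excluded 1
```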
Open question: why do prices vary as much as they do?
Checking the data quality: are these rates real?
As far as I know, no one has taken the step of confirming these prices with the providers themselves. Taking nothing for granted, I thought I'd try.
Astonishingly, none of the providers I checked (including hospitals) had the price of a simple C-section (CPT 59510) online, despite the recent change in price transparency laws that requires them to post those negotiated rates on their websites. So I started making phone calls.
Most of my attempts went nowhere, with countless redirects and dead ends. One provider didn't have obstetricians in their group. Generations of Women, in Redding, CA, had one, but put me in a circular transfer loop when I asked about the price. Temple Health's pricing help line even flat-out refused to give me any prices at all, citing accuracy concerns.
For the handful of providers I was able to reach, here are the prices they quoted me:
| Provider | Negotiated rate (median: $1,766) | Phone-quoted insurance price | Phone-quoted self-pay price |
|---|---|---|---|
| Piedmont Healthcare for Women | | | |
| St. Francis Hospital | | | |
| Every Woman's OB/GYN | | | |
| Life's Journey OB/GYN | | | |
* Unclear if this contains other charges
***redirected to Vanderbilt Medical Group
This leaves me with even more open questions:
- How come the negotiated rates are often so different from the quoted rates? Do the quoted rates include or exclude certain add-ons, or vary by physician?
- How come some hospitals (like Novant Health) have rates for procedures they don't technically provide, such as CPT 59510?
So, are there quality issues with this data? Is there something I don't understand? If you have any information, write me at firstname.lastname@example.org.
The future of this data is bright...
Policymakers, watchdogs, and journalists will want to leverage this data to understand why prices vary like they do. Insurance companies and hospitals want to know who's getting a bad deal.
The prevailing opinion is that, over time, both of these effects will drive prices not only down but also closer together, as insurers and hospitals converge on the "true price" of services.
...but we need open access. You can help
But without open access, this process of convergence won't happen. That's why we're meeting regularly with the teams at Charity Engine and Postman to bring this insurance data to the world in a sustainable and affordable way.
If you're looking for a cause to support, it's one of the few places you can put your money to drive healthcare costs towards something sane in America.
On the other hand, if you're looking to contribute your time, you can do that too. If you find errors in this analysis, have questions, or just want to learn more, you can hop on our Discord chat or drop me a line at email@example.com.