Comparing distributions of hospital prices

Nathan Sutton
Oct 18, 2021

Kolmogorov and Smirnov to the rescue.

I just received an unfortunate out-of-network bill for the physician services portion of an emergency room visit (in-network facility), and so hospital prices are on my mind again. I’ll spend a moment exploring how an open database of hospital prices can help answer the following question.

Which hospital system in North Carolina has the least costly distribution of prices relative to other hospitals?

These data begin with obscure chargemaster files hidden on hospital websites, but I went through the exercise of migrating them into a handy dockerized database. In that database, they are represented as a normalized price table.

In the first iteration of these files, one consistent type of charge reported by hospitals is gross prices. These aren’t actually paid by anyone but can be thought of as the starting point a hospital takes in its price negotiations with insurers. Here is a small sample of the table. The column hospital_id links to the dimension table for each hospital, and concept_id links to the appropriate CPT or HCPCS code in the Athena standard vocabulary.

These prices can then be pivoted into distinct columns for each hospital system. Every row in this view represents a billing concept, and every column represents a hospital group. For example, Transylvania Regional Hospital is part of the Mission Healthcare system (now owned by HCA). Note that there can be many prices for one billing concept in this table: one distinct price for each hospital in a system.
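The pivot described above can be sketched in a few lines of plain Python. This is only an illustration, not the query behind the article: the column names hospital_id and concept_id come from the price table, but the price values, the system names, and the hospital-to-system mapping are all made up.

```python
from collections import defaultdict

# Hypothetical long-format rows: (hospital_id, concept_id, gross_price).
# All values are invented for illustration.
rows = [
    (1, 101, 120.0),
    (2, 101, 135.0),
    (3, 101, 90.0),
    (1, 102, 2400.0),
    (3, 102, 1800.0),
]

# Hypothetical mapping from hospital_id to hospital system.
hospital_system = {1: "System A", 2: "System A", 3: "System B"}


def pivot_by_system(rows, hospital_system):
    """Return {concept_id: {system: [prices]}}.

    Each list keeps one price per hospital, so a system with several
    hospitals can hold several prices for the same billing concept.
    """
    wide = defaultdict(lambda: defaultdict(list))
    for hospital_id, concept_id, price in rows:
        system = hospital_system[hospital_id]
        wide[concept_id][system].append(price)
    return wide


wide = pivot_by_system(rows, hospital_system)
for concept_id in sorted(wide):
    print(concept_id, dict(wide[concept_id]))
```

Keeping a list per cell, rather than averaging, preserves every hospital's price for the distribution comparison that follows.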

Using this matrix, we can compare one system's prices to those of all the other systems through their empirical cumulative distribution functions. In each plot below, a hospital system (red) is compared to the rest of the systems combined (grey). A curve that sits higher means that a larger share of that system's prices fall at or below a given price, so the system has relatively lower gross prices than the others. The y-axis represents the quantile of the empirical cumulative distribution. Given the incredible skew in prices, I log10-transformed them (x-axis).
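To make the plotted quantity concrete, here is a minimal sketch of an empirical cumulative distribution function on log10 prices. The function name and the sample prices are assumptions for illustration, and dropping non-positive prices before taking the logarithm is my choice here, not something the article specifies.

```python
import math
from bisect import bisect_right


def log10_ecdf(prices):
    """Return F, where F(x) is the share of log10(price) values <= x."""
    # Non-positive prices have no logarithm, so this sketch drops them.
    logs = sorted(math.log10(p) for p in prices if p > 0)
    if not logs:
        raise ValueError("need at least one positive price")
    n = len(logs)

    def F(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(logs, x) / n

    return F


# Made-up prices spanning several orders of magnitude.
F = log10_ecdf([10.0, 100.0, 100.0, 1000.0])
print(F(2.5))  # share of prices at or below 10**2.5 -> 0.75
```

Evaluating F for one system and for the rest of the systems on the same grid of x values gives the two curves being compared.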

One non-parametric test of whether two distributions are equivalent is the Kolmogorov-Smirnov test. It returns a test statistic D, the largest vertical gap between the two empirical cumulative distribution functions, which grows as the distributions diverge. Most of these differences are small, and only First Health does not differ significantly from the rest of the hospital systems.
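As a rough sketch of what D measures, the two-sample statistic can be computed from scratch as the maximum absolute difference between the two empirical cumulative distribution functions. This is not necessarily how the numbers in this post were produced, and it does not compute a p-value; in practice a library routine such as scipy.stats.ks_2samp would report both. The function name and the samples are made up.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # Both ECDFs are step functions that only change at observed values,
    # so the largest gap must occur at one of the data points.
    for x in a + b:
        f_a = bisect_right(a, x) / len(a)
        f_b = bisect_right(b, x) / len(b)
        d = max(d, abs(f_a - f_b))
    return d


# Made-up log10 prices for one system versus the rest.
one_system = [1.0, 2.0, 3.0, 4.0]
the_rest = [3.0, 4.0, 5.0, 6.0]
print(ks_statistic(one_system, the_rest))
```

Because D depends only on the ordering of the values, it is the same whether it is computed on raw prices or on their log10 transform.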

Luckily for me, my favorite hospital, Pardee UNC, has substantially lower gross charges (D = 0.26) across billing codes in its cumulative distribution.

Caveats: This is meant to be an illustrative and light-hearted look at hospital pricing data. It does not account for contracted rates or self-pay discounts at any hospital system. Do not use this to determine where you should seek care.
