Would you pause before putting the chocolate or the chips in your grocery cart? Before forking over your credit card for a pack of cigarettes? What if your doctor could tell how often your gym membership card was swiped for entry? Would you go more often?
And whether you behaved differently or not, how would you feel about your doctor knowing about your consumer habits?
To judge from the recent response to the University of Pittsburgh Medical Center's revelation that it has developed prediction models combining census and medical data, you might feel violated by your health providers having insight into anything that falls outside the realm of the purely medical. Take this quote from an article about the data analysis UPMC conducted:
The strategy “is very paternalistic toward individuals, inclined to see human beings as simply the sum of data points about them,” Irina Raicu, director of the Internet ethics program at the Markkula Center for Applied Ethics at Santa Clara University, said in a telephone interview.
The New York Times offers a slightly less alarmist perspective on the UPMC work, pointing out both that much of the data used in these predictive models is already either public or easily obtainable, and that the Affordable Care Act has increased the impetus for providers to reduce patient care costs. Sophisticated predictive analytics offer one mechanism to do this.
So how do these models work? At a high and simplified level, here’s how:
- The statistician receives two or more data sets (for example, a health insurance claims file and a consumer marketing file). The data needs to include enough information to link each individual's records across the multiple files. The statistician will use identifying fields, usually things like name, zip code, and birthdate, which in combination are specific enough to get each person's data from the multiple files into one row.
- An outcome is identified. What is it the health plan or statistician wants to achieve? UPMC wants to figure out who is most likely to utilize urgent care, which means there is a variable in the combined data file that corresponds to urgent care use. It might be a yes/no variable, or it might be number of times urgent care was used over a particular period. Other outcomes that health providers care about include medication adherence or screenings and immunizations.
- Now, the statistician can look at who historically achieved the desired outcome and who didn't. A statistical equation can describe the types of people who were more likely to use emergency services in the past and those who were less likely to do so. This is a probabilistic model, which means it is an educated guess to help predict which people might behave a certain way. There's no guarantee that someone who fits the ER Frequent Flier profile will actually use the urgent care service, but it's a decent bet.
- Now the health plan can take that statistical equation and either apply it retrospectively to data it already has, or collect new data on patients. This data will allow the health plan to flag the people who are most likely to use urgent care (or stop taking medication, or skip their pap smear, or whatever).
- Finally, the health plan will provide some kind of outreach or intervention to the folks who have been flagged.
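The steps above can be sketched in code. This is a minimal, made-up illustration, not UPMC's actual method: the file contents, the linking identifiers, and the model coefficients are all invented here, and a real model would be fitted to historical outcome data rather than hard-coded.

```python
import math

# Hypothetical input files: a claims file and a consumer marketing file,
# linkable on the kinds of quasi-identifiers mentioned above.
claims = [
    {"name": "A. Smith", "zip": "15213", "dob": "1970-01-01", "er_visits": 3},
    {"name": "B. Jones", "zip": "15217", "dob": "1985-06-12", "er_visits": 0},
]
consumer = [
    {"name": "A. Smith", "zip": "15213", "dob": "1970-01-01", "mail_orders": 5},
    {"name": "B. Jones", "zip": "15217", "dob": "1985-06-12", "mail_orders": 1},
]

def link(rows_a, rows_b):
    """Join two files into one row per person, keyed on name + zip + birthdate."""
    index = {(r["name"], r["zip"], r["dob"]): r for r in rows_b}
    merged = []
    for r in rows_a:
        key = (r["name"], r["zip"], r["dob"])
        if key in index:
            merged.append({**r, **index[key]})
    return merged

# Invented coefficients standing in for a model fitted on historical outcomes.
COEFS = {"intercept": -2.0, "er_visits": 0.8, "mail_orders": 0.1}

def risk_score(row):
    """Predicted probability of future urgent care use, via a logistic model."""
    z = (COEFS["intercept"]
         + COEFS["er_visits"] * row["er_visits"]
         + COEFS["mail_orders"] * row["mail_orders"])
    return 1 / (1 + math.exp(-z))

# Flag the people whose predicted probability crosses a chosen threshold.
flagged = [r["name"] for r in link(claims, consumer) if risk_score(r) > 0.5]
print(flagged)
```

Running this flags only the first person, whose past ER visits push the predicted probability above the threshold; that flagged list is what drives the outreach step.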
The advantages of this sort of approach are largely economic. If a health plan or provider knows which members or patients are at highest risk for a negative behavior, they can focus their budgets and time on reaching out to those people instead of to everyone. On the flip side, they can go into a kind of maintenance mode with the lowest-risk people, who might not need extra reminders or case management to appropriately utilize services. Theoretically, these predictive models should also improve outcomes on a population level by reducing inappropriate medical utilization, increasing adherence, etc. Does that actually happen? Sometimes, yes, for example with a project Kaiser Permanente did to improve cardiovascular outcomes that reportedly saved $1 billion.
Having worked with some of these models myself, I think the recent press coverage is overly alarmist about the potential privacy violations of this type of predictive data analysis. I posed some provocative questions at the beginning of my post about whether you’d change your consumer behaviors if you knew your doctor was aware of them. Well, chances are, your doctor will never actually know what you are putting in your grocery cart. Here’s why not:
- Doctors are incredibly busy. And a problem with any sort of rich data set is that it is labor- and time-intensive for a human being to crunch all those variables into meaningful output. Your doctor is probably not going to have either the time or the inclination to pore over your purchases. Which doesn’t really matter, because . . .
- These predictive analytics are done at an aggregate level. Even though people are linked through multiple data sets, the algorithms produced do not refer to any one individual. A subset of each person’s data gets run back through the algorithm and that produces some kind of score or risk level, but it does not produce a laundry list of the consumer data that went into the analysis.
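To make that second point concrete, here is a sketch of what the output of the scoring step typically looks like. The tier cutoffs are invented for illustration; the point is that what leaves the analysis is a coarse score, not the consumer variables that went into it.

```python
def risk_tier(probability):
    """Collapse a model's predicted probability into a coarse risk tier.
    The cutoffs here are made up; a real plan would set its own."""
    if probability >= 0.7:
        return "high"
    if probability >= 0.3:
        return "medium"
    return "low"

# The outreach team sees only the tier, not the purchases or other
# raw variables that fed the model.
print(risk_tier(0.82))
```

So even a clinician who looked at the flagged list would see something like "high risk," not a laundry list of grocery purchases.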
I am not saying that it's wrong to feel concerned about how your personal data is being used. In fact, I urge you to protect your data as much as you comfortably can (here's a nice roundup on Reddit of how to scrub your data from various Internet sources). However, the particular spin the media has put on this use of consumer data to make health behavior predictions overstates the immediate threat to individual privacy.
So will your doctor be chiding you about that case of beer you’re picking up for the 4th of July barbecue this weekend? Probably not. But depending on what your data reveals, there may be extra reminder letters in the mail from your health plan when it comes time for your next exam.