Talk about healthcare “big data” seems to be everywhere. It is a discussion generously leavened by the promise of the future and I am hopeful—even if only a quarter of it comes true—that it will herald a revolution in the delivery of effective and efficient clinical care. In the meantime, here is a hard-won personal primer for those of us who still have to work on planet Earth.
Lesson 1: Healthcare data are getting cheaper to acquire but more expensive to use.
With the digitization of so much of the healthcare economy in the last 10 years, data are now plentiful and relatively cheap to acquire. However, they grow quickly and require ever more server and computational capacity to store and manage. Cloud-based hosting helps, but dialing up servers to run reports can get expensive quickly, especially if big data interrogation isn’t your core business.
Lesson 2: Big data aren’t always better.
A fair number of the clinical findings from healthcare big data confirm what earlier, smaller studies and trials have already shown. That’s why researchers do power calculations. They know that once you get to the right sample size, 15 million records do not necessarily provide more insight than 1,500. If you make the mistake of asking the same question over and over again of ever larger data sets, in all likelihood a lot of time and money will be spent getting the same answer. The real trick to exploiting big data successfully is assessing whether size contributes to the underlying heterogeneity and potential explanatory power of the data set.
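To make the power-calculation point concrete, here is a minimal sketch using only the Python standard library. It assumes a simple two-sided comparison of two proportions under a normal approximation; the event rates (20% versus 15%), the significance level, the power target, and the function names are all hypothetical choices for illustration, not figures from any real study.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate records per group needed to tell p1 from p2 (two-sided z-test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


def margin_of_error(p, n, confidence=0.95):
    """Half-width of a normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p * (1 - p) / n)


# Hypothetical event rates: 20% under one treatment, 15% under another.
n_per_group = sample_size_two_proportions(0.20, 0.15)
print(f"records needed per group: {n_per_group}")

# Compare the precision of an estimated 20% rate at two very different sizes.
for n in (1_500, 15_000_000):
    print(f"n={n:>10,}  margin of error: +/-{margin_of_error(0.20, n):.4f}")
```

Under these assumptions the required sample is on the order of hundreds of records per group, not millions. The larger data set tightens the margin of error, but it does not change the answer to a question the smaller sample was already powered to settle.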
Lesson 3: Source data quality can have big analytic consequences.
It’s worth remembering that almost by definition big data are a byproduct of some other transaction and will be used for a purpose for which they were not designed. At best, big data don’t represent the truth, but one internally consistent version of the truth. The problem is that end users are often oblivious to how the source data were captured, cleaned, structured and normalized. Mistakes, errors, or even judgment calls by database administrators and developers anywhere along that chain can have a big impact on analytic outputs.
Even more concerning in healthcare big data analytics is the underlying volatility of the source systems themselves. Since data requirements are almost always driven by their primary application—for example documentation and billing in EHRs—few system operators give any thought to how an “upgrade” may affect secondary applications. Given that the Meaningful Use era in healthcare informatics is essentially defined by underlying source system change, big data users need to be mindful that even the internal consistency of data sets may be in routine jeopardy.
Lesson 4: Most big data analyses are surprisingly simplistic.
Many vendors will profess their big data acumen by referencing a long string of impressive-sounding statistical models. They’ll probably also speak about “world class” proprietary systems for managing and rendering data.
Don’t be fooled. Most big data analyses used in the real world are no more complex than what can be done in Excel pivot tables. The issue is that population X may be 200 million patients while parameters Y and Z are stored in a database whose schema looks no less complex than the human genome itself. It takes a fair amount of skill to get those answers quickly and consistently. The best vendors will also help define an analysis plan to answer questions that don’t seem answerable from the data set at first glance.
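For a sense of how simple the underlying arithmetic usually is, the sketch below reproduces a pivot-table style summary in plain Python. The records, field names, and the `pivot_mean` helper are invented for illustration; a real analysis would run the same grouping against a far larger and messier source, which is where the skill comes in.

```python
from collections import defaultdict

# Hypothetical records: (region, diagnosis, cost). All values are invented.
records = [
    ("North", "CHF", 1200.0),
    ("North", "CHF", 800.0),
    ("North", "AFib", 500.0),
    ("South", "CHF", 1000.0),
    ("South", "AFib", 700.0),
    ("South", "AFib", 300.0),
]


def pivot_mean(rows):
    """Group rows by (row_key, column_key) and return the mean value per cell."""
    totals = defaultdict(lambda: [0.0, 0])  # cell -> [running sum, count]
    for row_key, col_key, value in rows:
        cell = totals[(row_key, col_key)]
        cell[0] += value
        cell[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


table = pivot_mean(records)
for (region, diagnosis), mean_cost in sorted(table.items()):
    print(f"{region:<6} {diagnosis:<5} {mean_cost:8.2f}")
```

The grouping logic is the same whether there are six rows or 200 million; what changes is the effort needed to locate, join, and trust the inputs.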
Lesson 5: The delta between information, insight and intervention remains very large.
While there are some notable exceptions, healthcare big data analytics now allow us to do with lots of data what we used to do with a little data. The size of a data set usually doesn’t make it any easier to determine causality or to figure out what’s going wrong and how to solve it. I was once told that big data just allows good managers to do what they would have done anyway, but with more confidence.
The hard truth is that the size of the data set doesn’t seem to make answers any easier to come by. Big data are a tool; it’s the human interpreter who still has to do the really hard work.
Lesson 6: The greater the scope of your data, the more discipline needed to make sense of it.
One of the great temptations of healthcare big data is that they seem to present almost limitless possibilities for study and analysis. And in many ways, they do. What often happens, however, is that big ideas are never reconciled with the time and intellectual resources necessary to bring them to constructive fruition.
I would argue that the bigger and more powerful the data asset, the more discipline is needed to make it work. And that discipline is often composed of hundreds, if not thousands, of discrete analyses that consume both time and money. The payoff can be enormous if you can retain sufficient focus to actually get there.
Lesson 7: Studying healthcare sociology may be as rich an application for big data as studying pathophysiology.
It is often argued that the promise of big data will be most manifest when we can use them to understand the comparative efficacy of different treatment modalities in the real world. The underlying assumption is that the efficacy differential will be explained by a causal relationship between treatment, physiological response and outcome.
But rarely are real world big data sets so pristine that we can have confidence in accounting for all of the potential biological confounders. That’s why randomized, controlled trials will continue to play an essential role in vetting new therapies.
However, I would argue that what healthcare big data lack in understanding biochemical reactions they make up for in their ability to describe human interaction in the real world. Understanding the effect of physician, patient and payer behavior on outcomes is an essential factor in improving U.S. healthcare.
Lesson 8: If predictive analytics in healthcare were only as good as they are at Netflix … we’d all be dead.
I am intrigued and occasionally even inspired by how major consumer corporations like Google, Amazon and Netflix use big data to solve problems. They only need to get it right maybe 10 to 20 percent of the time to have a huge impact on their bottom line. Can you imagine if predictive healthcare analytics only got it right 20 percent of the time? The data needed to quantify disease far exceed what is required to characterize movies.
Lesson 9: The future value of big data really lies in personalization, if we’re ever allowed to do it.
Ultimately, I think the value of big data will be tied to our ability to personalize the outputs. Moving analytics beyond outcomes researchers, health system managers, and industry marketers and getting the outputs into the hands of individual doctors and patients is where we need to go. Unfortunately, the Health Insurance Portability and Accountability Act, the Common Rule, and most Office for Human Research Protections regulations were written in the pre-digital age.
I am not questioning the paramount importance of privacy and trust in the conduct of medicine; I am arguing that for American healthcare to realize the full potential of big data we absolutely need to have the law, regulation and technology working in the complete interests of the patient. After all, if Amazon is allowed to know everything about me, I would think it would be nice if my doctor could, too.
Brendan Mullen is senior director of the American College of Cardiology’s PINNACLE Registry.