The Big Data Blog, Part I: Subha Madhavan
The term “Big Data” is a hot phrase in a variety of sectors, from the biomedical and environmental sciences to team sports. But what is “Big Data” and how is it different from other kinds of data? How is Big Data being used, and what challenges does it bring? To answer these questions, we have spoken to scientists, scholars and big data gurus. We will share their responses to these questions through a series of blog posts leading up to the AAAS Center for Science, Technology, and Security Policy (CSTSP) and Federal Bureau of Investigation’s public event on April 1, “Big Data, Life Sciences, and National Security.”
Today’s expert, Subha Madhavan, Director of the Innovation Center for Biomedical Informatics at the Georgetown University Medical Center, discusses current and future applications of Big Data within the biomedical field, and the potential challenges Big Data might pose for biomedical research and patient care.
CSTSP: What does “Big Data” mean in the biomedical field?
Subha: Big data in biomedicine is a nebulous term that describes the challenge of making sense of potentially limitless amounts of biomedical data, deriving concrete meaning from it, and in turn applying this information toward improving human life. Big data in the biomedical field is driven by the single premise of achieving precision medicine that will significantly improve patient care.
Advances in the integration of multiple omics data, which are the primary big data in biomedicine, also known as ‘panomic analysis’ of patient data, are enabling discovery of the causal genetic factors that contribute to precision medicine: the right target, the right drug and the right patient.
The world’s current sequencing capacity is estimated to be 13 quadrillion DNA bases a year. The cost to produce an accurate human whole-genome sequence dropped to $1,000 toward the end of 2013 and is expected to fall further, to under $100, in the next decade; the capacity to sequence the genomes of a billion people is expected to be realized within the next 20 years.
CSTSP: How is “Big Data” different from “lots of data” or “metadata”?
Subha: It differs from ‘lots of data’ or ‘metadata’ in that big data describes a new generation of technologies and architectures designed to extract big impact through analysis. Metadata is used to further stratify data, irrespective of whether the dataset is large or small. Also, big data in biomedicine is more heterogeneous than information in any other field. This is one of many definitions. The key here is the realization that we need to get more value out of the data; that is what big data practitioners are focusing on. It is not specific to biomedical sciences.
CSTSP: What, in your opinion, are the top 3 most innovative applications of Big Data within the biomedical field?
Subha: Big themes for innovation in big data in the past 24 months have come from three major areas – 1) Personal data collection/mobile health, 2) Patient-centered care, and 3) Cloud computing.
Specifically, the 1000 Genomes Project provided the first publicly available big genomic data, which has enabled developers to create new architectures for transferring, storing, and analyzing big data. Big corporations such as Amazon, GE Healthcare, IBM, Microsoft and Google offer solutions to store, analyze and deal with complex biomedical information.
CSTSP: In what ways do you see Big Data within your field changing or growing in the near future?
Subha: Rare disease research can benefit from big data by combining the deep information that is becoming increasingly available on these patients. Big data can help address the clinical utility of biomarkers by predicting patient response and outcomes. Big data is an essential component of a “learning healthcare system” that addresses rising healthcare costs through comparative effectiveness and cost effectiveness research on large patient datasets, such as those from claims databases.
CSTSP: What is deep information? Is it more than polymorphisms and exact DNA sequences?
Subha: Deep information may include transcriptomics and metabolomics. It may also include miRNA or other small RNA regulators of gene targets. There is additional evidence emerging now of cellular context, for example the types of cells that express a particular gene or protein, subcellular localization, etc. Deep information may also include pathway or network level perturbation that may be relevant to rare disease etiology, drug response or target discovery.
CSTSP: What are some of the biggest challenges or risks facing biomedical and biological sciences in the near future, and what role might Big Data play in addressing or exacerbating these future risks and challenges?
Subha: Patient privacy is a big challenge for medical big data, but scientists across the globe increasingly want to collaborate and instantly share data to enable more rapid advances in biomedicine. New architectures will be designed to securely collect, transfer, store, and analyze big data. Companies such as Apple, Twitter, Facebook, Microsoft, and Amazon are experts in handling big data. The scientific and medical fields will adopt or adapt similar scalable infrastructure for big data sharing and analysis in the future.