Interest in Big Data has surged over the past two years, as have the opportunities and challenges that Big Data and its various applications present. However, no in-depth assessment has yet been made of how these activities affect the life sciences and what potential security challenges might exist. To promote a discussion of these issues, risks, and the broader implications of Big Data in national and international security with respect to the life sciences, a community of scientists, academics, security experts, industry representatives, and decision-makers gathered at the Renaissance Washington, DC Downtown Hotel for a joint event hosted by the AAAS Center for Science, Technology & Security Policy and the Federal Bureau of Investigation Weapons of Mass Destruction Directorate on “Big Data, Life Sciences, and National Security.”
Speakers, working group members, and attendees discussed a variety of topics related to Big Data in the biomedical, clinical, and environmental sciences (Figure 1) and biological security (Figure 2). The objective of the public event was to engage stakeholders in thoughtful dialogue about the benefits of Big Data in the life sciences and to begin identifying associated risks or vulnerabilities to biological security. The event included panels in which speakers discussed the definition of Big Data and state of the science, described its current and future applications to addressing critical biological security issues, identified possible risks presented by Big Data in the life sciences, and explored the level of broader access to data repositories and analytic technologies.
AAAS CEO Alan Leshner opened the event by providing a framework for the day's discussion, stating that “it's more and more urgent that we figure out the most effective ways, not only to analyze Big Data, but to share it and share it widely.” Dr. Leshner went on to challenge the attendees by framing the event as an “opportunity to come up with concrete steps or at least concrete next steps to deal with these issues, and the more concrete you can get in your conversation, the more productive the meeting will be.”
The event consisted of five panels, each focusing on a different aspect of Big Data in the life sciences and national security. These panels were:
- Big Data: Definition, Sources and Data Sharing
- Applications of Big Data and Analytics to International and National Biological Security
- Security Risks of Big Data: Privacy, Openness, Data Management
- Increasing Access to Big Data and Analytics: Implications for Biological Security?
- Implications of Big Data and Analytics to National and International Biological Security
Several major themes emerged from the event:
- Big Data in the life sciences is increasing at an exponential rate in terms of data volume, applications, and analysis tools.
- As the pace at which large volumes of data are collected, stored, and analyzed increases, so do the risks of unintentional or intentional errors (e.g., spamming, spoofing) that could compromise data quality and reliability.
- Data is being generated at a much faster rate than it can be analyzed or monitored. Consequently, particular attention should be paid to the ownership, openness, disposition, and accessibility of the data.
- As the volume of data continues to increase, the reliability of the data and its associated sources becomes a significant factor.
- Reproducibility of data and analytic methods is extremely important for ensuring high confidence in the data and in the results of data analysis and assessments. This could be challenging if the processes are proprietary in nature.
- Automated data processing (e.g., data visualization techniques and machine learning) and human analysis are both important for producing useful, actionable information from Big Data.
- As data collection, storage, and analysis technologies advance, potential vulnerabilities might be introduced into the system, increasing the national and international security risk posed by Big Data in the life sciences.
A key emphasis of the discussions was on the implications of Big Data and analytics for national and international biological security:
- Big Data in the life sciences is currently being developed for biosurveillance and precision medicine, but it could also be applied to synthetic DNA screening, forensic analysis, intelligence gathering, and smart vaccine and drug development.
- The level of uncertainty inherent in Big Data elicited a number of comments on the importance of communicating early and effectively about analytic products and their applications to national and international biological security problems. The audiences named were the general public, policymakers, public health workers, law enforcement, intelligence officers, and the research and health community.
- Databases and analytic tools must be protected from deliberate misuse, which is a significant risk associated with Big Data.
- Loss of privacy of data and databases, and the way in which aggregation of disparate and discrete data sets may effectively deanonymize information sources, are important risks associated with clinical, genomic, health care, or other human data.
A balanced approach is necessary to ensure that the data are safe, secure, and reliable without discouraging innovation and advancement in Big Data tools and analytics, which operate in a relatively open environment:
- Possible approaches for minimizing the risks of Big Data in the life sciences include improving or developing relevant oversight frameworks (e.g., refining the Institutional Review Board function and process), mechanisms for professional responsibility, national and regional laws and policies, and international legal instruments. Accountability of Big Data, analysis, and application will be important for each of these approaches.
- Another possible approach is to address cyber threats unique to Big Data across collection, storage, analysis, and applications. Examples of cyber threats unique to Big Data in the life sciences include:
  - Intellectual property/proprietary information protection challenges
  - Implications for enforcing deemed export/export licenses
  - Genomic identity theft, misidentification, discrimination, exploitation, violation, or assumption
  - “Hacking,” “spamming,” or “spoofing” of data sets to alter analytic outcomes
An important overall theme of the discussions was that Big Data in the life sciences involves: 1) a system of collection and storage tools, analytic methods, human input, and end-use; and 2) a diverse community of stakeholders including scientists, analysts, and users across a wide variety of disciplines and sectors. Promoting communication and cooperation between these distinct stakeholders and across the system to safeguard Big Data in the life sciences will be critical to preventing, detecting, and ultimately responding to intentional threats or vulnerabilities introduced into the system.
In the coming months, AAAS CSTSP, the FBI, and their working group members will produce reports based on key issues identified during the April 1 event.