In the previous interviews with Subha Madhavan and Daniela Witten, the AAAS Center for Science, Technology, and Security Policy (CSTSP) discussed the role of Big Data in the biomedical sciences and the statistical challenges facing Big Data. Big Data is also used in a variety of fields outside the biomedical field. One of the newest fields in which Big Data can have an impact is environmental science. With issues such as climate change and environmental security, Big Data may have a significant role to play in addressing national security risks related to the environment.
CSTSP spoke with Angel Hsu, Director of the Environmental Performance Index, a joint project between the Yale Center for Environmental Law & Policy and the Center for International Earth Science Information Network at Columbia University. This is a part of a series of interviews that leads up to the AAAS/CSTSP and Federal Bureau of Investigation’s public event on April 1, “Big Data, Life Sciences, and National Security.”
CSTSP: What does “Big Data” mean in the environmental field? How is that different from “lots of data”?
Angel: This is a great question and I think 'big data' is still very much being defined in the environmental field. In many cases, there does not even exist enough baseline "data," let alone "lots of data," by which to assess environmental impacts. For example, sustainable agriculture: What does it mean - how do we define it, and how do we assess whether practices are sustainable? Can there be optimal uses of fertilizers or pesticides, or is any use inherently deleterious for 'sustainability'? The research in this area is new and evolving, and very little comparable data exists by which to understand, for instance, what is even meant by 'organic.'
My argument is that we are still in the very early phases of developing large-scale datasets for the environment. There are some sources of big data generation for the environment. Satellites, for example, have been around for nearly half a century and are being used for all types of observation, ranging from assessing urban growth to sea level rise and deforestation. However, even these sources of 'big data' are limited in some way in their ability to answer certain environmental questions. Satellite analysis can tell, for example, how much a city has grown over a certain time period, but it doesn't tell me how much energy or water was used in that expansion.
CSTSP: What, in your opinion, are the top 3 most innovative applications of Big Data within the environmental field?
Angel: Global Forest Watch, a collaboration convened by the World Resources Institute that allows for near-real-time monitoring of deforestation. One of the strengths of this data-information platform is the collation of many geospatial and satellite datasets that give both a granular and an aggregate picture of the world's forests. The website includes deforestation alerts for the humid tropics, called FORMA, which are based on rapid assessment of satellite images and predict on a monthly basis where deforestation might be happening. The FORMA alerts are a great example of big data being applied to help inform the public and decision makers.
Citizen Science. I know this isn't 'one project' but I think that citizen science, user-contributed data, and crowdsourcing have the potential to push forth the big data revolution for the environment. Because of the data gaps I mentioned above, our understanding of the environment, impacts, and change is limited. There are a few examples of citizen science efforts to involve people in the collection of environmental science data. The eBird initiative is a great example of this. People love bird-watching and taking pictures of wildlife, so this effort has combined the two to better track a whole host of questions: biodiversity, migration patterns, and species distribution. Another effort out of the University of Oxford is asking citizen scientists to donate their computers' spare time to help run thousands of climate models to establish a linkage between extreme weather events and climate change. In this way, they are generating a big dataset to answer a question and involving citizens to do so. It's really interesting.
Another interesting effort to link big data to the environment is the Wildlife Picture Index, a collaboration between Conservation International and HP, which is using large datasets from camera traps and climate data to develop an early warning system to understand threats to tropical mammal and bird diversity.
CSTSP: You mention Citizen Science. Are there any particular risks and challenges that citizen science or large collaborative networks present to the security of the data?
Angel: Yes, there's a risk that user-contributed data may be misused or personal information disclosed. For example, we saw that happen in the 2012 incident involving Target, where a teenage girl's parents found out she was pregnant before she had a chance to tell them. Certainly, there needs to be a set of guidelines, a protocol to establish security measures and ethical standards by which these kinds of data may be collected and used.
CSTSP: In what ways do you see Big Data within your field changing or growing in the near future?
Angel: I think that environmental policymakers and decision makers need to get more creative in thinking about how to generate big data for the environment. Trees can't tweet, and oceans can't create a census of the species that live beneath their surface, so we've got to get creative about how we generate the information and knowledge needed to make smart policies and decisions that aren't ad hoc. Even though we talk a lot about the need to bridge disciplines and actors, it's still not being done well enough in environmental policy.
The fact is that businesses and the private sector move faster than academia and can often collect data more efficiently and easily than scientists can. For example, Coca-Cola has a better global water dataset than any UN agency. That's because they operate in 200 countries around the world and their business is directly affected by water risk. So they need to know how much water is available, where it's coming from, what its quality is, and what stakeholders might be impacted by their business operations. The good news is that they are also generous with the data. They gave it to the Aqueduct project, which makes all of this data available so other businesses can benefit as well. We need more of these types of public-private-academic partnerships if we hope to improve big data for the environment.
CSTSP: What are some of the biggest challenges or risks facing environmental sciences in the near future, and what role might Big Data play in addressing or exacerbating these future risks and challenges?
Angel: I'd argue that climate change is, and increasingly will be, one of the biggest "game-changers" with respect to our understanding of the environment. There are still too many unknowns. For example, studies have shown that China's air pollution over the last decade has had a cooling effect on the earth's climate. So what happens if China's $277 billion plan to tackle air pollution, approved last July, is effective in bringing pollution levels down to those of Switzerland? Does this mean there will be a kickback effect whereby global temperatures rise even more than scientists currently predict with current climate models?
CSTSP: In what ways do you see Big Data addressing the needs of climate research that are currently in development or still need to be developed?
Angel: The Intergovernmental Panel on Climate Change 5th Assessment Report was released on March 31. A lot of the scenarios predicting emissions rise and the impacts on sea levels, global temperatures, etc. rely on large-scale, coarse-resolution datasets and models. I would say that the challenge is not that there isn't big data by which to model future climate change and its impacts, but how to downscale these large datasets to a local scale so that decision makers and individuals have a sense of how climate change might affect them. I think this is where there's a big data and knowledge gap. The models are global, so it's not clear how cities, states or even countries might be individually impacted.