Wide interest in Big Data is a relatively recent phenomenon. Google Trends shows that search interest in the term started growing sharply in December 2011. To engage the public in a dialogue on this rapidly growing field, the AAAS Center for Science, Technology and Security Policy (CSTSP) and the Federal Bureau of Investigation held a public event on April 1 on “Big Data, Life Sciences, and National Security.” CSTSP also conducted interviews with Subha Madhavan, Daniela Witten, and Angel Hsu.
There’s a lot of data going around, and it’s been around for a long time. The Human Genome Project, for example, generated a huge amount of data long before people were using the term “Big Data.” Researchers at the National Institutes of Health have thousands of hard drives, CDs, Zip drives, and floppy disks filled with data that has been collected for as long as data could be stored. The same is true for academic research labs, health care providers, environmental groups, and other government agencies. The Census Bureau alone has huge amounts of data. So if large amounts of data have been accumulating for many years already, what is the big deal about “Big Data” now?
One way to start thinking about this is by recognizing that “Big Data” and “lots of data” are not the same thing. Lots of data can sit around for many years on storage drives or in databases. It can be analyzed, and a lot of good information can come from that analysis. That’s how data analysis worked in the past. But that vast amount of data is still not “Big Data.” So what makes data “Big”?
The answer to both “why now?” and “what does ‘Big’ mean?” has three sides. First, improvements in hardware have transformed how fast we can process and analyze data today compared to even a few years ago. Second, not only can we analyze data faster, but we can also generate new data today at a rate that was not possible before. Third, we can start thinking about analyzing different types of data from different sources and different databases in tandem to reveal new insights. So technology has advanced to the point where it’s now possible not only to analyze in new ways the huge amounts of data that are being generated, but also to go back and look at the large amount of data that was generated in the past through a different lens.
Scientists, statisticians, and engineers are now developing the tools necessary to look at data differently. We already have some of these tools, while others are still being refined. All of them, however, are designed with the goal of transforming the way we look at data. So the “big” in Big Data is not just about the size of the data; it’s also about the fact that we’re starting to look at it in new ways.
These new ways of generating and analyzing data provide us with tremendous opportunities. But they also present new risks and challenges. “Big Data” is still very much in its infancy, and that’s why we’re talking about it now. We know we CAN analyze data in new ways, but we’re still trying to figure out exactly how to do it efficiently, proficiently, safely, and securely. There are still a lot of questions to answer, and a lot of new problems to solve. And the more we talk about it now, the fewer problems we may encounter in the future.