When you sit down to talk with an astronomer, you might expect to learn about galaxies, gravity, quasars or spectroscopy. George Djorgovski could certainly talk about all those topics.
But Djorgovski, a professor of astronomy at the California Institute of Technology, would prefer to talk about data.
The AAAS Fellow has spent more than three decades watching scientists struggle to find needles in massive digital haystacks. Now, he is director of the Center for Data-Driven Discovery at Caltech, where staff scientists are developing advanced data analysis techniques and applying them to fields as disparate as plant biology, disaster response, genetics and neurobiology.
The descriptions of the projects at the center are filled with esoteric phrases like "hyper-dimensional data spaces" and "datascape geometry."
In one project, scientists are applying analysis techniques originally developed for astronomical sky surveys to data from neurobiology. The techniques incorporate machine learning and computational statistics, and the goal is to develop better diagnostics for autism.
In another project, researchers have distributed more than 400 sensors in Pasadena, Calif., and surrounding areas. The sensors collect data on the intensity of shaking during and following earthquakes. Those data are aggregated by cloud computing and subsequently analyzed to predict damage to buildings.
In both of these projects, the analysis techniques -- rather than the observations -- take center stage. Djorgovski emphasizes that there's a big difference between number crunching and data-driven computing.
"When you mimic inside a machine what happens in Mother Nature, the output of simulation is not a formula, but a data set," he says.
"Computing is not a crutch. It's just a different way of going at reality."
It's not surprising that an astronomer would want to develop new ways to apply data science.
Astronomy was "always advanced as a digital field," Djorgovski says, and in recent decades, important discoveries in the field have been driven by novel uses of data.
Take the discovery of quasars.
In the mid-20th century, astronomers using radio telescopes thought quasars were stars. But by merging data from different types of observations, they discovered that quasars are rare objects powered by gas spiraling into black holes at the centers of galaxies.
Quasars were discovered not by a single observation, but by a fusion of data.
To make these kinds of discoveries, astronomers had to recruit computer scientists to help, and the nascent field of astroinformatics gained traction. Astroinformatics, Djorgovski says, is a bridge that connects information to astronomy.
"There is usually knowledge hidden in data sets," Djorgovski says. Getting knowledge out of those data sets, he says, will require new ways of approaching research.
He likens the current challenges of wrangling big data to those that drove the development of statistics in the early 1900s, when scientists created new tools to establish sample sizes and to combine measurements made under different conditions.
"We're coping with the rise of new scientific methodology," Djorgovski says.
"By the late 1990s," he says, "it was very clear that this was something qualitatively different. It's not the same stuff with more data."
Challenges persist, Djorgovski notes, including lack of programming tools, resistance from academic institutions, and a pecking order that does not always reward people who can bridge the gaps between science and computing.
He says he is concerned about a generation of potential scientists with a talent for informatics who are being pushed down by lack of opportunity.
Academia is "crushing young people who can bridge these gaps," he says. He has seen postdocs have trouble getting jobs because they're "not the world's greatest astronomer and not the greatest programmer." Even worse, science is losing these people to commercial ventures in business, technology and finance, which Djorgovski views as an enormous loss to academia.
The problem central to science—and to ensuring that traditional academic institutions can continue to foster discoveries—is whether we can make the best use of the accumulation of information.
"Big data are not about data," Djorgovski says. "It's all about discovery."