Data preservation, an essential operation in research

Data preservation is a tough problem to tackle, but it's absolutely necessary for researchers. Scientists learn from one another and must be able to transfer information reliably from one generation to the next. Breaks in data transfer are inconvenient at best; at worst, they can obstruct projects and waste valuable resources. Time and money went into previous experiments that may not be easily replicated, so it's extremely wasteful to lose historical data.

As illustrated in a recent news story in Science, scientists can make valuable discoveries from old research data when they look at it with a fresh, up-to-date perspective. In the Science story, particle physicist Siegfried Bethke had to hunt down old data across the globe, and in one case, he had to resurrect data that existed only in printouts.

It was actually fortunate that he had participated in the original project, called JADE, which was completed in 1986 at DESY, a high-energy physics lab in Germany. Had he not been part of that work, he probably would've had an even harder time piecing everything together. When researchers complete a project, they may leave behind disorganized or incomplete versions of the supplementary information that other scientists need to make sense of the data.

Scientists do have safeguards to prevent this kind of messy data transfer. Ideally, there are lab notebooks and data backup servers. However, notebooks and data backups are not always checked for consistency and thoroughness. Lab members can save data in a variety of formats and give their files cryptic names. Often, researchers find quick fixes for experimental issues on the fly without recording them for future generations.

Chaotic data handling is a huge impediment to scientific progress. It can force people to reinvent the wheel. You never know when you will need to reinterpret old data or compare past results with current ones. Ultimately, the responsibility for data preservation lies with the person in charge of the lab. He or she should take the time to clearly outline how data should be kept and periodically check that the policy is followed. The leader of a lab must make sure everyone understands that data, along with all the ancillary information, must be archived.

