
Using the public to work on scientific data

As scientists, we sometimes deal with an overwhelming amount of data. I know that my own projects have come nowhere near producing the largest data sets in research. Nevertheless, I have felt limited by my lack of computational knowledge when looking for the best models to fit my data and making predictions based on my research.

Perhaps the person conducting the experiments and collecting the data is not the best person to get the most out of that data. This doesn't just apply to science. Even Netflix needed outside help to improve the accuracy of its predictions about how much people will enjoy a movie based on their movie preferences. A few years ago, Netflix opened up its prediction problem to the public in a competition that awarded $1 million to the winners.

Crowdsourcing, or outsourcing jobs to the public, is gaining popularity in and outside of science. For scientists, companies like Kaggle can help organize data prediction competitions for a fee. So far, the outcomes have been good. For example, in a competition to predict how patients with HIV will respond to a cocktail of antiretroviral drugs based on the patients' DNA sequences, the Kaggle winner created a model that is 78% accurate. That's significantly better than the existing model, which had 70% accuracy.

The public seems capable of manipulating data even better than researchers. Perhaps not being confined to the dogma of a particular field enables people to be more creative in finding solutions.

Crowdsourcing is promising, but there are also potential problems. To use crowdsourcing, scientists must release their data to the public. This could be tricky, especially if your work involves patient confidentiality, or if you're worried about being scooped. While crowdsourcing might not work for every project, it's still an option that the science community should keep in mind. The diverse range of talents outside of the scientific community can be an immensely useful tool for scientists.
