Big Data and Human Rights, a New and Sometimes Awkward Relationship

Despite their exciting potential to uncover human rights abuses, technologies that collect and analyze huge amounts of data can also infringe on other human rights. The AAAS Science and Human Rights Coalition explored the way forward.
The GDELT project monitors the world's broadcast, print, and web news in over 100 languages and can identify human-rights-related events before the news appears in mainstream, Western channels. This visualization shows all news events captured from 1979 to 2013. | Gdelt.org

Even as experts at a recent meeting showed how big-data technologies can reveal and document human rights abuses such as child sex trafficking, they and members of the audience also weighed the implications for privacy, free expression, and other human rights.

"The application of big data in the human rights domain is still really in its infancy," said Mark Latonero, research director and professor at the USC Annenberg Center on Communication Leadership & Policy and fellow at the Data & Society Research Institute. "The positives and negatives are not always clear and often exist in tension with one another, particularly when involving vulnerable populations."

Latonero spoke at the 15-16 January meeting of the AAAS Science and Human Rights Coalition, a network of scientific and engineering membership organizations that recognize a role for scientists and engineers in human rights. 

"Big data" conventionally refers to the collection, storage, and analysis of huge amounts of data. Although there are many sources of new kinds of digital data, the bulk is created — sometimes intentionally, sometimes not — when people use the Internet and their mobile devices.  According to a 2014 White House report, more than 500 million photos are uploaded and shared every day, and more than 200 hours of video are shared every minute. People also leave a trail of "data exhaust" or "digital bread crumbs" when they shop, browse, and interact digitally.  This information is collected, stored, and analyzed, sometimes after being sold, for marketing and other purposes including scientific research. 

The White House report notes: "Used well, big data analysis can boost economic productivity, drive improved consumer and government services, thwart terrorists, and save lives." But these benefits must be balanced against the social and ethical questions the technologies raise, the report continues. Such tradeoffs were front and center at the AAAS meeting, where several participants raised questions about property rights and data ownership.

Speakers, including Emmanuel Letouzé (who is also the cartoonist Manu), likened big data to petroleum: a commodity that can benefit society while also creating power imbalances. | Emmanuel Letouzé

Cell phone, insurance, credit card, and other companies collect personal data about their customers that could be used for altruistic as well as business purposes, but "my take is, it's not their data to start with," said Emmanuel Letouzé, cofounder and director of the think tank Data-Pop Alliance. The notion of "data philanthropy" may be flawed from the outset, he said. For citizens in developing countries, he added, any benefits they receive must be weighed against the risks of providing personal information for big data analysis projects, especially in regions with a history of political instability and ethnic violence.

Several participants said they didn't necessarily mind sharing their data, for example with social media companies or Amazon.com. However, they wanted more transparency and a better understanding of how their personal data was being used. Likewise, a recent Pew survey reported that nine out of 10 respondents said they felt consumers have lost control over how companies collect and use their personal information.

"The most fundamental human right, I think, is being able to weigh in on what constitutes a harm," said Letouzé, arguing that citizens should have a much greater say in how their data is used. His colleague Alex "Sandy" Pentland, an MIT professor and academic director of Data-Pop Alliance, has called for a "new deal on data": a set of workable guarantees that make the data needed for public goods readily available while also protecting personal privacy and freedom.

Even in more stable, developed countries, big-data technologies could be used for discrimination and manipulation, argued Jeramie Scott, national security counsel at the Electronic Privacy Information Center (EPIC). He disagreed with the White House report's recommendation that big-data policies focus primarily on how data is used: "Data collection can have a chilling effect" on the rights to self-expression and free association, he said. People may censor themselves or their online activities, he suggested, for example out of fear that companies evaluating their creditworthiness will discriminate against them.

Top: Patrick Vinck, Harvard Humanitarian Initiative; Mark Latonero; Megan Price; Kalev Leetaru, GDELT; Bottom: Emmanuel Letouzé, Samir Goswami, Jeramie Scott | AAAS

Other speakers described projects that use big data in ways that directly support human rights — but even they felt caution was needed.

Latonero showed how analyzing classified ads can reveal patterns suggesting organized child sex trafficking and can even be used to investigate particular individuals. Corporations such as Western Union, Google, and J.P. Morgan Chase are also analyzing data that can reveal financial transactions or other evidence of human trafficking. When this data is shared with human rights groups and researchers, it raises as-yet-unanswered questions about who has a responsibility to act if a human rights abuse is uncovered, and who is responsible for reporting on and monitoring the situation, Latonero said.

Samir Goswami, director of Government Professional Solutions at LexisNexis, described a pilot product called SmartWatch that scans more than 26,000 information sources each day and alerts a client company or government entity when it finds indications of societal risks, including human rights risks, somewhere in its supply chain. And the GDELT project, which monitors the world's broadcast, print, and web news from every country in over 100 languages, can show when human-rights-related events are being reported well before the news makes its way through mainstream, Western channels.

Nonetheless, as researchers who work with human-rights-related evidence already know, even large datasets must be checked for biases, such as the omission of key facts. "Big data, while promising, interesting, and useful, is not synonymous with complete or representative data," said Megan Price, director of research at the Human Rights Data Analysis Group.

Consent is another complex issue. When Goswami worked at Amnesty International USA, he partnered with DataKind, which convened a group of data scientists to analyze a 30-year archive of Urgent Action bulletins containing information about prisoners of conscience, detainees, and other individuals whose human rights were under threat. The scientists developed a pilot method to predict human rights risks, and Purdue University will organize the bulletins into a publicly searchable database.

Even when data science is harnessed for the public good, Goswami noted, the widespread dissemination of identifying information has implications for informed consent. For example, individuals who consent to have their data collected for one purpose may not know at the time of other ways the data might be used in the future. Goswami agreed with an audience member that data scientists could learn from medical ethics and from the institutional review boards that oversee issues such as informed consent in clinical research.

Human rights experts and data scientists must continue to talk to each other, all agreed. "The thing that keeps me up at night is data scientists trying to intervene in human rights issues with no context of the human rights issue, and then human rights professionals using big data without examination of the assumptions around that data," said Latonero.