The Use Of Big Data In Investigative Journalism

PC Network Solutions Business


The Paradise Papers are a leak of more than 13.4 million confidential documents that reveal offshore investments in jurisdictions such as the Bahamas and Bermuda. The documents were leaked to Frederik Obermaier and Bastian Obermayer, journalists at the German newspaper Süddeutsche Zeitung.

The documents contained details of more than 120,000 people and companies. At 1.4 terabytes, the leak is the second biggest in history, bested only by the Panama Papers leak of 2016.

Although this was a juicy leak from a journalistic point of view, its sheer size presented a serious challenge. Sifting through the documents by hand would take a journalist months, if not years. The leak contained thousands of loan agreements, trust deeds, financial statements and emails, some of them dating back decades.

The gossip-friendly portion of the leak was the first to be dissected and digested. It generated a series of damning and controversial headlines about celebrities and companies around the world. Those mentioned included Prince Charles, Queen Elizabeth II, President of Colombia Juan Manuel Santos and U.S. Secretary of Commerce Wilbur Ross.

But a more thorough analysis, one that combines investigative journalism with financial fraud investigation, requires big data software. The leaked documents were shared with the International Consortium of Investigative Journalists (ICIJ), which in turn brought in Linkurious, a French software company that specializes in graph analysis and visualization software for cases involving cyber-security and money laundering. The company has experience working with government agencies and journalists on investigations similar to the Paradise Papers. One of its previous cases was the Swiss Leaks of 2015.

The job of the Linkurious software was to devise a way to identify and analyze complex connected data, and to present the results in a form that is easy to understand and accessible to journalists who are not tech-savvy. The data itself was stored in Neo4j, a widely used open-source graph database, with Linkurious providing the visual layer on top of it.

The investigation began with raw, unstructured data that was not machine-readable. This information had to be organized according to a predefined data model so that parts of the investigation could be automated. Cross-referencing records and highlighting connections was difficult because the data was kept in silos.
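To make the idea of a predefined data model concrete, here is a minimal Python sketch of how flat records might be turned into the nodes and relationships of a graph. It is an illustration only, not the model or tooling used in the investigation: the field names (`person`, `company`, `role`), the node kinds and the sample rows are all invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    kind: str  # hypothetical labels such as "Officer" or "Entity"
    name: str


@dataclass(frozen=True)
class Edge:
    source: Node
    relation: str
    target: Node


def normalize(name: str) -> str:
    """Collapse whitespace and case so one name always maps to one node."""
    return " ".join(name.split()).upper()


def to_graph(records):
    """Turn flat records into a set of nodes and a set of edges."""
    nodes, edges = set(), set()
    for rec in records:
        officer = Node("Officer", normalize(rec["person"]))
        entity = Node("Entity", normalize(rec["company"]))
        nodes.update((officer, entity))
        edges.add(Edge(officer, rec["role"].upper(), entity))
    return nodes, edges


# Invented sample rows, not taken from the leak.
records = [
    {"person": "Jane  Doe", "company": "Alpha Holdings Ltd", "role": "director"},
    {"person": "jane doe", "company": "Beta Trust", "role": "beneficiary"},
]

nodes, edges = to_graph(records)
print(len(nodes), "nodes,", len(edges), "edges")  # 3 nodes, 2 edges
```

Because both spellings of the name normalize to the same value, the two rows end up attached to a single person node, which is what makes later cross-referencing possible.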

The information used by ICIJ reporters came both from the leak and from public databases, which made it important to connect the siloed data.
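The value of connecting silos can be shown with a small Python sketch. It merges two invented sources into one graph and then searches for the shortest chain of connections between two names. All names and rows are made up, and matching records on an exact normalized name is a deliberate simplification; real entity matching is considerably harder, and this is not a description of how ICIJ or Linkurious did it.

```python
from collections import defaultdict, deque


def key(name):
    """Normalize a name so the same party matches across sources."""
    return " ".join(name.split()).casefold()


# Invented rows: each pair means "these two parties are connected".
leak_rows = [
    ("John Smith", "Oceanic Holdings Ltd"),
    ("Oceanic Holdings Ltd", "Harbour Trust"),
]
registry_rows = [
    ("harbour  trust", "Acme Shipping Inc"),
]


def build_graph(*sources):
    """Merge any number of sources into one undirected graph."""
    graph = defaultdict(set)
    for rows in sources:
        for a, b in rows:
            a, b = key(a), key(b)
            graph[a].add(b)
            graph[b].add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the shortest chain of names or None."""
    start, goal = key(start), key(goal)
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(graph[path[-1]]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


leak_only = build_graph(leak_rows)
combined = build_graph(leak_rows, registry_rows)

print(shortest_path(leak_only, "John Smith", "Acme Shipping Inc"))  # None
print(shortest_path(combined, "John Smith", "Acme Shipping Inc"))
```

With the leak alone, no link between the two names can be found. Once the public registry rows are merged in, the search returns a chain running from the person through two intermediate entities to the shipping company.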

With graph technology, investigators, analysts, small businesses and reporters are now able to manage the complexity of data-driven investigations like those behind the Panama Papers and the Paradise Papers.