Data crunching and mapping may sound like they belong in a profession very different to journalism. But they are the tools of data journalism – the newest addition to investigative reporting, writes Sharlene Sharim in a story first published in The Media magazine.
When investigative journalists Julian Rademeyer, Jacques Pauw and Andrew Trench broke the story on prospecting licences and the fight over mining rights in November 2010, they made history.
Published in City Press, the article revealed the number of prospecting licences the ANC’s investment and fundraising company, Chancellor House Mineral Resources, had acquired in recent years, and questioned the ruling party’s entwinement in the economy. The story surfaced amid intense debate over the possible nationalisation of South Africa’s mines. And while it is an excellent piece of investigative journalism, Wits University fellow in investigative journalism Margaret Renn also describes it as ‘the first significant piece of data journalism’ in our country.
Trench, national editor of investigations at Media24, wrote in his blog at the time the story was released that it “did not come about from a deep-throat informant, as much investigative reporting in South Africa does”. Instead, the story emerged from analysing more than 1.5 million items from the Department of Mineral Resources’ (DMR) database of mineral rights.
So, in a country where, as Trench says, much investigative journalism relies on someone passing information on, data analysis offers journalists an alternative avenue: an exceptional way of finding stories that would otherwise remain hidden in databases.
The data for the City Press story was simply found on the DMR’s website. In October 2010, the department published records of all the mining and prospecting right applications made in South Africa. These PDF records contain, among other things, the names of the applicants and the latitude and longitude coordinates of each site.
Trench says that, although he was able to extract the text from the files, he couldn’t do anything practical with it in Excel, because the size of the dataset would crash the program. So he turned to Python, a powerful programming language he had taught himself. “I wrote a custom Python script to search through all the records, find the ones we wanted and collate them into individual files. Then I cleaned up the smaller data sets of results we were interested in, using things like Excel and Google Refine.”
In his blog, he says: “Chancellor House was an obvious search since the entity had already declared an interest in the mining sector, but even so, I was surprised at the number of records which a query produced.”
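Trench has not published his script, so what follows is only a minimal sketch of the search-and-collate step he describes, not his actual code. It assumes the PDF text has already been extracted into lines of plain text, one record per line, with fields separated by a pipe character. That record layout, the field names and the function names are all hypothetical.

```python
import csv
import io
from pathlib import Path

# Hypothetical record layout (an assumption, not the DMR's real format):
# one record per line, fields separated by "|".
FIELDS = ["applicant", "commodity", "right_type", "latitude", "longitude"]


def parse_records(lines):
    """Yield one dict per well-formed line; lines with the wrong field count are skipped."""
    for line in lines:
        parts = [part.strip() for part in line.split("|")]
        if len(parts) == len(FIELDS):
            yield dict(zip(FIELDS, parts))


def find_matches(records, search_terms):
    """Group records under each search term found in the applicant name (case-insensitive)."""
    matches = {term: [] for term in search_terms}
    for record in records:
        name = record["applicant"].lower()
        for term in search_terms:
            if term.lower() in name:
                matches[term].append(record)
    return matches


def write_collated(matches, out_dir):
    """Write each search term's matching records to its own CSV file in out_dir."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    for term, rows in matches.items():
        filename = term.lower().replace(" ", "_") + ".csv"
        with open(out_path / filename, "w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=FIELDS)
            writer.writeheader()
            writer.writerows(rows)


if __name__ == "__main__":
    # Made-up sample lines standing in for text extracted from the PDF.
    sample = io.StringIO(
        "Example Holdings Ltd | Coal | Prospecting | -26.1 | 28.0\n"
        "Another Company | Gold | Mining | -25.7 | 27.9\n"
        "page header with no fields\n"
        "EXAMPLE HOLDINGS MINING | Platinum | Prospecting | -24.6 | 29.3\n"
    )
    matches = find_matches(parse_records(sample), ["Example Holdings"])
    print(len(matches["Example Holdings"]))  # 2
```

Because `parse_records` reads its input one line at a time, an open text file can be passed in directly, and only the matching records are kept in memory. This is one way to work through a dataset too large to open as a spreadsheet. The smaller CSV files it writes are the kind of output that could then be cleaned up in a spreadsheet or a tool such as Google Refine.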
Because the database contained the coordinates of the sites, Trench was also able to visualise the information. With the help of an online mapping tool called Scribblemaps, he pinpointed the locations of the prospecting rights that Chancellor House had obtained.
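Before points can be plotted, the coordinates have to be in a form a mapping tool can read. We don't know exactly what Trench loaded into Scribblemaps, so this is only an illustrative sketch. It assumes the coordinates are already in decimal degrees, and converts records into GeoJSON, an open format (RFC 7946) that many web mapping tools can import. The function name and field names are hypothetical.

```python
import json


def to_geojson(records):
    """Convert records with decimal-degree latitude/longitude fields into a GeoJSON FeatureCollection."""
    features = []
    for record in records:
        try:
            lat = float(record["latitude"])
            lon = float(record["longitude"])
        except (KeyError, TypeError, ValueError):
            continue  # skip records with missing or unreadable coordinates
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            continue  # out-of-range values usually point to a parsing error
        # Every field except the coordinates becomes a property shown on the map.
        properties = {k: v for k, v in record.items() if k not in ("latitude", "longitude")}
        features.append({
            "type": "Feature",
            # GeoJSON (RFC 7946) lists longitude before latitude.
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    # Made-up sample records; the second has no usable latitude and is skipped.
    records = [
        {"applicant": "Example Holdings Ltd", "commodity": "Coal", "latitude": "-26.1", "longitude": "28.0"},
        {"applicant": "Example Holdings Ltd", "commodity": "Gold", "latitude": "n/a", "longitude": "27.9"},
    ]
    print(json.dumps(to_geojson(records), indent=2))
```

Skipping records with unreadable or out-of-range coordinates, rather than guessing at them, keeps obviously broken rows off the map. Those rows are still worth checking by hand, since a misplaced pin could distort the story.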
The rest, as they say, is journalism history. There is usually some data that begs to be mapped or analysed to provide a new picture, but it isn’t always obvious to those working on the story.
Renn recalls the LeadSA national poll on Primedia radio stations, in which listeners were asked whether they had bribed a metro police officer in recent months. From the responses, the journalists built a database that told a fascinating story of massive corruption.
“But,” says Renn, “it begs for a map that says ‘watch out, avoid these crossroads, avoid those traffic lights, because these are the places where you’re most likely to be bribed’. I mean, you could have such fun with it.”
Renn makes the point that, as incredible as data journalism is, it can’t exist on its own and is meant to be used in conjunction with all the usual techniques of investigative journalism.
Stefaans Brümmer, managing partner of the Mail & Guardian’s Centre for Investigative Journalism, and his team also incorporate principles of data journalism in their investigations.
When allegations of corruption surrounding government’s R30-billion arms deal surfaced, they literally followed the money, studying the transactions made. Brümmer says the financial flows that fed into their stories were all drawn from a database.
In the past, the most journalists did with data was find the obvious or get professionals to help analyse it. Now they are doing the analysis themselves.
Despite involving “quite a lot of technical skill”, Trench believes data journalism is important because, he says, “it opens up a whole new world of journalism for us to experiment with.” Because it requires skills that most journalists don’t yet have, Trench has made a commitment to developing data journalism in his own newsroom.
The point is, as Renn and Tom Johnson – founder and co-director of the – testify, data journalism enables journalists “to get fresh new topics to write on”.
Johnson says there are several reasons why these techniques are still largely unknown. “Traditionally, journalism has been about writing, and it’s been either too difficult or too time-consuming to gather the necessary data and to do the necessary analysis.” He adds: “Although good writing is absolutely necessary, it is not sufficient to bring the necessary information to citizens.”
Johnson believes the digital revolution has brought about important change. Not only do we have much faster access to data, but the tools now available for analysing it let us do things that weren’t possible not long ago. “Thanks to computing, we can now transfer information to a map with the click of a button.”
Despite these new developments, Trench says we’re still scratching the surface, and Brümmer agrees. “This is one sector where there is lots of scope for improvement,” he says.
The biggest challenge is the expense. Costs are high because, as Johnson points out, it takes time to learn how to use and apply the tools. “Media owners have not invested sufficiently and continuously in training journalists. As a result, the topics covered in journalism are superficial,” he says.
Trench agrees that the barrier to entry is quite high, but believes that because data journalism improves the quality of stories, it will become more mainstream. “In future, editors, newsrooms and journalists will have to invest in their own education to remain relevant as the industry evolves,” he says.