From Mail Merge to Data-Driven Press Releases
For the practising hyperlocal or community journalist, annual, quarterly, monthly and weekly statistical releases can provide a reliable source of procedural stories. From weekly NHS accident and emergency statistics and monthly job figures from *nomis* to school performance figures and incident-level road accident data, public bodies are publishing ever more data under open data licences.
Many of these datasets are published at a local level of detail, but within the context of a national dataset. For the hyperlocal reporter, extracting the local data from a possibly large national dataset is only the first step. Once the data has been obtained, it often needs interpreting or decoding before any reporting can begin, whether that reporting is a simple transmission of the facts, a more elaborate contextualisation comparing figures with previous figures or with figures from other regions, or a full-blown analysis to identify stories hidden within the data.
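As a minimal sketch of that first extraction step, assuming the national release is a CSV file with a column of area codes, the Python below keeps only the rows for the codes of interest. The column names, codes and figures are all made up for illustration; a real release will have its own headings, and a spreadsheet file may need reading with a spreadsheet library rather than the standard `csv` module.

```python
import csv
import io

# Made-up national dataset: codes, names and counts are invented for illustration.
NATIONAL_CSV = """area_code,area_name,count
A01,Area One,120
A02,Area Two,340
A03,Area Three,310
"""

def local_rows(csv_text, code_column, local_codes):
    """Keep only the rows whose value in code_column is one of local_codes."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row[code_column] in local_codes]

rows = local_rows(NATIONAL_CSV, "area_code", {"A01"})
print(rows)
```

Passing a set of several codes pulls out a cluster of neighbouring areas in the same way.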
Turning data into stories, even of the simplest sort, can therefore be a time-consuming business. Much of the process of extracting local data and turning it into a simple report can become an almost mechanical task for whichever reporter it falls to, yet that process is repeated month on month, locale by locale.
Surely there must be a better way?
Many office application users will be familiar with the idea of a *mail merge*, in which individual names, addresses and personal details are merged into a form letter from a spreadsheet or a database. So can we generalise this technique to provide a way of generating localised press releases that describe, and perhaps even start to contextualise, regular data announcements from national datasets at a local level?
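The mail merge idea itself can be sketched in a few lines of Python using the standard library's `string.Template`: a template with named placeholders is filled in once per record. The template wording and the records here are invented examples, standing in for a press-release sentence and the rows of a spreadsheet.

```python
from string import Template

# A press-release style sentence with named placeholders (invented wording).
TEMPLATE = Template("In $month, $area recorded $count road accidents.")

# Each dict stands in for one spreadsheet row; the values are made up.
records = [
    {"month": "March", "area": "Area A", "count": 12},
    {"month": "March", "area": "Area B", "count": 7},
]

def merge(template, records):
    """Fill the template once per record, as a mail merge does per addressee."""
    return [template.substitute(record) for record in records]

for line in merge(TEMPLATE, records):
    print(line)
```

`substitute` raises a `KeyError` if a record lacks a placeholder's field, which is arguably the right behaviour for copy that will be published.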
Promoters of robot journalism such as Automated Insights or Narrative Science in the US, Arria NLG in the UK, AX Semantics in Germany and Yseop in France use a range of natural language generation tools to produce human-readable text from a variety of data sources at industrial scale. But can we achieve a similar effect in a more home-brew way?
As part of a collaboration with hyperlocal blog OnTheWight, I have been exploring how a templated programming approach might be used to generate “press release” style copy from statistical data releases, providing a human-readable summary of data released as spreadsheets by turning individual data rows into paragraphs of text. Not so much a robot journalist, more a robot temp.
For example, the following text is a transliteration of a spreadsheet produced by the HSCIC reviewing the numbers of written complaints submitted to, and upheld by, individual GP and dental practices. Each row of the original spreadsheet generates a paragraph, and the data in the separate columns contributes to the construction of each line. In addition, the report covers only the practices associated with a particular local NHS trust.
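The row-to-paragraph step might look something like the function below. It is a sketch, not the code actually used for the report described above, and it assumes each row arrives as a dictionary keyed by column heading; the headings `Practice name`, `Complaints received` and `Complaints upheld` are hypothetical. Small touches such as choosing singular or plural wording are where templated text starts to read less mechanically.

```python
def complaint_paragraph(row):
    """Turn one spreadsheet row (a dict of column heading -> value) into text."""
    # Hypothetical column headings; the real spreadsheet may differ.
    name = row["Practice name"]
    received = int(row["Complaints received"])
    upheld = int(row["Complaints upheld"])

    if received == 0:
        return f"{name} reported no written complaints."

    # Match the wording to the numbers rather than writing "1 complaints".
    noun = "complaint" if received == 1 else "complaints"
    text = f"{name} received {received} written {noun}"
    if upheld == 0:
        ending = ", which was not upheld." if received == 1 else ", none of which were upheld."
        return text + ending
    verb = "was" if upheld == 1 else "were"
    share = 100 * upheld / received
    return text + f", of which {upheld} ({share:.0f}%) {verb} upheld."

row = {"Practice name": "Example Surgery", "Complaints received": "8", "Complaints upheld": "2"}
print(complaint_paragraph(row))
```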
Once the general form of the report has been constructed, a report for any trust can be generated simply by selecting the corresponding trust code.
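Selecting by trust code then amounts to filtering the rows before rendering them. A minimal sketch, assuming a CSV with a hypothetical `Trust code` column and made-up rows:

```python
import csv
import io

# Made-up example rows; the column headings are hypothetical.
DATA = """Trust code,Practice name,Complaints received
T01,Alpha Surgery,5
T02,Beta Practice,3
T01,Gamma Health Centre,0
"""

def trust_report(csv_text, trust_code):
    """Build a plain-text report covering only the practices in one trust."""
    rows = [
        row for row in csv.DictReader(io.StringIO(csv_text))
        if row["Trust code"] == trust_code
    ]
    heading = f"Written complaints: trust {trust_code}"
    paragraphs = [
        f"{row['Practice name']} received {row['Complaints received']} written complaints."
        for row in rows
    ]
    return "\n\n".join([heading] + paragraphs)

print(trust_report(DATA, "T01"))
```

Generating the report for another trust is just a matter of passing a different code.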
Whilst the above example was generated using code, I am also exploring how tools such as Open Refine can be used to produce such reports simply through a custom output template definition.
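OpenRefine's Templating exporter divides its output into a prefix, a row template applied to every row, a row separator and a suffix. The Python sketch below mirrors that four-part structure to show how a whole report falls out of a single row template. It uses Python's `str.format` placeholders rather than OpenRefine's own GREL expressions, and the rows are invented.

```python
def templated_export(rows, prefix, row_template, row_separator, suffix):
    """Render every row with one template, join them, and wrap the result."""
    # Placeholders use str.format syntax here, not OpenRefine's GREL.
    body = row_separator.join(row_template.format(**row) for row in rows)
    return prefix + body + suffix

rows = [  # made-up example rows
    {"practice": "Alpha Surgery", "received": 5},
    {"practice": "Beta Practice", "received": 3},
]

print(templated_export(
    rows,
    prefix="Written complaints by practice\n\n",
    row_template="{practice} received {received} written complaints.",
    row_separator="\n",
    suffix="\n\nENDS",
))
```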
The next example shows the generation of a localised report on diabetes prescribing. Again, once the general case is specified, specific localised reports can be generated at will; this time the example shows a fragment of the code along with the text it generates from the original spreadsheet data source.
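Localised reports often need a little aggregation as well as row-by-row text. As an illustration only (this is not the diabetes prescribing code referred to above), the sketch below totals made-up prescription item counts for one area and names the practice contributing most. The `(area, practice, items)` tuple layout is an assumption.

```python
# Made-up prescribing rows as (area, practice, items); not real prescribing data.
rows = [
    ("Area A", "Alpha Surgery", 120),
    ("Area A", "Gamma Health Centre", 80),
    ("Area B", "Beta Practice", 95),
]

def area_summary(rows, area):
    """Summarise total items for one area and its largest contributing practice."""
    local = [(practice, items) for a, practice, items in rows if a == area]
    if not local:
        return f"No prescribing data found for {area}."
    total = sum(items for _, items in local)
    top_practice, top_items = max(local, key=lambda pair: pair[1])
    share = 100 * top_items / total
    return (f"Practices in {area} prescribed {total} items for diabetes; "
            f"{top_practice} accounted for {top_items} of them ({share:.0f}%).")

print(area_summary(rows, "Area A"))
```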
Although the programming skills required present a barrier, basing report generation on code allows the results of a wide range of calculations (percentages, or monthly or annual changes) and more complex analyses to be embedded in the text. (It also allows charts to be generated automatically from the data.) And once produced, the code can be shared or used to generate other localised reports on demand.
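Calculations of this kind can be turned straight into wording. A minimal sketch of a hypothetical helper that describes the change between two periods, choosing "rose" or "fell" and reporting a percentage change only when the earlier figure is non-zero:

```python
def describe_change(label, previous, current):
    """Describe the change between two periods in words."""
    if previous == current:
        return f"{label} was unchanged at {current:,}."
    direction = "rose" if current > previous else "fell"
    if previous == 0:
        # A percentage change from zero is undefined, so give the raw figures only.
        return f"{label} {direction} from 0 to {current:,}."
    pct = abs(current - previous) / previous * 100
    return f"{label} {direction} by {pct:.1f}% from {previous:,} to {current:,}."

print(describe_change("The number of claimants", 1200, 1150))
# The number of claimants fell by 4.2% from 1,200 to 1,150.
```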
Producing such tools may not be for everyone, but it doesn’t need to be. Once written, the code can feed a data wire that carries custom, localised and automatically generated press releases from an official data release. With the time-consuming and purely mechanical work of pulling local data out of a national spreadsheet automated away, and the data already in press release form, perhaps the community journalist is more likely to be able to make use of it?
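The data wire idea can be sketched as a loop over every code in the release, producing one short text per locale. Everything here (the column names, the rows, the layout of each release) is hypothetical; in practice each text might be written to its own file or posted to a feed rather than printed.

```python
import csv
import io
from collections import defaultdict

# Made-up example rows; the column headings are hypothetical.
DATA = """Trust code,Practice name,Complaints received
T01,Alpha Surgery,5
T02,Beta Practice,3
T01,Gamma Health Centre,0
"""

def releases_by_code(csv_text, code_column="Trust code"):
    """Group rows by code and render one short release per code."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[code_column]].append(row)
    releases = {}
    for code, rows in sorted(groups.items()):
        lines = [f"{r['Practice name']}: {r['Complaints received']} written complaints." for r in rows]
        releases[code] = f"Complaints release for {code}\n" + "\n".join(lines)
    return releases

for code, text in releases_by_code(DATA).items():
    print(text, end="\n\n")
```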
Homepage image accompanying this article is copyright Emilie Ogez.