One of the most important roles of local journalists is holding councillors to account. Councillors are paid a pretty decent sum by the taxpayer (between £13,000 and £14,000 per annum in most councils), with group leaders, leaders of the opposition, committee chairs, cabinet members, and council leaders earning significantly more (Ceredigion County Council’s leader is paid £43k per annum).
One of their most fundamental responsibilities is to attend meetings and represent their constituents.
Exposing poor attendance
Prior to my arrival at the Cambrian News, my colleague Chris Betteley had exposed a councillor for attending just five
full council meetings in 22 months. The story helped him bag one of the three awards he won at the Wales Media Awards 2016, which included Journalist of the Year. Since Chris had had to ask the council for the attendance records, the council’s response once the story broke was to commit to publishing annual attendance records on its website.
So I was amazed to see, on inspecting the councillors’ attendance figures in June, that the same councillor had been to just five out of 17 full council meetings in the year since Chris exposed him. Of course, I wrote a story about it, but that didn’t seem enough – I wanted every resident in Ceredigion to be able to look at a map and see their councillor’s attendance record.
Scraping the data
James Gardiner, Data Developer at Nesta, suggested I try the mapping tool Carto, which worked really well, and Glyn Mottershead, Senior Lecturer in Computational and Data Journalism at Cardiff University, helped me scrape the data. This was the part I found most difficult because of the way the council had presented the data. I had learned the basics of data scraping using import.io at Hack the Local, but this was beyond me, so Glyn used Python to scrape the data, after which I tidied it up ready to use for the map.
Glyn says:
“The biggest problem with the information that Caleb needed to work with is that it spans a number of pages – a pain to copy and paste, but automating the collection makes it much easier.
There are tools like Outwit Hub and import.io that are well worth a look, but I have a few Python scrapers on file that I’ve written for other projects and was able to adapt one for Caleb’s story.
The Python code was written to go to the home page for the councillors and follow each of the web links it found – essentially clicking through to the pages of the councillors – and then grab the information from each page to create a spreadsheet which could be uploaded to visualise the information.
The first step in scraping is to look at your target pages in the raw, using the inspect element option, which shows the page as a computer sees it in raw code (I also like Firebug if I’m using Firefox). We can then find the bits we need to isolate. If you would like to find out more, I’ve written about the modules used in detail so you can replicate the steps.
Most of the work was done by a Python module called Beautiful Soup (BS4), which can handle text within a web page and means you can write instructions to select certain bits of HTML or segments by name. Everything we needed was in a table on the index page named “ms-rteTable-1”. So using BS4’s SoupStrainer I isolated the table and found all the web links to the individual pages.
This was added to a set (a kind of list used in computing but it doesn’t allow duplicates – really useful so we aren’t repeating our information if the web link is there twice), which became the base of the second phase.
The next step was to then understand how the councillors’ pages worked and the structure used to order the information.
Using the inspect element option I was able to isolate where the councillor’s name, ward and the details Caleb needed were by the names of the HTML elements that stored the text. The biggest issue was that whoever built the site did a bad job of using names that make life easier for someone writing a scraper. If a distinct naming convention is used you are good to go, but this site reuses names – not impossible to solve, but it means more code to make sure we get what we need from the right place.
A few test drives later and it worked – Python is pretty good at telling you you’ve gone wrong and where; the work is often in finding out why!
It did mean that given the time frames the code wasn’t as elegant as it could have been and didn’t solve all Caleb’s problems, but did give him something to work from for the news piece.”
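The two-phase approach Glyn describes can be sketched roughly like this. This is a hypothetical reconstruction rather than his actual script: the table class ms-rteTable-1 comes from his description, but the sample HTML and links are invented, and a real run would fetch the live index page with a library such as requests before parsing it.

```python
# Hypothetical sketch of phase one of the scrape: isolate the index table
# with SoupStrainer, then collect the councillor page links into a set so
# any duplicate links are dropped automatically.
from bs4 import BeautifulSoup, SoupStrainer

# Stand-in for the council's index page (the real page would be downloaded).
index_html = """
<table class="ms-rteTable-1">
  <tr><td><a href="/cllr/jones">Cllr Jones</a></td></tr>
  <tr><td><a href="/cllr/davies">Cllr Davies</a></td></tr>
  <tr><td><a href="/cllr/jones">Cllr Jones</a></td></tr>
</table>
"""

# Parse only the named table, ignoring the rest of the page.
table_only = SoupStrainer("table", class_="ms-rteTable-1")
soup = BeautifulSoup(index_html, "html.parser", parse_only=table_only)

# A set comprehension silently discards the duplicate Jones link.
links = {a["href"] for a in soup.find_all("a", href=True)}
print(sorted(links))  # ['/cllr/davies', '/cllr/jones']

# Phase two (not shown) would visit each link in turn, pick out the name,
# ward and attendance figures by their HTML element names, and write one
# CSV row per councillor.
```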
Mapping the data
Now that I had my attendance data, I needed to find boundary data. I wanted data at ward level within the county. Thanks to Martin Chorley, Computational and Data Journalism Lecturer at Cardiff University, this was easy to find and download from his GitHub repository.
This brings us to the most crucial part of the process: merging the datasets. Remember, when you merge the two datasets (boundary and councillor attendance), the ward names will need to be exactly the same in both spreadsheets or the data will not merge properly.
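Before uploading anything, it is worth checking for mismatched ward names programmatically rather than by eye. A minimal sketch, assuming the boundary column is called wd13nm and the attendance column ward (as in the Carto steps below); the ward names here are invented to show a typical mismatch:

```python
# Hypothetical pre-merge check: any ward name in the attendance sheet that
# has no exact match in the boundary data will fail to join in Carto.
import pandas as pd

boundaries = pd.DataFrame(
    {"wd13nm": ["Aberystwyth North", "Aberystwyth South", "Lampeter"]}
)
attendance = pd.DataFrame(
    {"ward": ["Aberystwyth North", "Aberystwyth Sth", "Lampeter"]}
)

# Set difference: attendance wards with no matching boundary ward.
missing = set(attendance["ward"]) - set(boundaries["wd13nm"])
print(missing)  # {'Aberystwyth Sth'} – fix this spelling before merging
```

Any names the check flags need correcting in one of the spreadsheets before the merge, or those wards will simply drop out of the map.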
Using Carto is pretty simple from here but these are the steps I took:
- Add datasets. Once all your datasets are ready to go, upload them separately to Carto by going to ‘my datasets’ and clicking on ‘new dataset’. Note the title of the matching columns in both datasets – should be ward names – for use later. For example: wd13nm and ward.
- Start map construction. Go to ‘my maps’ and click on ‘new map’. Select the boundary data. The map should now show with all your council’s ward boundaries in default colour fill.
- Merge datasets. At the top of the window, click on ‘data view’. On the side panel, click on the ‘merge datasets’ tab and then click on the ‘column join’ option when it takes you through.
- Choose merge columns. On step 2 of 3, choose the two datasets you want to merge from the dropdown menus. Then select the matching columns you noted when adding the datasets and click ‘next step’.
- Choose what to add. On the boundary data side of this window, choose only the geometry. On the attendance side, you can choose as much or as little as you want. For example, I found the HTML code for thumbnail images of each councillor and put these in columns corresponding to each councillor. The other fields I added were ward, cllr name, party/group, cabinet member attendance (if applicable), and committee membership and attendance. When you are ready, click ‘merge datasets’.
- Personalise/Customise. Click on ‘Map View’. Carto suggested a colour scheme based on party, which I chose to use. Your first step should be to click on the ‘info window’ tab on the right. Select what you want to display when someone clicks on a ward and move the fields up and down depending on the order you want them to appear in. For the fields where you want to change the title, click ‘edit title labels’. The next step should be assigning colours to each party in the ‘legends’ tab. One problem I experienced was with wards that have more than one councillor. My solution was not ideal, but I think it works pretty well: put both councillors’ data into the same cell in the original spreadsheet, making clear which data relates to which councillor. Before publication, add titles and credits for the data.
- When you think you are ready, click ‘visualize’ in the top right corner, followed by ‘ok create map’. Then click ‘publish’, again in the top right corner. Select whether you want to embed, share, or even add to an app, et voilà.
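The thumbnail trick mentioned in ‘Choose what to add’ can be prepared in the spreadsheet before you ever open Carto. A hypothetical sketch, with invented URLs and column names, showing how to build an HTML image column for the info window:

```python
# Hypothetical sketch: build an HTML <img> column in the attendance data so
# each councillor's photo can appear in the map's info window.
# The URLs and column names here are invented for illustration.
import pandas as pd

df = pd.DataFrame({
    "cllr_name": ["Cllr Jones", "Cllr Davies"],
    "photo_url": [
        "https://example.org/jones.jpg",
        "https://example.org/davies.jpg",
    ],
})

# Wrap each photo URL in an <img> tag, one row per councillor.
df["photo_html"] = '<img src="' + df["photo_url"] + '" width="100">'

print(df.loc[0, "photo_html"])
# <img src="https://example.org/jones.jpg" width="100">
```

Exported to CSV, the photo_html column can then be ticked in the merge step like any other field.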
Don’t panic if the data is slightly out on the map. If you need to go back into the original dataset, just save it under a new name and upload it to Carto again. I experimented a lot – it is very much a trial and error process for first-timers.
Here’s the final result: How does your councillor’s attendance record stack up?
Have you been inspired to use data journalism to hold your local council to account? We’d love to hear about it, tweet us at @C4CJ.
Homepage image accompanying this article is copyright nigelpepper.