When data science met the library: the 4th British Library Labs Symposium

Last Monday, I had the chance to attend the latest in the series of BL Labs annual symposia. The BL Labs were set up to “support and inspire the public use of the British Library’s digital collections and data in exciting and innovative ways”, and the symposia series is designed to showcase some of the results. I reflected during the day that it might be reasonable to give this symposium a strapline of “data science meets librarianship meets social science meets fine arts”, as can be seen from the programme for the day. It was also notable that, in view of the increasing amount of data available from cultural institutions, including galleries and museums as well as archives and libraries, ‘openglam’ is now a thing.

The symposium also saw the announcement of the release of a new collection of copies of BL datasets, very diverse in nature, in the hope that their general availability will promote new usage and users.

Melissa Terras gives her keynote address
The keynote talk was given by Melissa Terras from UCL, who spoke about the use made by researchers and students of the BL’s collection of 60,000 digitised public domain books. The challenge posed by what are typically very simple searches carried out over massive sets of text data has shown the need for changes in the architecture of the computer systems used. Melissa pointed out the potential for information specialists, including librarians, to offer data handling and analysis alongside more traditional services.
Turing Institute and BL in collaboration
The rest of the day was given over to a series of short presentations, highlighting the very great variety of uses made of the BL’s datasets, and concluding with a series of presentations from the Alan Turing Institute, the UK’s national data science institute, headquartered within the BL.

It was very clear from these presentations that we have gone way beyond producing the ‘clever, shiny things’ that often characterise new digital applications, to a situation where the applications are about to become not merely useful but mainstream. Of the examples presented, I was particularly interested in:
• a range of visual analytic methods applied to the British National Bibliography, to show relations between the items
• Elastic System, an artwork celebrating the nineteenth-century librarian Thomas Watts, based on BL collection data
• a number of applications using the BL’s image collection on Flickr, including the intriguing Fashion Utopias, an animation created to accompany a fashion installation sponsored by the British Council
• SherlockNet, a system for automatically categorising, tagging, and generating captions for images in the BL Flickr collection. The system uses a neural net, which was trained to categorise images into several main categories – buildings, people, etc. – with more detailed tags and captions derived from words on the pages around the images. This is still a research project, for now; any automatic indexing system using this approach would need to be combined with human annotation; but it gives a very clear indication of what will be possible in the future.

Of the many things I took away from the day, these were my main thoughts (in no particular order):
• Data science is now becoming of major relevance to library/information science; not in the sense that LIS should become ‘data science-lite’, but rather that data science, like HCI, is an aspect of computer science which is of particular importance, and of which all information specialists should have some understanding. This is particularly so if assisting users with data analyses is going to become part of the repertoire of the information professional.
• Aspects of data science should feature in the core of all LIS education programmes; my own department is doing this in our Digital Information Technology and Applications module on our #citylis courses
• Data literacy needs to attain more importance, whether as a complement to information literacy, or wrapped within some over-arching meta-literacy
• Data ethics needs much more attention, as proposed by Luciano Floridi
• There is a need for more conceptual thinking around the idea of data (or dataset) as document, as opposed to the more conventional approach of thinking of documents as being made up of data; the insightful chapter by Jonathan Furner is a good start here.

All in all, a very impressive advertisement for the British Library’s move to accommodate digital data firmly within its remit, with credit due particularly to Adam Farquhar, the Head of Digital Scholarship, and Mahendra Mahey, Manager of the BL Labs.

