Documenting Babel – languages in information science

Musing on the continuing place of language issues in both research and practice in the information sciences, following my participation in the Zagreb INFuture meeting, I wrote an editorial on the topic for the Journal of Documentation. This post is an amended version.

Languages, in one guise or another, have been a constant feature of the landscape of the information sciences for many years.

There are, for example, the various artificial ‘languages’ – more usually thought of as notations, nomenclatures or ontologies – which have been devised to describe such things as chemical structures and reactions, medical diagnoses and treatments, and the burgeoning data-rich fields of modern biology. There is the presence of linguistics as a subject of seemingly perpetual potential relevance to the information sciences. As Sparck Jones and Kay (1973, p. 1) put it in their seminal textbook: “linguistics and information science are natural bedfellows … but there has been relatively little contact between the two fields”. The situation has not changed much in the intervening decades. There is the now ubiquitous searching of ‘full text’ databases, requiring a greater or lesser amount of ‘intelligent’ processing of the natural languages in which the content of such databases is couched.

But primarily, there is the continued need for handling communication of information in all of the world’s languages. Neither the earnest advocacy of ‘universal’ or ‘auxiliary’ languages, from Leibnitz’ logic-based characteristica universalis to Esperanto, nor the long-anticipated advent of English as a de facto global language (Crystal 2003), has reduced the demand for support for national and local languages, as the provision for 23 official languages in the European Union testifies.

This naturally has consequences for research and practice in the information sciences. A facility with languages other than one’s own has always been one of the requirements of the practising librarian and information officer, even in the traditionally language-averse United Kingdom. Sadly, the requirement for some facility with two languages other than English, which applied when I studied information science at Masters level, has long gone from the UK, though an equivalent requirement is still largely present in Continental Europe. The need for such facility gave rise to a variety of detailed language tools for the information professions, Allen’s 1975 Manual of European Languages for Librarians being a typical example.

In research terms, language issues have stimulated work on a variety of topics. An early example was the study of the value of ‘cover-to-cover’ translations of scientific journals, particularly from the Russian language, following the shock to Western scientific complacency caused by the Sputnik satellite of 1957 (Tybulewicz 1970). Other long-standing concerns, in the English-speaking world at least, were focused on the ‘language barrier’: the belief that valuable information, particularly in scientific, technical and medical subjects, was being missed because it was not published in the English language (see, for instance, Hutchins, Pargeter and Saunders 1971, Chan 1977, Thorpe, Schur, Bawden and Joice 1988). More recently, attention has been focused on such topics as the information practices of translators, natural language processing and cross-language information retrieval. Some examples of recent Journal of Documentation articles reporting such research, as an indication of its variety, are shown below.

These thoughts were stimulated by my attending the INFuture conference in Zagreb, Croatia, in November. A substantial proportion of this conference, which dealt with the future of information science, was devoted to language technologies – including machine-aided translation and natural language processing – and to language issues in general. The topics covered included the European Union’s CLARIN (Common Language Resources and Technology Infrastructure) project, which aims to compile a series of digital archives with data sources for language-based materials (text and speech corpora, dictionaries, etc.) together with language and speech technology tools. Particularly aimed at academic users in the arts and social sciences, CLARIN adopts the philosophy that all languages – irrespective of the number of speakers or of their commercial importance – are of equal importance.

It seems clear that the predictions, or fears, of the adoption of artificial languages, and of the ubiquitous adoption of any single language, are very far from fulfilment. We may expect that these issues will be an important feature of the information research agenda for the foreseeable future.


3 thoughts on “Documenting Babel – languages in information science”

  1. CLARIN’s philosophy that all languages – irrespective of the number of speakers or of their commercial importance – are of equal importance is the right one.

    I’ve long been a user and advocate of Esperanto, but I do see the importance of all languages, as well as the importance of the information scientist’s role in making accessible what is produced in them.

    Long may the world’s rich linguistic heritage endure. The aim of Esperanto was and is only to act as an auxiliary language and not to displace other tongues.

  2. I agree with the comment of Sparck Jones and Kay that “linguistics and information science are natural bedfellows … but there has been relatively little contact between the two fields”. The same is as true today as it was then.

    The choice of a common language is dependent not primarily on the needs of the information community, but, sadly, on power politics. The outcome of the language war will depend, more than anything, on the outcome of the current wars of domination in the Middle East.

    Serious research is needed into the dynamics of language politics and its link with political propaganda. My own research suggests that an important part of language wars is the undermining of the opposition, just as in any other form of conflict. Very little seems to be published on this.

