Open University History of Books and Reading (HOBAR) Seminar
25 March 2019, 5.30pm - 7.30pm
Room 234, Second Floor, Senate House, Malet Street, London WC1E 7HU

Speaker: Prof Michaela Mahlberg, University of Birmingham

CLiC 2.0 – corpus linguistics and the digital humanities 

While corpus research has traditionally focused on non-literary texts, there has been increasing interest in the study of fiction, which is often covered under the umbrella term ‘corpus stylistics’ (Semino and Short 2004). To account as fully as possible for features of literary texts, we need to create new tools and develop methodologies that are tailored to the task at hand. There are numerous digital humanities tools for the study of fiction, but similarities and overlap with corpus linguistic concerns are rarely brought to the fore. In this paper, I will illustrate key functionalities of the web application CLiC (http://clic.bham.ac.uk/) and its latest release, CLiC 2.0 (released in March 2019). CLiC has been specifically designed for the corpus linguistic study of narrative fiction. The CLiC corpora comprise over 140 books and 16 million words across four subcorpora: the corpus of Dickens’s Novels, the 19th Century Reference Corpus (19C), the Corpus of 19th Century Children’s Literature (ChiLit) and the Corpus of Additional Requested Texts (ArTs). For all CLiC texts, direct speech and specific places around speech have been marked up (Mahlberg et al. 2016). Hence, CLiC can run searches across defined textual subsets and support the analysis of features of narrative fiction. An important question is how a range of features and patterns in fiction can be brought together in a coherent theoretical framework. The search for such a framework also highlights where corpus linguistics and the digital humanities can come more closely together. My suggestions will focus on a lexically driven approach that can account for fictional worlds while at the same time highlighting the fuzzy boundaries between fiction and the real world.


Mahlberg, M., Stockwell, P., Joode, J. de, Smith, C., & O’Donnell, M. B. (2016). CLiC Dickens: novel uses of concordances for the integration of corpus stylistics and cognitive poetics. Corpora, 11(3), 433–463.


Semino, E., & Short, M. (2004). Corpus Stylistics: Speech, Writing and Thought Presentation in a Corpus of English Writing. London: Routledge.

