Yesterday’s sci-fi has become today’s reality. Join us as we venture into the ever-growing domain of language technology, where we discover and discuss current and future developments in speech recognition, automated literary translation, opinion mining and open-domain chatbot applications. Not only do we find ourselves having cheeky chats with clever cars and critical conversations with experts, we also ponder the pros and cons of Artificial Intelligence and assess our position as linguists (and one alleged Professor of Disco Studies) in view of these developments.
For more information, references and a full transcript please visit wordsandactions.blog
In this episode we start our discussion of language and technology with voice recognition. Bernard mentions a general bias towards female voices, as discussed in this paper:
Edworthy J., Hellier E., & Rivers J. (2003). The use of male or female voices in warnings systems: a question of acoustics. Noise and Health, 6(21): 39-50.
Pitch range is also important, as demonstrated in the experiment on using different voices for sat navs that Erika mentions:
Niebuhr, O., & Michalsky, J. (2019). Computer-generated speaker charisma and its effects on human actions in a car-navigation system experiment: or how Steve Jobs’ tone of voice can take you anywhere. In Misra S. et al. (eds) Computational Science and Its Applications – ICCSA 2019. Lecture Notes in Computer Science, vol. 11620: 375-390. Springer, Cham. https://doi.org/10.1007/978-3-030-24296-1_31
Moving from acoustics to culture, the following paper discusses how male voices are perceived as more authoritative:
Anderson R.C., & Klofstad, C.A. (2012). Preference for leaders with masculine voices holds in the case of feminine leadership roles. PLoS ONE, 7(12): e51216. https://doi.org/10.1371/journal.pone.0051216
It is worth sharing a few more auto-captioning gems in the lectures of Veronika and her colleagues at Lancaster University:
"my grammar is leaving me" → "my grandma is leading me"
“n-sizes” → “incisors”
“Hardaker and McGlashan” → “heartache and regression”
“institutional” → “it’s too slow” (truth!)
“masculine” → “mass killer” (bit harsh)
On readability, Bernard mentions an example from accounting, namely the obfuscation hypothesis. The following paper on the topic is considered the first accounting study that uses automated textual analysis with a very large sample to address readability:
Li, F. (2008). Annual report readability, current earnings, and earnings persistence. Journal of Accounting & Economics, 45: 221–247. doi:10.1016/j.jacceco.2008.02.003
We then go on to talk about sentiment analysis, which is used to find out about, for example, brand perceptions or patient satisfaction. Here is an example of the latter:
Hopper, A. M., & Uriyo, M. (2015). Using sentiment analysis to review patient satisfaction data located on the internet. Journal of Health Organization and Management, 29(2): 221-233. DOI 10.1108/JHOM-12-2011-0129
In the context of this episode, we want to distinguish between corpus linguistics and computational linguistics. Although language corpora are used to train systems in machine learning, corpus linguists engage in the computer-assisted analysis of large text collections, often combining automated statistical analysis with manual qualitative analysis. A company using such mixed corpus linguistic methods to provide their customers with insights about their products and services is Relative Insight. (We did not receive any funding from them for this episode, but they are a spin-off company that started at Lancaster University.)
A critical evaluation of another area of computational linguistics, topic modelling, written by two corpus linguists is:
Brookes, G., & McEnery, T. (2018). The utility of topic modelling for discourse studies: A critical evaluation. Discourse Studies, 21(1): 3-21. https://doi.org/10.1177/
(Incidentally, the above paper is also based on data about patient satisfaction.)
The PhD thesis on automatic irony detection that Bernard mentions was written by Cynthia Van Hee and is available here.
The second interview guest is another one of Bernard’s colleagues from Ghent University, Orphée De Clercq. Her recent publications include:
De Bruyne, L., De Clercq, O., & Hoste, V. (2021). Annotating affective dimensions in user-generated content. Language Resources and Evaluation, 55(4): 1017-1045.
De Clercq, O., De Sutter, G., Loock, R., Cappelle, B., & Plevoets, K. (2021). Uncovering machine translationese using corpus analysis techniques to distinguish between original and machine-translated French. Translation Quarterly, 101: 21-45.
And finally, we talk to Doris Dippold from the University of Surrey in the UK. Her work on chatbots can be found in:
Dippold, D., Lynden, J., Shrubsall, R., & Ingram, R. (2020). A turn to language: How interactional sociolinguistics informs the redesign of prompt: response chatbot turns. Discourse, Context & Media, 37. https://doi.org/10.1016/j.dcm.