Computational Language and Media explores the use of digital methods to assemble, understand, critique, and expand collections of social and cultural data. What are the functions of these methods, and what are their possible uses? How can potential biases be mitigated? In this theme, we develop digital collections of cultural data that hold national significance, make these collections available to researchers, and create novel techniques for analysing and visualising them.

Language Technology and Data Analysis Laboratory (LADAL)
LADAL (pronounced lah’dahl) is a collaborative research support infrastructure for computational humanities, established and maintained by the School of Languages and Cultures at the University of Queensland. LADAL provides comprehensive resources for language data processing, visualisation, statistical modelling, and text analytics, and offers guidance on language technology, language data science, and digital research tools.

The resources produced by LADAL range from introductions to quantitative research and practical programming tutorials for humanities scholars to guides on data visualisation and advanced statistical modelling. These include machine learning and recently introduced classification and prediction methods. LADAL also showcases studies on computational lexicography, acoustic analyses of speech, and literary stylistics.

LADAL is currently leading two national collaborative research infrastructure projects supported by funding from the Australian Research Data Commons (ARDC).

Australian Text Analytics Platform Project
The University of Queensland, in collaboration with AARNet and the University of Sydney, is leading the establishment of national research infrastructure to support text analytics in a project co-funded through the ARDC Platforms Program ($759,000). Disciplines that rely on text data (written, spoken, signed, multimodal) currently face a significant bottleneck in two areas: transforming data into machine-readable forms, since much of the raw data is unstructured (text data processing), and using tools for text data analysis and visualisation (text data mining), including extracting and classifying important social and cultural information from those texts. The Australian Text Analytics Platform (ATAP) will bring together users and providers of text analytics in an integrated, collaborative cloud-based environment in which Australian researchers can work with either their own or existing text data collections and access resources for self-paced professional development in using text analytics.

Language Data Commons of Australia (LDaCA) Data Partnerships Project
The University of Queensland, in collaboration with AIATSIS, the ARC Centre of Excellence for the Dynamics of Language, ANU, Monash University, and the University of Melbourne, is leading the establishment of national research infrastructure to support research on languages in Australia and its regions in a project co-funded through the ARDC Data Partnerships Program ($500,000). Large collections of language data have been amassed in Australia, but many remain underutilised or at risk. These collections include intangible cultural heritage (the languages of some of the world's longest continuous cultures) in one of the world's most linguistically diverse regions. The Language Data Commons of Australia (LDaCA) will be a sustainable, long-term repository for ingesting and curating existing language data collections of national significance. This project will open up the social and economic possibilities of Australia's rich linguistic heritage and lay the foundation for the establishment of a broader HASS (Humanities, Arts and Social Sciences) Research Data Commons.

See the ARDC spotlight on the Language Data Commons of Australia.


Case Studies

Analysing discourse around COVID-19 in the Australian Twittersphere: A real-time corpus-based analysis
Public discourse around COVID-19 on Twitter and other social media platforms provides useful insights into public concerns and responses to the pandemic. However, acknowledging that public discourse around COVID-19 is multi-faceted and evolves over time poses both analytical and ontological challenges. Studies that use text-mining approaches to analyse responses to major events commonly treat public discourse on social media as an undifferentiated whole, without systematically examining the extent to which that discourse consists of distinct sub-discourses or which phases characterise its development. They also confound structured behavioural data (i.e. tagging) with unstructured user-generated data (i.e. content of tweets) in their sampling methods. This project demonstrates how both sets of challenges can be addressed by combining corpus linguistic methods with a data-driven text-mining approach. The aim is to better understand how public discourse around COVID-19 in the Australian Twittersphere developed over a period of nearly four months and which topics combine to form it. In doing so, the study exemplifies how text mining and corpus linguistics can complement each other productively.
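The distinction between tagging and tweet content, and the idea of tracking discourse over time, can be illustrated with a minimal Python sketch. This is not the project's actual pipeline: the function names, the tokenisation rule, the weekly binning, and the example tweets are all assumptions made up for illustration. The sketch separates hashtags (structured behavioural data) from the remaining text (unstructured content) and then computes how often a term occurs per 1,000 content tokens in each ISO week.

```python
import re
from collections import Counter
from datetime import date

HASHTAG = re.compile(r"#(\w+)")
TOKEN = re.compile(r"[a-z0-9']+")

def split_tweet(text):
    """Return (hashtags, content_tokens) for one tweet, both lower-cased."""
    hashtags = [h.lower() for h in HASHTAG.findall(text)]
    # Remove hashtags before tokenising so tags are not counted as content.
    content = HASHTAG.sub(" ", text).lower()
    return hashtags, TOKEN.findall(content)

def weekly_relative_frequency(tweets, term):
    """Map (ISO year, ISO week) to occurrences of term per 1,000 content tokens."""
    hits, totals = Counter(), Counter()
    for day, text in tweets:
        _, tokens = split_tweet(text)
        iso = day.isocalendar()
        week = (iso[0], iso[1])
        totals[week] += len(tokens)
        hits[week] += tokens.count(term)
    return {w: 1000 * hits[w] / totals[w] for w in totals if totals[w]}

# Invented toy data (hypothetical tweets, not taken from the project corpus).
tweets = [
    (date(2020, 3, 2), "Panic buying again #covid19 #toiletpaper"),
    (date(2020, 3, 3), "No masks left at the chemist #COVID19"),
    (date(2020, 3, 10), "Lockdown starts soon, stay home #lockdown"),
    (date(2020, 3, 11), "Lockdown rules are confusing"),
]

if __name__ == "__main__":
    for week, freq in sorted(weekly_relative_frequency(tweets, "lockdown").items()):
        print(week, round(freq, 1))
```

Because hashtags are stripped before counting, the tag #lockdown in the third tweet does not inflate the content frequency of "lockdown". A real study would need a proper tokeniser, a much larger sample, and a principled way of identifying phases rather than fixed weekly bins.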

UPCOMING EVENTS

The LADAL 2021 seminar series
This series will introduce LADAL to the Australian and international research community through weekly seminars, lectures, and workshops on language and computational humanities. These will be presented by eminent figures in the field, including Stefan Gries (UCSB, USA), Laura Janda (Arctic University of Norway), Laurence Anthony (Waseda, Japan), Mikko Laitinen (Jyväskylä, Finland), Terttu Nevalainen (Helsinki), Guillaume Desagulier (Paris 8 University, France), Gregor Wiedemann (Leipzig, Germany), Gerold Schneider (UZH, Switzerland), and Natalia Levshina (MPI, Nijmegen).


PAST EVENTS

Conference presentations
Text Mining the COVID-19 discourse in the Australian Twittersphere
Martin Schweinberger, Michael Haugh & Sam Hames at ALS 2020 (2020 meeting of the Australian Linguistics Society). Online, 14-15/12/2020.

Digital skills training in humanities, arts, and social sciences – Webinar video
Michael Haugh, Martin Schweinberger, and Marco Fahmi. 30/10/2019

The Replication Crisis and HASS. How Best Practices can Assist in Producing Reliable Research
Martin Schweinberger at the Open Data Forum. The University of Queensland, Australia, 21/10/2019.

Implementing school-based support infrastructure for digital humanities research at UQ. The Language Technology and Data Analysis Laboratory (LADAL)
Michael Haugh & Martin Schweinberger at the Australian Research Data Commons (ARDC): The Australian eResearch Skilled Workforce Summit. Sydney, Australia, 29-30/7/2019.

Using R for Corpus Linguistics – an Introduction and Discussion Note on Sustainability and Replicability in Corpus Linguistics
Martin Schweinberger at the Centre of Excellence for the Dynamics of Language (CoEDL) corpus workshop. Melbourne, Australia, 2–3/4/2019.

Public lectures/Panels/Webinars

Text Mining the COVID-19 discourse in the Australian Twittersphere.
Martin Schweinberger, Sam Hames, & Michael Haugh at the Data Science MeetUp, Brisbane, 10/9/2020.

Open Data Forum. Panel discussion about Open Data plus lightning talk about Open Science and the Replication Crisis and their relevance for HASS.
Martin Schweinberger at UQ, 21/10/2019.