The computational humanities explores and expands the use of digital methods to assemble, explore, understand and critique collections of social and cultural data. What are the functions of these methods, and what are their potential uses? How can potential biases be mitigated? In this theme we develop digital collections of cultural data of national significance, make those collections available to researchers, and create novel techniques for analysing and visualising them.

 

Language Technology and Data Analysis Laboratory (LADAL)
LADAL (pronounced lah’dahl) is a collaborative research support infrastructure for computational humanities, established and maintained by the School of Languages and Cultures at the University of Queensland. LADAL provides detailed resources for language data processing, visualization, statistical modeling, and text analytics, and offers guidance on matters relating to language technology, language data science, and digital research tools.

The resources produced by LADAL range from introductions to basic concepts of quantitative research and practical tutorials on programming for humanities scholars, through guides to data visualization and advanced statistical modelling (including machine learning and recently introduced classification and prediction methods), to showcase studies on computational lexicography, acoustic analyses of speech, and literary stylistics.

LADAL is currently leading two national collaborative research infrastructure projects supported through funding from the Australian Research Data Commons (ARDC).

 

Australian Text Analytics Platform Project ($1,339,000, 2021-2023)
The University of Queensland, in collaboration with AARNet and the University of Sydney, is leading the establishment of national research infrastructure to support text analytics in a project co-funded through the ARDC Platforms Program ($759,000). There is currently a significant bottleneck in disciplines that rely on text data (written, spoken, signed, multimodal). It affects both the transformation of that data into machine-readable forms, especially as much of the raw data is unstructured (i.e. text data processing), and the use of tools for text data analysis and visualisation (i.e. text data mining), including extracting and classifying important social and cultural information from those texts. The Australian Text Analytics Platform (ATAP) will bring together users and providers of text analytics in an integrated, collaborative cloud-based environment in which Australian researchers can work with either their own or existing text data collections, and access resources for training themselves in how to use text analytics.

 

Language Data Commons of Australia (LDaCA) Data Partnerships Project ($856,000, 2021-2023)
The University of Queensland, in collaboration with AIATSIS, the ARC Centre of Excellence for the Dynamics of Language, ANU, Monash University and the University of Melbourne, is leading the establishment of national research infrastructure to support research on languages in Australia and its region in a project co-funded through the ARDC Data Partnerships Program ($500,000). Large collections of language data have been amassed in Australia but many remain under-utilised or at risk. These collections include intangible cultural heritage of the languages of some of the world's longest continuous cultures in one of the world's most linguistically diverse regions. The Language Data Commons of Australia (LDaCA) will be a sustainable long-term repository for ingesting and curating existing language data collections of national significance. This project will open up the social and economic possibilities of Australia's rich linguistic heritage, and lay the foundation for the establishment of a broader HASS (Humanities, Arts and Social Sciences) Research Data Commons.

See ARDC spotlight on the Language Data Commons of Australia: https://ardc.edu.au/news/a-national-language-data-commons-for-australia/

Analysing discourse around COVID-19 in the Australian Twittersphere: A real-time corpus-based analysis

Public discourse about COVID-19 that appears on Twitter and other social media platforms provides useful insights into public concerns and responses to the pandemic. However, acknowledging that public discourse around COVID-19 is multi-faceted and evolves over time poses both analytical and ontological challenges. Studies that use text mining approaches to analyse responses to major events commonly treat public discourse on social media as an undifferentiated whole, without systematically examining the extent to which that discourse consists of distinct sub-discourses or which phases characterize its development. They also confound structured behavioural data (i.e. tagging) with unstructured user-generated data (i.e. content of tweets) in their sampling methods. This project demonstrates how one might address both sets of challenges by combining corpus linguistic methods with a data-driven text-mining approach to gain a better understanding of how the public discourse around COVID-19 developed over time and what topics combine to form this discourse in the Australian Twittersphere over a period of nearly four months. By combining text mining and corpus linguistics, this study exemplifies how both approaches can complement each other productively.

Role of translation technologies in learning languages

https://theconversation.com/translation-technology-is-useful-but-should-not-replace-learning-languages-85384 

 

FUTURE EVENTS

The LADAL 2021 seminar series (see https://slcladal.github.io/news.html) will introduce LADAL to the Australian and international research community through weekly seminars, lectures, and workshops on matters related to language and computational humanities by eminent figures in the field, including Stefan Gries (UCSB, USA), Laura Janda (Arctic University of Norway), Laurence Anthony (Waseda, Japan), Mikko Laitinen (Jyväskylä, Finland), Terttu Nevalainen (Helsinki), Guillaume Desagulier (Paris 8 University, France), Gregor Wiedemann (Leipzig, Germany), Gerold Schneider (UZH, Switzerland), and Natalia Levshina (MPI, Nijmegen).

 

PAST EVENTS

Conference presentations

Text Mining the COVID-19 discourse in the Australian Twittersphere. Martin Schweinberger, Michael Haugh & Sam Hames at ALS 2020 (2020 meeting of the Australian Linguistics Society). Online, 14-15/12/2020.

The Replication Crisis and HASS. How Best Practices can Assist in Producing Reliable Research. Martin Schweinberger at the Open Data Forum. The University of Queensland, Australia, 21/10/2019.

Implementing school-based support infrastructure for digital humanities research at UQ. The Language Technology and Data Analysis Laboratory (LADAL). Michael Haugh & Martin Schweinberger at the Australian Research Data Commons (ARDC): The Australian eResearch Skilled Workforce Summit. Sydney, Australia, 29-30/7/2019. 

Using R for Corpus Linguistics – an Introduction and Discussion Note on Sustainability and Replicability in Corpus Linguistics. Martin Schweinberger at the Centre of Excellence for the Dynamics of Language (CoEDL) corpus workshop. Melbourne, Australia, 2–3/4/2019.
 

Public lectures/Panels/Webinars

Text Mining the COVID-19 discourse in the Australian Twittersphere. Martin Schweinberger, Sam Hames & Michael Haugh at the Data Science MeetUp, Brisbane. (10/9/2020)

Open Data Forum. Panel discussion about Open Data plus lightning talk about Open Science and the Replication Crisis and their relevance for HASS with Martin Schweinberger at UQ. (21/10/2019)

 

  • Katie Brennan
  • Peter Crosthwaite
  • Cedric Courtois
  • Michael Haugh
  • Erich Round
  • Martin Schweinberger