Cultural advice

The Australian National University acknowledges, celebrates and pays our respects to the Ngunnawal and Ngambri people of the Canberra region and to all First Nations Australians on whose traditional lands we meet and work, and whose cultures are among the oldest continuing cultures in human history.

Aboriginal and Torres Strait Islander peoples are advised that ANU Library collections may include images, names, voices, and other representations of deceased persons.

Material in the collection may contain terms, language or views that reflect the period in which the item was created and may be considered inappropriate today.

How do teachers of Indonesian choose what to teach? Computational tools to explore usage patterns

Authors

Maxwell-Smith, Zara

Abstract

Identified challenges in Indonesian language teaching should be informed by data about how teachers and teaching materials use language features. Developing empirical data of this nature requires investment in natural language processing (NLP) of Indonesian, as well as in multilingual resources. Multilingual NLP is necessary to process data that mixes the language of instruction (English in this study) with the target language, 'Indonesian', which may include other languages of Indonesia given the complex Indonesian linguistic ecosystem. This transdisciplinary thesis, situated mainly in Applied Linguistics, developed NLP methods to uncover computational insights into Indonesian language teaching. As transdisciplinary work, it examined the challenges and risks of using language technologies with language teaching data, and for Indonesian language teaching specifically. Three complementary datasets were collected: (1) live recordings of a semester of Indonesian taught at tertiary level; (2) NLP-assisted analysis of the well-known textbook The Indonesian Way; and (3) YouTube lessons from three teachers of Indonesian. Experiments with Elpis, an Automatic Speech Recognition (ASR) pipeline, used Dataset 1 (approximately 150 minutes of hand-transcribed multilingual data) and Dataset 3 (1 hour 35 minutes of hand-transcribed multilingual data, plus 2 hours 49 minutes of machine-transcribed data) to accelerate speech-to-text transcription of spoken data via machine learning. Machine-accelerated processing widened the transcription 'bottleneck' encountered in usage-based research on spoken language.
Analysis of all three datasets uncovered useful information about the use of pronouns in spoken language and text resources. In-person spoken language and textbook materials showed a restricted use of first-person pronouns, whereas YouTube teaching explored a wider range of possibilities and explicitly challenged standardised forms as the sole targets for acquisition. The results that make up this thesis by compilation were published as six peer-reviewed papers in highly competitive venues (five as first author); a further first-author chapter is in preparation for peer review in an applied linguistics venue. The thesis also released data resources, language models and computational tools to encourage further analyses. Driven by both professional and research questions, this thesis offers insights into teaching practices and applications of language technology in education.
