How do teachers of Indonesian choose what to teach? Computational tools to explore usage patterns
Abstract
Responses to identified challenges in Indonesian language teaching should be informed by data about how teachers and teaching materials use language features. Developing empirical data of this nature requires investment in natural language processing (NLP) of Indonesian, as well as multilingual resources. Multilingual NLP is necessary to process data that mixes the language of instruction (English in this study) with the target language, 'Indonesian', which may include other languages of Indonesia given the complex Indonesian linguistic ecosystem.
This transdisciplinary thesis, mainly in Applied Linguistics, developed NLP methods to uncover computational insights into Indonesian language teaching. As transdisciplinary work, the thesis examined the challenges and risks of using language technologies with language teaching data, and for Indonesian language teaching specifically. Three complementary datasets were collected: (1) live recordings of a semester of Indonesian taught at tertiary level, (2) NLP-assisted analysis of the well-known textbook The Indonesian Way, and (3) YouTube lessons from three teachers of Indonesian. Dataset 1 (approximately 150 minutes of hand-transcribed multilingual data) and Dataset 3 (1 hour 35 minutes of hand-transcribed multilingual data, plus 2 hours 49 minutes of machine-transcribed data) formed the basis of experiments with Elpis, an Automatic Speech Recognition (ASR) pipeline, to accelerate speech-to-text transcription of spoken data via machine learning. Machine-accelerated processing widened the transcription 'bottleneck' encountered in usage-based research on spoken language.
Analysis of all three datasets uncovered useful information about the use of pronouns in spoken language and text resources. In-person spoken language and textbook materials demonstrated a restricted use of first-person pronouns, while YouTube teaching explored a wider range of possibilities and explicitly challenged standardised forms as the sole targets for acquisition. The results forming this thesis by compilation were published in highly competitive venues as six peer-reviewed papers (five as first author), with a further first-author chapter in preparation for peer review in an applied linguistics venue. The thesis also released data resources, language models and computational tools to encourage further analyses. Driven by both professional and research questions, this thesis offers insights into teaching practices and applications of language technology in education.