Deep Learning Approaches to Text Simplification for English as Second Language (ESL) Readers
Abstract
International university students often face challenges with English language proficiency, which can significantly impact their academic performance. A common issue is the difficulty in comprehending academic content due to limited English reading skills. This PhD thesis in computer science addresses this problem by leveraging natural language processing (NLP) techniques to simplify academic reading materials. The goal is to enhance students' reading comprehension while preserving the essential information in the texts.
To address this problem, this thesis explored the extent to which text can be simplified and investigated methods to improve existing text simplification techniques for the target audience. The research was conducted in two phases: case studies and the development of NLP-based simplification approaches.
In the first phase, three case studies were conducted, focusing on both individual needs and linguistic aspects. From an individual perspective, the first case study assessed the reading abilities of 30 students using a set of texts and quizzes with varying levels of conceptual and readability difficulty. This made it possible to determine the optimal level of text simplification required. In the second case study, the learning progress of 155 students was tracked by analyzing their performance in a series of semester writing assignments. This provided insights into their learning experiences and helped refine strategies to improve their conceptual reading abilities. On the linguistic side, the third case study identified correlations between multiple languages, revealing that bi-directional translation could reduce the readability difficulty of the original text.
Building on these findings, text simplification approaches were designed, developed, and evaluated. The first contribution was a sentence compression method that extracts key information from sentences. This approach ran more than 10 times faster than recurrent neural network-based methods while maintaining comparable accuracy. The work was then extended to academic content by creating a dataset of 1,876 ACL papers for lexical simplification. A complex word identification approach tailored to the academic genre was developed, enabling the detection of challenging vocabulary. Finally, the scope was expanded to simplify the lexical content of scientific papers by replacing complex words and phrases with simpler alternatives. This unsupervised machine learning method achieved multi-gram lexical simplification, demonstrating its effectiveness in making academic texts more accessible.
The significance of this study is three-fold. First, it addresses the gap in applying NLP techniques to solve real-world problems for a specific target group. Empirical evidence is provided on the extent to which academic texts can be simplified to improve comprehension for non-native English speakers, along with the role of bi-directional translation in reducing readability challenges. Second, a specialized dataset of 1,876 ACL papers was created, and a complex word identification approach tailored to academic texts was designed. This contribution directly tackles the unique challenges of simplifying scholarly content. Third, an unsupervised machine learning method was developed to simplify complex words and phrases in scientific papers, enhancing accessibility for readers with limited English proficiency. This approach is both scalable and effective for multi-gram simplification. Overall, this work advances the field of NLP-driven text simplification by introducing innovative solutions tailored to academic contexts, ultimately contributing to improved educational outcomes for non-native English speakers.