Learning from Noisy Labels using Robust Structures
Abstract
Modern machine learning relies on complex models like deep neural networks, which require large amounts of well-annotated data. Crowdsourcing is a cost-effective way to gather labels, but it often fails to provide sufficient high-quality data. This raises a key question: How can we design robust structures to improve learning from noisy labels?
Without such mechanisms, noisy labels from crowdsourced workers degrade model performance, especially for CNNs, which tend to overfit mislabeled data. To address this, we build on the Co-Training framework, in which two parallel models learn from each other and correct each other's mistakes. The central challenge lies in strengthening this mutual-correction ability.
In this thesis, we therefore develop a series of robust machine learning approaches that can effectively handle the difficulties posed by noisy supervision. Our work is summarized as follows:
Chapter 3: Recognizing that feature similarity in collaborative training setups can lead to suboptimal learning outcomes, we develop the Discriminate Co-Training model. This approach adapts the collaborative training framework by imposing discrepancy constraints between the feature extractors of the two networks. These constraints push each network toward distinct, discriminative features, mitigating the mimicry tendency often seen in homogeneous networks. Without such constraints, networks in co-training setups tend to replicate each other's features, which diminishes their resilience to noise. By addressing this mimicry problem, our model exploits feature diversity to improve overall robustness and accuracy, particularly in noisy environments.
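As a rough illustration of this idea, the NumPy sketch below shows how a discrepancy term between the two networks' feature batches can be folded into the joint training objective. The cosine-similarity form of the penalty and the function names are illustrative assumptions, not the thesis's exact formulation:

```python
import numpy as np

def feature_discrepancy(f1, f2):
    """Mean cosine similarity between paired feature rows of the two
    networks; higher values mean the extractors are mimicking each other."""
    f1 = f1 / np.linalg.norm(f1, axis=1, keepdims=True)
    f2 = f2 / np.linalg.norm(f2, axis=1, keepdims=True)
    return float(np.mean(np.sum(f1 * f2, axis=1)))

def co_training_objective(ce1, ce2, f1, f2, lam=0.1):
    """Joint objective: each network's cross-entropy plus a penalty that
    grows as the features align, so minimizing it discourages mimicry."""
    return ce1 + ce2 + lam * feature_discrepancy(f1, f2)
```

Minimizing this objective trades classification accuracy off against feature diversity, with `lam` controlling how strongly the two networks are pushed apart.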
Chapter 4: Building on the insights of the previous chapter, we employ the Sinkhorn loss to further enhance diversity among models in a co-training setup. Traditional metrics such as maximum mean discrepancy (MMD) and the Wasserstein distance are powerful but each limited in scope. The Sinkhorn divergence interpolates between the two, recovering the Wasserstein distance as its entropic regularization vanishes and MMD-like behaviour as it grows, thereby combining their advantages in a single, efficiently computable measure of model diversity. This chapter presents the mathematical formulation of the Sinkhorn divergence and illustrates its benefits in promoting diversity between deep networks. The resulting diversity fosters resilience against noise by preventing the networks from converging on similar features, thereby improving accuracy and reliability on noisy data.
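For concreteness, here is a minimal NumPy sketch of the debiased Sinkhorn divergence between two point clouds, computed with standard matrix-scaling iterations. The uniform sample weights and the value of the regularizer `eps` are illustrative choices, not the thesis's exact setup:

```python
import numpy as np

def sinkhorn_cost(x, y, eps=1.0, n_iter=200):
    """Entropy-regularised optimal-transport cost between two uniform
    point clouds, via Sinkhorn's alternating scaling iterations."""
    C = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)  # pairwise sq. dists
    K = np.exp(-C / eps)                                        # Gibbs kernel
    a = np.full(len(x), 1.0 / len(x))
    b = np.full(len(y), 1.0 / len(y))
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]          # approximate transport plan
    return float(np.sum(P * C))

def sinkhorn_divergence(x, y, eps=1.0):
    """Debiased Sinkhorn divergence: zero when the two clouds coincide,
    positive when they differ."""
    return (sinkhorn_cost(x, y, eps)
            - 0.5 * sinkhorn_cost(x, x, eps)
            - 0.5 * sinkhorn_cost(y, y, eps))
```

The debiasing terms subtract each cloud's self-transport cost, which removes the entropic blur and makes the divergence vanish exactly when the two feature distributions are identical.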
Chapter 5: To explore the potential of transformers in handling noisy datasets, we propose a novel Contrastive Co-Transformer model. Transformers have shown significant promise due to their capacity to learn complex dependencies, making them highly adaptable in situations with noisy labels. Our model leverages a combination of contrastive loss and classification loss to utilize all data samples, whether clean or noisy, without requiring pre-filtering or assumptions about data quality. This chapter discusses how transformers trained under this framework can harness the entirety of the dataset, achieving strong performance even when a substantial portion of the labels are noisy. We detail the architecture and training dynamics, demonstrating how transformers' inherent robustness can be amplified with this approach.
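A simplified sketch of such a combined objective follows, in plain NumPy with an InfoNCE-style contrastive term over two augmented views of each sample. The specific losses, temperature `tau`, and weighting `lam` are hypothetical stand-ins; the thesis's exact formulation may differ:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, labels):
    """Classification loss on the (possibly noisy) labels."""
    p = softmax(logits)
    return float(-np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12)))

def contrastive_loss(z1, z2, tau=0.5):
    """InfoNCE-style loss pulling two views of the same sample together.
    It is label-free, so every sample contributes regardless of label quality."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = (z1 @ z2.T) / tau          # similarity of view i to every view j
    targets = np.arange(len(z1))        # the positive pair sits on the diagonal
    return cross_entropy(logits, targets)

def total_loss(logits, labels, z1, z2, lam=1.0):
    """Combined objective: supervised term plus label-free contrastive term."""
    return cross_entropy(logits, labels) + lam * contrastive_loss(z1, z2)
```

Because the contrastive term never consults the labels, mislabeled samples still provide useful training signal through it, which is what lets the framework use the full dataset without pre-filtering.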
Chapter 6: Finally, to advance the robustness of learned representations in noisy datasets, we apply Sinkhorn in the context of deep embedding learning. Traditional embedding methods often falter in noisy environments, leading to unreliable representations that weaken model performance. By integrating Sinkhorn, our framework learns optimal embeddings that remain robust even with label noise. This chapter explains the practical applications and advantages of embedding learning using Sinkhorn, emphasizing its simplicity and effectiveness in maintaining model accuracy and feature diversity.