Effective and Scalable Representation Learning for Graphs: From Theory to Applications
Authors
Zhu, Hao
Abstract
Graphs are ubiquitous in the real world, representing a wide range of relational data structures across numerous domains. However, developing effective and scalable machine learning models that can capture the intrinsic properties and inductive biases of graph-structured data remains a significant challenge. Traditional representation learning algorithms for graphs face several limitations, including scalability issues for large-scale graphs, oversmoothing in deep architectures, overreliance on labeled data, and limited applicability to tasks beyond graph domains. This thesis presents a comprehensive body of work that tackles the challenges of representation learning for graphs, focusing on oversmoothing, effectiveness, and scalability. In the first part, Simple Spectral Graph Convolution (S2GC), a variant of Graph Convolutional Networks (GCNs) derived from a modified Markov Diffusion Kernel, is proposed. S2GC has lower computational complexity than classic Graph Neural Network (GNN) architectures while achieving competitive performance across various tasks. Furthermore, I introduce random RangE FInder-based Network Embedding (REFINE), a highly efficient graph embedding approach that combines randomized range finders and S2GC, enabling scalable matrix factorization for large-scale graphs. In the second part, the classic Laplacian Eigenmaps (LE) framework is extended to incorporate contrastive learning principles, resulting in Contrastive Laplacian Eigenmaps (COLES). COLES offers an efficient and scalable closed-form solution for unsupervised graph representation learning on large-scale graphs. Building upon COLES, I propose Generalized Laplacian Eigenmaps (GLEN), a principled framework that learns graph representations by maximizing the rank difference between the total scatter matrix and the within-class scatter matrix, ensuring optimal class separation.
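To make the S2GC propagation rule concrete, the following is a minimal numpy sketch. It assumes the averaged-powers formulation described in the S2GC work: features are smoothed by the first K powers of the symmetrically normalized adjacency (with self-loops), averaged, and combined with a residual term weighted by a hyperparameter alpha. The function name, the dense-matrix representation, and the default values of `k` and `alpha` are illustrative choices, not the thesis implementation.

```python
import numpy as np

def s2gc_propagate(adj, features, k=16, alpha=0.05):
    """Sketch of S2GC-style propagation: average of the first k powers of
    the symmetrically normalized adjacency applied to the features, with a
    residual connection to the input weighted by alpha."""
    n = adj.shape[0]
    a_hat = adj + np.eye(n)                                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    t = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]   # D^-1/2 (A+I) D^-1/2

    out = np.zeros_like(features, dtype=float)
    h = features.astype(float)
    for _ in range(k):
        h = t @ h                                           # T^k X, built iteratively
        out += (1 - alpha) * h + alpha * features           # residual per power
    return out / k                                          # average over the k powers
```

Because there are no learnable weights inside the propagation, this smoothing can be precomputed once and followed by a simple linear classifier, which is the source of the method's scalability.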
These GNN-based architectures achieve substantial performance improvements over traditional methods in various tasks, including node classification and clustering. In the third part, I demonstrate the versatility of graph representation learning by presenting applications across diverse domains. Leveraging COLES, I develop unsupErvised discriminAnt Subspace lEarning (EASE), which learns a discriminative subspace tailored to the given support and query data together with a similarity matrix endowed with a block-diagonal prior, improving transductive few-shot learning performance. Additionally, I propose Prototype-based Label Propagation (ProtoLP), a novel approach that dynamically constructs graphs from features, enabling transductive inference nearly as fast as inductive methods while capturing complex interactions between prototypes and query samples. The research presented in this thesis addresses the fundamental challenges of effective and scalable graph representation learning, providing theoretical insights, novel algorithms, and practical applications that harness the power of graph-structured data across numerous domains.
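ProtoLP builds on the classic family of graph-based label propagation methods. The sketch below illustrates that underlying technique (the closed-form propagation of Zhou et al., Z = (I − αS)⁻¹Y, on a k-NN graph built from features), not ProtoLP itself: the function name, the Gaussian affinity, and the hyperparameters are illustrative assumptions.

```python
import numpy as np

def label_propagation(features, labels, num_classes, alpha=0.9, k=3):
    """Classic transductive label propagation on a feature-built k-NN graph.
    labels: array with the class index for labeled nodes and -1 for unlabeled."""
    n = features.shape[0]
    # pairwise squared distances -> sparse-ish k-NN affinity graph
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    w = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]        # k nearest neighbors, skipping self
        w[i, nbrs] = np.exp(-d2[i, nbrs])        # Gaussian affinity
    w = (w + w.T) / 2                            # symmetrize the graph
    d_inv_sqrt = 1.0 / np.sqrt(w.sum(1) + 1e-12)
    s = d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]  # normalized affinity S

    y = np.zeros((n, num_classes))
    for i, lbl in enumerate(labels):
        if lbl >= 0:
            y[i, lbl] = 1.0                      # one-hot the labeled nodes
    z = np.linalg.solve(np.eye(n) - alpha * s, y)  # closed form Z = (I - aS)^-1 Y
    return z.argmax(1)                           # predicted class per node
```

ProtoLP departs from this baseline by propagating between learned prototypes and query samples rather than over a fixed sample-to-sample graph, which is what allows its inference speed to approach that of inductive methods.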