Cultural advice

The Australian National University acknowledges, celebrates and pays our respects to the Ngunnawal and Ngambri people of the Canberra region and to all First Nations Australians on whose traditional lands we meet and work, and whose cultures are among the oldest continuing cultures in human history.

Aboriginal and Torres Strait Islander peoples are advised that ANU Library collections may include images, names, voices, and other representations of deceased persons.

Material in the collection may contain terms, language or views that reflect the period in which the item was created and may be considered inappropriate today.

Graph Representation Learning for Structured Data and Genomic Analysis

Authors

Xue, Hansheng

Abstract

Graphs serve as a universal language for modeling interactions in the real world, spanning diverse domains such as recommendation systems and genomics. Analyzing graph-structured data helps reveal patterns and insights hidden within it. This thesis uses deep learning techniques to develop graph representation learning models for graph-structured data in e-commerce and genomics. In e-commerce, we devise graph representation learning methods for two prevalent graph structures, the multiplex bipartite graph and the dynamic heterogeneous graph, to improve the performance of recommendation systems. In genomics, we address two typical challenges, metagenomic binning and haplotype phasing, and develop two graph neural networks equipped with constraint satisfaction models.

A multiplex bipartite graph comprises nodes from two distinct domains, where edges connect only nodes from different domains. Effectively modeling multiplex bipartite graphs entails addressing two key challenges: a) managing disparate node attributes within bipartite structures, and b) handling the multiple edge types that connect the two domains. We present DualHGCN, a graph neural network model that transforms a multiplex bipartite network into two sets of hypergraphs. This transformation allows DualHGCN to encode node representations using spectral hypergraph convolutional operators. DualHGCN also incorporates intra- and inter-message passing strategies to facilitate message exchange across node and edge types.

Dynamic heterogeneous graphs are typically represented as a series of static graph snapshots, where each snapshot is itself a heterogeneous graph. Effectively representing dynamic heterogeneous graphs entails not only capturing the structural information within individual snapshots but also learning the evolutionary patterns between consecutive snapshots. To tackle this challenge, we present DyHATR, a method that uses hierarchical attention mechanisms to capture heterogeneous information within each snapshot and integrates recurrent neural networks with temporal attention to capture the evolutionary patterns between consecutive snapshots.

In metagenomic contig binning, many existing tools overlook the valuable information within the assembly graph and instead depend primarily on the composition and coverage attributes of contigs. We introduce RepBin, a binning tool that uses a graph neural network to capture the structure of the assembly graph while adhering to the heterophilous constraints derived from single-copy marker genes. RepBin then employs graph convolutional networks to label unknown contigs, starting from the constrained contigs to obtain these labels.

Reference-based polyploid haplotype phasing aims to group the reads within a SNP matrix into clusters, each corresponding to a distinct haplotype. Phasing methods frequently use a minimum error correction (MEC) score to measure the discrepancies between the consensus haplotype of each cluster and the reads assigned to it. Optimizing the MEC score is an NP-hard problem. We introduce NeurHap, an algorithm that frames haplotype phasing as a graph coloring problem, with colors denoting haplotypes. NeurHap comprises two components: NeurHap-search, a graph neural network for learning vertex representations and color assignments, followed by NeurHap-refine, a local refinement strategy for color adjustment and MEC score optimization.
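
As an illustration of the MEC score mentioned in the abstract, the following is a minimal Python sketch and not the NeurHap implementation. It assumes reads are equal-length strings over allele symbols, that "-" marks a SNP position a read does not cover, and that the consensus haplotype of a cluster is the per-position majority allele (ties broken by the smallest symbol). The names `consensus`, `mec_score`, and `GAP` are hypothetical and chosen for this example only.

```python
from collections import Counter

GAP = "-"  # assumed symbol for a SNP position not covered by a read


def consensus(reads):
    """Per-column majority allele, ignoring gaps (assumed consensus rule)."""
    n_cols = len(reads[0])
    result = []
    for j in range(n_cols):
        counts = Counter(r[j] for r in reads if r[j] != GAP)
        if counts:
            # Most frequent allele; ties go to the smallest symbol.
            result.append(min(counts, key=lambda a: (-counts[a], a)))
        else:
            result.append(GAP)  # no read in the cluster covers this column
    return "".join(result)


def mec_score(reads, labels):
    """Sum of mismatches between each read and its cluster's consensus."""
    clusters = {}
    for read, label in zip(reads, labels):
        clusters.setdefault(label, []).append(read)

    total = 0
    for members in clusters.values():
        hap = consensus(members)
        for read in members:
            total += sum(
                1
                for allele, ref in zip(read, hap)
                if allele != GAP and ref != GAP and allele != ref
            )
    return total


# Toy SNP matrix with six reads and four SNP positions.
reads = ["0101", "0111", "01-1", "1010", "1-10", "1000"]
good = [0, 0, 0, 1, 1, 1]  # reads grouped by similarity
bad = [0, 1, 0, 1, 0, 1]   # reads mixed across clusters

print(mec_score(reads, good))  # 2
print(mec_score(reads, bad))   # larger than the score for `good`
```

In this toy example, the labels play the role of the colors in the graph coloring formulation: a better color assignment yields a lower MEC score.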
