Answering Shortest Path Distance Queries in Large Complex Networks

Date

2022

Authors

Farhan, Muhammad

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

The \emph{distance query} problem is to find the shortest-path distance between an arbitrary pair of vertices in a graph. It is considered as a fundamental problem in graph theory. Despite a tremendous amount of research on the subject, there is still no satisfactory solution that can scale to large complex networks which may have billions of vertices and edges. Furthermore, many real-world complex networks such as social networks and web graphs are typically dynamic, undergoing discrete changes such as edge insertion and deletion in their topological structure over time. Thus, there is also a pressing need to address the distance query problem on dynamic networks. The goal of this thesis is to address the distance query problem on large static and dynamic complex networks. Labelling-based methods are well-known for rendering fast response time to distance queries; however, existing labelling-based methods can only construct distance labelling for moderately large graphs with millions of vertices and edges and cannot scale to large graphs with billions of vertices and edges due to their prohibitively large space requirements and unbearably long pre-processing time. This thesis proposes a scalable approach that enables fast construction of a distance labelling of a limited size, which contains only distance information from all vertices in a graph to some ``important" vertices (not all) - called \emph{landmarks}. Such a distance labelling is considered as a \emph{partial distance labelling}, in contrast to a \emph{full distance labelling} that contains distance information for all pairs of vertices in a graph. Then, we combine a partial distance labelling that can be computed in an offline manner with online searching to leverage the advantages from both sides - accelerating query processing through a small sized partial distance labelling that provides a good approximation to bound online searches. The proposed method can efficiently construct a distance labelling for a graph with billions of vertices and edges, and enable fast distance computation, e.g. in the order of milliseconds. Since graphs in real-world are dynamic that undergo changes such as edge insertion or deletion in their topological structure, existing labelling-based methods still greatly suffer from the drawback of scalability on dynamic graphs and they can hardly update a distance labelling efficiently. In this thesis, we propose a fully dynamic method which can efficiently reflect graph changes (i.e., single edge insertions or deletions) by dynamically maintaining a distance labelling in order to answer distance queries on dynamice graphs. At its core, our proposed method incorporates two building blocks: (i) \emph{incremental algorithm} for handling incremental update operations, i.e. edge insertion, and (ii) \emph{decremental algorithm} for handling decremental update operations, i.e. edge deletion. Moreover, this thesis also introduces a batch-dynamic method which can process batch of updates (i.e., batches of edge insertions and deletions) efficiently to further improve the performance of answering distance queries on graphs that undergo rapid changes in their topological structure. The proposed batch-dynamic method enables us to unify edge insertions and deletion, helps us to avoid unnecessary and repeated computations, and allows us to exploit the potential of parallelism which as a result is much more efficient than processing graph changes separately one by one. In this thesis, we have conducted extensive experiments on 15-17 real-world networks from a variety of application domains to test the scalability, efficiency, and robustness of the proposed static and dynamic methods against existing state-of-the-art static and dynamic methods.

Description

Keywords

Citation

Source

Type

Thesis (PhD)

Book Title

Entity type

Access Statement

License Rights

Restricted until