Structured Sequence Modeling with Graphs, Flows, and Real-world Knowledge
Abstract
The arrow of time gives rise to an important class of structures: sequences of events. In natural language, we communicate with each other by linearizing our thoughts and turning them into a sequence of sound waves. In finance, investors make trade decisions based on the sequences of stock prices and of different economic factors, and on how these sequences relate to each other. In engineering, we build better aircraft wings by studying the sequence of solutions to certain partial differential equations (PDEs). This thesis presents a set of learning techniques to model sequence data more effectively across three application domains: image captioning, time series modeling, and PDE simulation.
In the image captioning domain, we propose Transform and Tell, a transformer-based architecture that can generate linguistically rich captions with uncommon entity names by attending to relevant textual and visual contexts in a news article. On the GoodNews dataset, Transform and Tell outperforms the state of the art by a factor of four on the CIDEr score. We also curate NYTimes800k, the largest news image captioning dataset at the time of publication, containing 445K articles with 793K image-caption pairs from the New York Times.
In the time series modeling domain, we introduce Radflow, a novel architecture that can capture the flow of influence in a network of time series and perform imputation and prediction tasks. Radflow can decompose its predictions into interpretable layers and scales to networks with hundreds of thousands of nodes. On the newly curated WikiTraffic dataset, the largest dynamic network of traffic time series for the English Wikipedia site, Radflow outperforms both traditional time series models such as AR and N-BEATS, and previous spatiotemporal models such as ARNet and T-GCN.
In the PDE simulation domain, we tackle the problem of solving fluid mechanics and structural mechanics problems. We propose the Factorized Fourier Neural Operator (F-FNO), an improved neural operator that uses a factorized Fourier transform and better residual connections to enable effective scaling to deep networks. On a wide range of challenging PDEs on regular grids, structured meshes, and point clouds, the F-FNO outperforms the state of the art by significant margins, reducing the error by up to 83% on the Navier-Stokes equations.