Decision Making with Unknown Future Costs

Chen, Yitian
2026-02-04
2026-02-04
https://hdl.handle.net/1885/733805235

This thesis develops a unified framework for decision-making problems with unknown future costs, providing both theoretical guarantees and empirical evaluations of its performance. We study the online Linear Quadratic (LQ) optimal control problem in two cases: (i) future costs are unknown beyond a certain preview horizon and are sequentially revealed over time; and (ii) costs are unknown and must be inferred from observed optimal trajectory data. We then extend the framework to dynamic LQ games with sequentially revealed (and potentially previewed) costs. In all settings, the proposed framework is based on predicting and tracking a candidate optimal trajectory using the available costs. We first apply the framework to the online LQ control problem with sequentially revealed costs, adopting regret as the measure of decision quality. We show that the regret of the proposed method is upper bounded by terms that decay exponentially fast as the preview horizon of future costs increases. Simulations verify this exponential decay and demonstrate that our controller outperforms state-of-the-art methods that do not leverage cost feedback. We next consider the case where the costs must be inferred from observed optimal trajectory data, which offers a new framework for solving the learning-from-demonstration problem. We establish a theoretical connection between the regret and the estimation error of the estimated optimal control gain. A regret bound is derived under an Extended Kalman Filter (EKF)-based parameter estimation scheme, and its performance is validated through numerical experiments. Finally, we apply the framework to a new dynamic LQ game problem, where the costs are sequentially revealed to the players (and may be previewed). We introduce the notion of price of uncertainty (PoU), which generalises the notion of regret to multi-agent settings.
We establish bounds on the PoU incurred when all players adopt controllers designed using our framework. Simulation results validate the theoretical bounds on the PoU.

en-AU
2026
10.25911/3G84-S219