Partially observable Markov decision processes (POMDPs) can be used to model complex control problems that involve both uncertainty about action outcomes and imperfect observability of the underlying state. Within the POMDP framework, a control problem is expressed as a dynamic optimization problem whose value function combines costs or rewards accumulated over multiple steps. Although the POMDP framework is more expressive than simpler frameworks such as the Markov decision process (MDP), its associated optimization methods are computationally far more demanding, and in practice only very small problems can be solved exactly. The thesis focuses on two approaches for solving larger problems: approximation methods and the exploitation of additional problem structure.
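For concreteness, the infinite-horizon discounted optimization underlying the thesis can be stated in the standard belief-state form; the notation below is the conventional one and is introduced here only for illustration:

\[
V^{*}(b) \;=\; \max_{a \in A} \Big[ \sum_{s \in S} b(s)\,R(s,a) \;+\; \gamma \sum_{o \in O} P(o \mid b, a)\, V^{*}\big(\tau(b, a, o)\big) \Big]
\]

where \(b\) is the belief (information) state summarizing past actions and observations, \(\gamma \in [0,1)\) is the discount factor, and \(\tau(b, a, o)\) is the Bayesian update of \(b\) after taking action \(a\) and observing \(o\). Because \(b\) ranges over a continuous probability simplex, exact dynamic programming over this equation is what makes POMDPs so much harder than MDPs.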
First, a number of new, efficient approximation methods and improvements to existing algorithms are proposed. These include (1) the fast informed bound method, based on approximate dynamic programming updates that yield piecewise linear and convex value functions with a constant number of linear vectors, (2) a grid-based point-interpolation method that supports variable grids, (3) an incremental version of the linear vector method that updates derivatives of the value function, and (4) various heuristics for selecting grid points. The new and existing methods are tested experimentally and compared on a set of three infinite-horizon discounted problems of differing complexity. The experimental results show that methods that preserve the shape of the value function across updates, such as the newly designed incremental linear vector and fast informed bound methods, tend to outperform the other methods on control performance.
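As an illustration of item (1), the following is a minimal sketch of a single fast informed bound update in NumPy, keeping exactly one linear vector per action. The function name, array layout, and shapes are assumptions made for this example, not code from the thesis:

```python
import numpy as np

def fib_update(alpha, R, T, O, gamma):
    """One fast informed bound (FIB) dynamic-programming update.

    alpha: (A, S) array, one linear vector per action (current bound)
    R:     (A, S) array, immediate rewards R(s, a)
    T:     (A, S, S) array, transition probabilities P(s' | s, a)
    O:     (A, S, Z) array, observation probabilities P(z | s', a)
    gamma: discount factor in [0, 1)
    Returns the updated (A, S) vectors; the number of vectors stays
    constant across updates, which keeps the method tractable.
    """
    n_actions, n_states = R.shape
    n_obs = O.shape[2]
    new_alpha = np.empty_like(alpha)
    for a in range(n_actions):
        future = np.zeros(n_states)
        for z in range(n_obs):
            # Back-project each candidate vector through the model:
            # proj[s, a'] = sum_{s'} T[a, s, s'] * O[a, s', z] * alpha[a', s']
            proj = T[a] @ (O[a, :, z][:, None] * alpha.T)
            future += proj.max(axis=1)  # best vector per state and observation
        new_alpha[a] = R[a] + gamma * future
    return new_alpha
```

The induced value at a belief b (a length-S probability vector) is then V(b) = max_a alpha[a] . b, an upper bound on the optimal value function. Because the maximization is taken per state and per observation rather than over belief-space combinations, each update costs O(|A|^2 |S|^2 |Z|), polynomial in the model size.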
Second, the thesis presents a number of techniques for exploiting additional structure in models of complex control problems. These are studied in the context of a medical therapy-planning problem: the management of patients with chronic ischemic heart disease. The proposed extensions include factored and hierarchically structured models that combine the advantages of the POMDP and MDP frameworks and cut down the size and complexity of the information state space.
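To give a sense of the saving a factored representation buys, the sketch below stores a belief state as a product of independent marginals over state variables, so a joint over k variables with n values each needs only k*n numbers instead of n^k. The variable names and the independence assumption are hypothetical, chosen only to show the size reduction:

```python
import numpy as np

# Hypothetical factored belief state: one marginal per state variable.
marginals = {
    "stenosis_level":   np.array([0.7, 0.2, 0.1]),  # hypothetical variable
    "ischemia_present": np.array([0.4, 0.6]),       # hypothetical variable
}

def belief(assignment):
    """Probability of a full state assignment under the factored belief."""
    p = 1.0
    for var, value in assignment.items():
        p *= marginals[var][value]
    return p

print(belief({"stenosis_level": 0, "ischemia_present": 1}))  # 0.7 * 0.6
```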
M. Hauskrecht. Incremental methods for computing bounds in partially observable Markov decision processes. In Proceedings of AAAI-97, pp. 734-739, 1997. (With two small fixes in the introduction.)
M. Hauskrecht. Approximation methods for solving control problems in partially observable Markov decision processes. Technical Memo MIT-LCS-TM-565, 1997.
M. Hauskrecht. Dynamic decision making in stochastic partially observable medical domains: ischemic heart disease example. In Proceedings of AIME-97, Grenoble, France, pp. 296-299, 1997. (With minor modifications.)
M. Hauskrecht. Planning and control in stochastic domains with imperfect information. MIT EECS PhD thesis proposal, August 1996.
M. Hauskrecht. Dynamic decision making in stochastic partially observable medical domains. In Proceedings of the AAAI Symposium on AI in Medicine, Stanford University, pp. 69-72, 1996.
Supervisor: Peter Szolovits