# Markov Decision Processes

Automated planning is a branch of Artificial Intelligence (AI) that aims to find
optimal plans for achieving goals. Planning problems with non-deterministic
actions are known to be much harder to solve than their deterministic
counterparts. *Markov Decision Processes* (MDPs) are the standard framework for
such planning problems when the applicable actions can be described by a
probabilistic model. Classical algorithms such as Value Iteration (VI) and
Policy Iteration (PI) can be used to find an optimal policy of an MDP.
Unfortunately, due to the curse of dimensionality, most MDP planning domains
have far too many states to be solved in reasonable time by these algorithms.
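
To make the discussion concrete, here is a minimal sketch of classical Value Iteration for a cost-based (SSP-style) MDP. The `Outcome`/`Action` structures and the toy MDP in the usage note are illustrative assumptions, not the library's actual API:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// One probabilistic outcome of applying an action: successor state and probability.
struct Outcome { int next; double prob; };
// An action has a cost and a probability distribution over successor states.
struct Action { double cost; std::vector<Outcome> outcomes; };
// states[s] is the list of actions applicable in state s (empty for goal states).
using MDP = std::vector<std::vector<Action>>;

// Classical Value Iteration: repeat Bellman backups over all states until the
// largest residual falls below epsilon. Returns the value function V, i.e. the
// expected cost-to-goal of each state.
std::vector<double> valueIteration(const MDP& mdp, double epsilon) {
    std::vector<double> V(mdp.size(), 0.0);
    double residual;
    do {
        residual = 0.0;
        for (std::size_t s = 0; s < mdp.size(); ++s) {
            if (mdp[s].empty()) continue;  // goal state: V stays 0
            double best = INFINITY;
            for (const Action& a : mdp[s]) {
                double q = a.cost;  // Q(s,a) = cost + expected successor value
                for (const Outcome& o : a.outcomes) q += o.prob * V[o.next];
                best = std::min(best, q);
            }
            residual = std::max(residual, std::fabs(best - V[s]));
            V[s] = best;
        }
    } while (residual > epsilon);
    return V;
}
```

Note that `V` is updated in place (Gauss-Seidel style), so later backups in the same sweep already see the newer values; this typically converges faster than keeping two separate value arrays. The cost of each sweep is proportional to the total number of transitions, which is exactly why large state spaces make plain VI impractical.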

Several state-of-the-art MDP algorithms have been proposed to speed up computation. Many of them focus on the most promising parts of the MDP through heuristic search, as in LRTDP or LAO*. Other MDP algorithms use partitioning methods to decompose the state space into smaller parts. For example, the P3VI (Partitioned, Prioritized, Parallel Value Iteration) and TVI (Topological Value Iteration) algorithms decompose the state space and then solve each part separately.
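
The core idea behind TVI can be sketched as follows: if the strongly connected components (SCCs) of the MDP's transition graph are processed in reverse topological order (components closest to the goal first), each component can be solved once with VI and never revisited. In this illustrative sketch the SCC decomposition is assumed to be precomputed (e.g. with Tarjan's algorithm), and the data structures are assumptions rather than the library's actual API:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Outcome { int next; double prob; };
struct Action { double cost; std::vector<Outcome> outcomes; };
using MDP = std::vector<std::vector<Action>>;

// TVI-style solve (sketch): `sccs` lists the strongly connected components of
// the transition graph in reverse topological order. Value Iteration is run
// restricted to one component at a time; all successors outside the current
// component already hold their converged values, so each component is solved
// exactly once.
std::vector<double> topologicalVI(const MDP& mdp,
                                  const std::vector<std::vector<int>>& sccs,
                                  double epsilon) {
    std::vector<double> V(mdp.size(), 0.0);
    for (const std::vector<int>& comp : sccs) {
        double residual;
        do {
            residual = 0.0;
            for (int s : comp) {
                if (mdp[s].empty()) continue;  // goal state: V stays 0
                double best = INFINITY;
                for (const Action& a : mdp[s]) {
                    double q = a.cost;
                    for (const Outcome& o : a.outcomes) q += o.prob * V[o.next];
                    best = std::min(best, q);
                }
                residual = std::max(residual, std::fabs(best - V[s]));
                V[s] = best;
            }
        } while (residual > epsilon);
    }
    return V;
}
```

Compared with plain VI, the backups never touch states whose values have already converged, which is where the decomposition-based speedup comes from.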

An orthogonal way to improve computation speed is to consider *how* these
algorithms are actually implemented. In this project, we propose to take into
account low-level aspects of modern computers, such as the memory hierarchy and
the different levels of parallelism, in the context of MDP planning algorithms.
We propose a new in-memory representation of MDPs, which we call CSR-MDP, that
improves cache memory access patterns regardless of the planning algorithm
used. We also propose a parallel version of the TVI algorithm, pcTVI (for
parallel-chained TVI).
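
To illustrate the idea behind a CSR-style layout: instead of nested per-state, per-action objects scattered across the heap, all costs, successors, and probabilities are packed into flat contiguous arrays indexed by offset arrays, in the spirit of the Compressed Sparse Row matrix format. The field names and layout below are hypothetical, chosen for illustration; the actual CSR-MDP layout may differ:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative CSR-like flat layout for an MDP (hypothetical field names):
//   stateOffset[s]  .. stateOffset[s+1]   -> range of actions of state s
//   actionOffset[a] .. actionOffset[a+1]  -> range of transitions of action a
// All Bellman backups then scan memory sequentially, which is cache-friendly.
struct CsrMdp {
    std::vector<std::size_t> stateOffset;   // size: numStates + 1
    std::vector<double>      actionCost;    // size: numActions
    std::vector<std::size_t> actionOffset;  // size: numActions + 1
    std::vector<int>         successor;     // size: numTransitions
    std::vector<double>      probability;   // size: numTransitions
};

// One in-place Bellman sweep over the flat arrays; returns the largest
// residual so the caller can iterate to convergence.
double bellmanSweep(const CsrMdp& m, std::vector<double>& V) {
    double residual = 0.0;
    std::size_t numStates = m.stateOffset.size() - 1;
    for (std::size_t s = 0; s < numStates; ++s) {
        if (m.stateOffset[s] == m.stateOffset[s + 1]) continue;  // goal state
        double best = INFINITY;
        for (std::size_t a = m.stateOffset[s]; a < m.stateOffset[s + 1]; ++a) {
            double q = m.actionCost[a];
            for (std::size_t t = m.actionOffset[a]; t < m.actionOffset[a + 1]; ++t)
                q += m.probability[t] * V[m.successor[t]];
            best = std::min(best, q);
        }
        residual = std::max(residual, std::fabs(best - V[s]));
        V[s] = best;
    }
    return residual;
}
```

Because the inner loops walk `actionCost`, `successor`, and `probability` strictly left to right, the hardware prefetcher can stream the transition data through the cache, in contrast to pointer-chasing through nested containers.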

In this project, we developed a C++ library for SSP-MDP planning that supports classical algorithms (e.g., VI, LRTDP, TVI) as well as our proposed new planning algorithms (pcTVI, eTVI, eiTVI), together with our new MDP memory representation, CSR-MDP.