ISSN 1665-6423

S1665-64232011000200002

Vol. 9, No. 2, pp. 129-144, August 2011

Mixed Acceleration Techniques for Solving Quickly Stochastic Shortest–Path Markov Decision Processes

M. de G. García–Hernández*¹, J. Ruiz–Pinales¹, E. Onaindía², S. Ledesma–Orozco¹, J. G. Aviña–Cervantes¹, E. Alvarado–Méndez¹, A. Reyes–Ballesteros³

¹ University of Guanajuato, Comunidad de Palo Blanco s/n, C.P. 36885, Salamanca, Guanajuato, México. (garciag@ugto.mx, pinales@ugto.mx, selo@ugto.mx, avina@ugto.mx, ealvarad@ugto.mx)

² Universitat Politècnica de València, DSIC, Camino de Vera s/n, 46022, Valencia, España, onaindia@dsic.upv.es

³ Electrical Research Institute, Reforma 113, C.P. 62490, Temixco, Cuernavaca, Morelos, México, areyes@iie.org.mx

ABSTRACT

In this paper we propose combining accelerated variants of value iteration with improved prioritized sweeping for the fast solution of stochastic shortest–path Markov decision processes. Value iteration is a classical algorithm for solving Markov decision processes, but it and its variants are slow on large problems. To reduce the solution time, we explore acceleration techniques such as asynchronous updates, prioritization and prioritized sweeping. A topological reordering algorithm is also compared with static reordering. Experimental results on stochastic shortest–path problems with finite state and action spaces show that our approach considerably reduces the solution time with respect to the tested variants of value iteration. For instance, in one test the solution time was reduced by a factor of 5.7 relative to value iteration with asynchronous updates.

Keywords: Markov decision processes, acceleration techniques, prioritization.
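
To make the two ingredients named in the abstract concrete, the following is a minimal sketch of value iteration with asynchronous (in-place) updates and of a basic prioritized sweeping loop on a stochastic shortest-path problem. It is not the authors' improved algorithm: the toy problem `TOY_SSP`, the function names, the tolerance `eps` and the iteration caps are all assumptions made for illustration. Costs are positive, the goal state is absorbing with zero cost, and the value function is initialized to zero.

```python
import heapq

# Hypothetical toy SSP (an assumption, not taken from the paper).
# TOY_SSP[s][a] is a list of (probability, next_state, cost); state 3 is the goal.
TOY_SSP = {
    0: {"go": [(0.8, 1, 1.0), (0.2, 0, 1.0)]},
    1: {"go": [(0.8, 2, 1.0), (0.2, 0, 1.0)],
        "jump": [(0.5, 3, 3.0), (0.5, 1, 3.0)]},
    2: {"go": [(0.9, 3, 1.0), (0.1, 2, 1.0)]},
    3: {},  # goal: no actions, value fixed at 0
}


def bellman_backup(mdp, V, s):
    """Minimum expected cost-to-go of state s under the current estimate V."""
    if not mdp[s]:
        return 0.0
    return min(sum(p * (c + V[t]) for p, t, c in outcomes)
               for outcomes in mdp[s].values())


def gauss_seidel_value_iteration(mdp, eps=1e-8, max_sweeps=10_000):
    """Value iteration with asynchronous updates: each backup is written
    into V immediately, so later states in the same sweep already use it."""
    V = {s: 0.0 for s in mdp}
    backups = 0
    for _ in range(max_sweeps):
        delta = 0.0
        for s in mdp:
            new = bellman_backup(mdp, V, s)
            backups += 1
            delta = max(delta, abs(new - V[s]))
            V[s] = new
        if delta < eps:
            break
    return V, backups


def prioritized_sweeping(mdp, eps=1e-8, max_backups=100_000):
    """Back up the state with the largest Bellman error first, then
    re-prioritize its predecessors, whose errors may have changed."""
    preds = {s: set() for s in mdp}
    for s, actions in mdp.items():
        for outcomes in actions.values():
            for p, t, _ in outcomes:
                if p > 0:
                    preds[t].add(s)

    V = {s: 0.0 for s in mdp}
    heap, queued = [], {}  # queued[s] = priority of the live heap entry for s

    def push(s):
        err = abs(bellman_backup(mdp, V, s) - V[s])
        if err > eps and err > queued.get(s, 0.0):
            queued[s] = err
            heapq.heappush(heap, (-err, s))  # max-heap via negated error

    for s in mdp:
        push(s)

    backups = 0
    while heap and backups < max_backups:
        neg_err, s = heapq.heappop(heap)
        if queued.get(s) != -neg_err:
            continue  # stale entry superseded by a later push
        del queued[s]
        V[s] = bellman_backup(mdp, V, s)
        backups += 1
        for q in preds[s]:
            push(q)
    return V, backups


if __name__ == "__main__":
    print(gauss_seidel_value_iteration(TOY_SSP))
    print(prioritized_sweeping(TOY_SSP))
```

Both functions return the value estimate together with the number of backups performed, so the two orderings can be compared on the same problem. On a problem this small the difference is not meaningful; the sketch only illustrates the mechanics.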



