1665-6423

S1665-64232009000300008

México

España

December 2009

Vol. 7, No. 3, pp. 354–373

Acceleration of association–rule based Markov decision processes

Ma. de G. García–Hernández*¹, J. Ruiz–Pinales², A. Reyes–Ballesteros³, E. Onaindía⁴, J. Gabriel Aviña–Cervantes⁵, S. Ledesma⁶

¹,²,⁵,⁶ Universidad de Guanajuato, Comunidad de Palo Blanco s/n, C.P. 36885, Salamanca, Guanajuato, México, garciag@salamanca.ugto.mx, pinales@salamanca.ugto.mx, avina@salamanca.ugto.mx, selo@salamanca.ugto.mx

³ Instituto de Investigaciones Eléctricas, Reforma 113, C.P. 62490, Temixco, Morelos, México, areyes@iie.org.mx

⁴ Universidad Politécnica de Valencia, DSIC, Camino de Vera s/n, 46022, Valencia, España, onaindia@dsic.upv.es

ABSTRACT

In this paper, we present a new approach for estimating Markov decision processes based on efficient association rule mining techniques such as Apriori. To solve the resulting association–rule based Markov decision process faster, several acceleration procedures, such as asynchronous updates and prioritization using a static ordering, have been applied. A new criterion for state reordering in decreasing order of maximum reward is also compared with a modified topological reordering algorithm. Experimental results obtained on a stochastic shortest path problem with finite state and action spaces demonstrate the feasibility of the new approach.

Keywords: Markov decision processes, association rules, acceleration procedures.
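
The two acceleration ideas named in the abstract, asynchronous updates and a static state ordering by decreasing maximum reward, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the model layout, the function names `order_by_max_reward` and `async_value_iteration`, the discount factor, the tolerance, and the three-state toy problem are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical model layout (an assumption, not the paper's data structure):
# model[state][action] = list of (next_state, probability, reward).
# Every next_state must itself be a key of the model.
Model = Dict[int, Dict[str, List[Tuple[int, float, float]]]]


def order_by_max_reward(model: Model) -> List[int]:
    """Static ordering: sort states once by decreasing maximum immediate reward."""
    def max_reward(state: int) -> float:
        rewards = [r for outcomes in model[state].values() for _, _, r in outcomes]
        # States without actions (absorbing) have no reward and are placed last.
        return max(rewards) if rewards else float("-inf")
    return sorted(model, key=max_reward, reverse=True)


def async_value_iteration(model: Model, order: List[int], gamma: float = 0.9,
                          eps: float = 1e-8, max_sweeps: int = 10_000):
    """Asynchronous (in-place) value iteration sweeping states in a fixed order.

    Each backup immediately reuses values updated earlier in the same sweep.
    Returns the value table and the number of sweeps performed.
    """
    values = {s: 0.0 for s in model}
    for sweep in range(1, max_sweeps + 1):
        delta = 0.0
        for s in order:
            if not model[s]:  # absorbing state: value stays at 0
                continue
            best = max(
                sum(p * (r + gamma * values[s2]) for s2, p, r in outcomes)
                for outcomes in model[s].values()
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < eps:
            return values, sweep
    return values, max_sweeps


# Toy stochastic shortest path (made-up numbers): state 2 is the goal.
toy: Model = {
    0: {"go": [(1, 0.8, -1.0), (0, 0.2, -1.0)]},  # may slip and stay in 0
    1: {"go": [(2, 1.0, 10.0)]},
    2: {},
}
order = order_by_max_reward(toy)
values, sweeps = async_value_iteration(toy, order)
_, sweeps_reversed = async_value_iteration(toy, list(reversed(order)))
print(order, values, sweeps, sweeps_reversed)
```

In this toy problem, backing up the high-reward state first lets its value reach the start state within the same sweep, so the reward-ordered run needs no more sweeps than the reversed one. Whether this carries over to a larger problem depends on its structure.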
