Computación y Sistemas

On-line version ISSN 2007-9737
Print version ISSN 1405-5546

Comp. y Sist. vol.16 n.2 Ciudad de México Apr./Jun. 2012

 

Articles

 

System-Level Fault Diagnosis with Dynamic Mesh Optimization

 


 

Rafael Falcon¹, Marcio Almeida¹, Amiya Nayak¹ and Rafael Bello²

¹ School of Information Technology and Engineering (SITE), University of Ottawa, Canada. rfalc032@site.uottawa.ca, malmeida@uottawa.ca, anayak@site.uottawa.ca

² Artificial Intelligence Lab, Centre on Computing Studies, Universidad Central Marta Abreu de Las Villas, Santa Clara, Cuba. rbellop@uclv.edu.cu

 

Article received on 12/02/2011.
Accepted on 27/10/2011.

 

Abstract

The efficient identification of hardware and software faults in parallel and distributed systems remains a challenge in today's increasingly widespread decentralized environments. System-level fault diagnosis is concerned with detecting all faulty nodes in a set of hundreds (or even thousands) of interconnected units. This is accomplished by thoroughly examining the collection of outcomes (the syndrome) of all tests carried out by the nodes under a particular test model. Such a task has non-polynomial complexity and can be posed as a combinatorial optimization problem. In this paper we employ Dynamic Mesh Optimization (DMO) to detect faulty units in diagnosable systems. The proposed method encodes potential solutions as binary vectors and exploits problem-specific knowledge to cope with infeasible individuals. The empirical analysis confirms that the DMO-based scheme outperforms existing techniques in terms of convergence speed and memory requirements, making it a viable approach for real-time fault diagnosis in large systems.

Keywords. Fault diagnosis, input syndrome, dynamic mesh optimization, invalidation model, comparison model.
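To make the binary-vector encoding concrete, the following is a minimal sketch of how a candidate fault set could be scored against a syndrome. It assumes the classical PMC invalidation model, in which a fault-free tester reports 1 for a faulty unit and 0 for a fault-free one, while a faulty tester's report is unreliable. The function name `pmc_violations`, the dictionary layout of the syndrome, and the four-node example are all hypothetical illustrations; this is not the fitness function used in the paper.

```python
from typing import Dict, List, Tuple

# A test is a (tester, tested) pair of node indices (hypothetical layout).
Test = Tuple[int, int]


def pmc_violations(candidate: List[int], syndrome: Dict[Test, int]) -> int:
    """Count test outcomes that contradict a candidate fault set.

    candidate[i] == 1 means node i is assumed faulty, 0 means fault-free.
    Under the assumed PMC model, only tests performed by nodes that the
    candidate marks as fault-free can contradict it.
    """
    violations = 0
    for (tester, tested), outcome in syndrome.items():
        # A fault-free tester must report exactly the state of the tested node.
        if candidate[tester] == 0 and outcome != candidate[tested]:
            violations += 1
    return violations


# Four nodes in a ring, each testing its successor; node 2 is the faulty one,
# so its own report on node 3 (here 1) cannot be trusted.
syndrome = {(0, 1): 0, (1, 2): 1, (2, 3): 1, (3, 0): 0}

print(pmc_violations([0, 0, 1, 0], syndrome))  # candidate blaming node 2
print(pmc_violations([0, 0, 0, 0], syndrome))  # candidate blaming nobody
```

A candidate with zero violations is consistent with the syndrome. Note that the all-faulty vector is always consistent under this check, so a practical scheme would also have to bound the number of faulty nodes, for example by the diagnosability of the system.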

 


 


 


All the contents of this journal, except where otherwise noted, are licensed under a Creative Commons Attribution License.