
Computación y Sistemas

On-line version ISSN 2007-9737; Print version ISSN 1405-5546

Comp. y Sist. vol.17 n.3 Ciudad de México Jul./Sep. 2013

 

Artículos

 

Suffix Array Performance Analysis for Multi-Core Platforms

 

Análisis de performance para el arreglo de sufijos sobre plataformas multi-core

 

Verónica Gil-Costa¹,², Cesar Ochoa¹, and A. Marcela Printista¹,²

¹ LIDIC, University of San Luis, Argentina. gvcosta@unsl.edu.ar

² CONICET, Argentina.

 

Article received on 01/02/2013;
accepted on 30/07/2013.

 

Abstract

Performance analysis helps to understand how a particular invocation of an algorithm executes. Using information from tools such as the profiler Perf or the Performance Application Programming Interface (PAPI), the analysis process links the algorithm's execution to processor events according to metrics defined by the developer. It also helps to find performance limitations that depend exclusively on the code. Furthermore, changing an algorithm to optimize the code requires more than understanding the observed performance; it requires understanding the problem being solved. In this work we evaluate the performance achieved by a suffix array on a 32-core platform. Suffix arrays are efficient data structures for solving complex queries in a number of applications related to text databases, for instance, biological databases. We perform experiments to evaluate hardware features aimed directly at parallelizing computation. Moreover, based on the results obtained with the performance evaluation tools, we propose an optimization technique to improve the use of cache memory. In particular, we aim to reduce the number of cache replacements performed each time a new query is processed.

Keywords: Multi-core, suffix array.
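
For readers unfamiliar with the data structure named in the abstract, the following is a minimal sketch of a suffix array and a pattern query over it. It is not the authors' implementation: the function names `build_suffix_array` and `find_occurrences` are hypothetical, the construction is a naive sort of all suffixes (far slower than the specialized algorithms used in practice), and the query is the plain binary search over sorted suffixes.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Return the starting positions of all suffixes of `text`,
    ordered lexicographically by suffix.

    Naive construction (sorts the suffixes directly); chosen for
    clarity only, not for performance.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Return every position where `pattern` occurs in `text`.

    All suffixes that start with `pattern` are contiguous in the
    suffix array, so two binary searches locate the block.
    """
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Positions come out in suffix order; sort them into text order.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(sa)
    print(find_occurrences(text, sa, "ana"))
```

Each query performs a binary search that jumps across the suffix array and across the text it points into, so consecutive memory accesses are far apart. That access pattern is one plausible reason cache behavior matters for this structure, although the abstract does not describe the proposed optimization in detail.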

 

Resumen

El análisis de performance es utilizado para entender cómo se ejecuta una invocación particular de un algoritmo. Al utilizar la información provista por las herramientas específicas como Perf o "Performance Application Programming Interface" (PAPI), el proceso de análisis de performance provee un puente entre la ejecución del algoritmo y los eventos de los procesadores de acuerdo a las métricas definidas por el desarrollador. También es útil para encontrar las limitaciones del rendimiento del algoritmo, las cuales dependen del código. Además, para modificar un algoritmo de forma tal de optimizar el código, es necesario no sólo entender el rendimiento obtenido, sino que requiere entender el problema que se quiere resolver. En este trabajo, evaluamos el rendimiento obtenido por el arreglo de sufijos en un procesador de 32 cores. Los arreglos de sufijos son estructuras de datos eficientes para resolver consultas complejas en aplicaciones relacionadas con bases de datos textuales, por ejemplo bases de datos biológicas. Ejecutamos experimentos para evaluar las características del hardware con el objetivo de mejorar el cómputo paralelo. Además, de acuerdo a los resultados obtenidos a través de las herramientas de evaluación de performance, proponemos una técnica de optimización para mejorar el uso de la memoria cache. En particular, nuestro objetivo es reducir el número de reemplazos realizados en las memorias caches.

Palabras clave: Multi-core, arreglo de sufijos.

 


  


All the contents of this journal, except where otherwise noted, are licensed under a Creative Commons Attribution License.