Solving Multiple Queries through a Permutation Index in GPU

Lopresti, Mariela; Miranda, Natalia; Piccoli, Fabiana; Reyes, Nora

Computación y Sistemas

On-line version ISSN 2007-9737
Print version ISSN 1405-5546

Comp. y Sist. vol.17 n.3 Ciudad de México Jul./Sep. 2013

Articles

Solving Multiple Queries through a Permutation Index in GPU

Resolución de múltiples consultas usando índice de permutación en GPU

Mariela Lopresti, Natalia Miranda, Fabiana Piccoli, and Nora Reyes

LIDIC. Universidad Nacional de San Luis, Ejército de los Andes 950 - 5700, San Luis, Argentina. omlopres@unsl.edu.ar, ncmiran@unsl.edu.ar, mpiccoli@unsl.edu.ar, nreyes@unsl.edu.ar

Article received on 19/02/2013;
accepted on 25/07/2013.

Abstract

Query-by-content by means of similarity search is a fundamental operation for applications that deal with multimedia data. For this kind of query it is meaningless to look for elements exactly equal to the one given as the query. Instead, we need to measure the dissimilarity between the query object and each database object. The metric space model is a paradigm that allows modeling all similarity search problems. Metric databases make it possible to store objects from a metric space and to perform similarity queries over them efficiently, in general by reducing the number of distance evaluations needed. The goal is therefore to preprocess a particular dataset in such a way that queries can be answered with as few distance computations as possible. Moreover, for a very large metric database it is not enough to preprocess the dataset by building an index; it is also necessary to speed up the queries through high-performance computing on GPUs. In this work we show a pure GPU implementation that builds a Permutation Index, used for approximate similarity search on databases holding data of different kinds, and that solves many queries at the same time. In addition, we evaluate the tradeoff between the answer quality and the time performance of our implementation.

Keywords: Metric space, approximate similarity search, permutation index, high performance computing, GPU.

Resumen

Realizar consultas por contenido, a través de búsquedas de similitud, es una operación fundamental para aplicaciones relacionadas con datos multimedia. En este tipo de consultas no tiene sentido buscar elementos exactamente iguales a uno dado como consulta. En su lugar, es necesario medir la disimilitud entre el objeto de consulta y cada objeto de la base de datos. El modelo de espacio métrico es un paradigma que permite modelar todos los problemas de búsqueda por similitud. Las bases de datos métricas permiten el almacenamiento de objetos de un espacio métrico y responder consultas por similitud de manera eficiente, generalmente, mediante la reducción del número de evaluaciones de distancia. En consecuencia, el objetivo es pre-procesar el conjunto de datos de manera que las consultas puedan ser respondidas con el menor número posible de cálculos de distancia. Más aún, para grandes bases de datos métricas no basta con procesar previamente el conjunto de datos mediante la creación de un índice; también es necesario acelerar las consultas mediante el uso de computación de alto desempeño, y una alternativa es utilizar GPU. En este trabajo se muestra una implementación de una arquitectura de GPU pura para construir el Permutation Index, el cual nos permite resolver en paralelo múltiples consultas por similitud aproximadas en bases de datos de diferente naturaleza. Además se evalúa el compromiso entre la calidad de respuesta y el desempeño de nuestra aplicación. Finalmente se presentan resultados experimentales.

Palabras clave: Espacios métricos, búsquedas aproximadas por similitud, índice de permutación, computación de alto desempeño, GPU.
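
The abstract names the Permutation Index but does not spell out how it works. The following is a minimal CPU-only sketch of the technique as it is commonly described in the literature, not the authors' GPU implementation. Each object stores the order in which it "sees" a small set of reference objects (permutants), and the query is compared against those stored orders before any real distance is computed. The function names, the Euclidean distance, the Spearman footrule used to compare permutations, and the `fraction` parameter are all assumptions chosen for illustration.

```python
import math
import random


def euclidean(a, b):
    # Assumed metric for this sketch; any metric distance could be plugged in.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def permutation_of(obj, permutants, dist):
    # Permutant indices ordered by increasing distance to obj (ties by index).
    return sorted(range(len(permutants)), key=lambda i: (dist(obj, permutants[i]), i))


def inverse(perm):
    # pos[i] is the rank of permutant i inside perm.
    pos = [0] * len(perm)
    for rank, p in enumerate(perm):
        pos[p] = rank
    return pos


def footrule(pos_a, pos_b):
    # Spearman footrule: sum of rank displacements between two permutations.
    return sum(abs(a - b) for a, b in zip(pos_a, pos_b))


def build_index(db, permutants, dist):
    # The index keeps one inverse permutation per database object.
    return [inverse(permutation_of(o, permutants, dist)) for o in db]


def knn_approx(query, db, index, permutants, dist, k, fraction):
    # Rank objects by permutation similarity, then compute real distances
    # only for the most promising fraction of the database.
    q_pos = inverse(permutation_of(query, permutants, dist))
    order = sorted(range(len(db)), key=lambda i: (footrule(index[i], q_pos), i))
    n_candidates = max(k, int(fraction * len(db)))
    candidates = order[:n_candidates]
    candidates.sort(key=lambda i: (dist(query, db[i]), i))
    return candidates[:k]


if __name__ == "__main__":
    rng = random.Random(0)
    db = [(rng.random(), rng.random()) for _ in range(200)]
    permutants = rng.sample(db, 8)  # hypothetical choice: 8 random permutants
    index = build_index(db, permutants, euclidean)
    # Several queries solved one after another; on a GPU each query (and each
    # object comparison) would be a natural unit of parallel work.
    for q in [(0.1, 0.2), (0.8, 0.5)]:
        print(knn_approx(q, db, index, permutants, euclidean, k=3, fraction=0.1))
```

In this sketch `fraction` controls the tradeoff the abstract refers to: a smaller value means fewer real distance computations but a higher chance of missing true neighbors, while `fraction=1.0` degenerates to an exact scan.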

Acknowledgements

We wish to thank the UNSL for allowing us to access their computational resources. This research has been partially supported by Project UNSL-PROICO-30310 and Project UNSL-PROICO-330303.
