Noise Detection and Learning Based on Current Information

Pascual González, Damaris; Vázquez Mesa, Fernando Daniel; Toro Pozo, Jorge Luis

doi:10.13053/CyS-18-1-2014-025

Computación y Sistemas

Online version ISSN 2007-9737; print version ISSN 1405-5546

Comp. y Sist. vol. 18 no. 1, Mexico City, Jan./Mar. 2014

Damaris Pascual González¹, Fernando Daniel Vázquez Mesa¹ and Jorge Luis Toro Pozo²

¹ Facultad de Ciencias Económicas y Empresariales, Universidad de Oriente, Santiago de Cuba, Cuba. dpascual@eco.uo.edu.cu, fvazquez@eco.uo.edu.cu

² Facultad de Matemática y Computación, Universidad de Oriente, Santiago de Cuba, Cuba. jorgetp@ult.edu.cu

Abstract

Noise-cleaning methods are highly significant in classification tasks and in situations that require semi-supervised learning, because well-labeled samples (prototypes) are essential for classifying new patterns. In this work, we present a new neighborhood-based algorithm for detecting noise in data streams that takes into account changes in concepts over time (concept drift), together with its application to the automatic construction of training sets. The experiments used both synthetic and real databases, the latter taken from the UCI repository, and the results obtained support our noise-detection strategy in data streams and in classification processes.

Keywords: Noise cleaning, data streams, semi-supervised learning, concept drift.
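The abstract describes the proposed method only at a high level. As a rough illustration of what a neighborhood-based noise test over a data stream can look like, the Python sketch below applies a Wilson-style edited nearest-neighbor rule inside a bounded sliding window: an incoming labeled sample is flagged as noise when most of its k nearest neighbors among the most recent samples carry a different label, and samples that pass the test are added to the window, which doubles as an automatically built training set. This is a generic technique, not the authors' algorithm; the class name `SlidingWindowNoiseFilter`, the parameters `k`, `window_size` and `seed`, and the use of Euclidean distance are all assumptions made for illustration.

```python
import math
from collections import deque


class SlidingWindowNoiseFilter:
    """Hypothetical Wilson-style k-NN noise filter over a sliding window.

    Illustrative sketch only; not the algorithm proposed in the article.
    """

    def __init__(self, k=3, window_size=100, seed=None):
        self.k = k
        # Bounded window: once full, appending drops the oldest sample,
        # so decisions rely only on recent ("current") information.
        self.window = deque(maxlen=window_size)
        # Optional small labeled set available at the start (assumption).
        for x, label in seed or []:
            self.window.append((tuple(x), label))

    def is_noise(self, x, label):
        """Return True if most of the k nearest recent samples disagree with label."""
        if len(self.window) < self.k:
            return False  # too little current information to judge
        nearest = sorted(self.window, key=lambda s: math.dist(s[0], x))[: self.k]
        disagree = sum(1 for _, y in nearest if y != label)
        return disagree > self.k / 2

    def process(self, x, label):
        """Test one stream sample; keep it in the window only if it is not noise."""
        noisy = self.is_noise(x, label)
        if not noisy:
            self.window.append((tuple(x), label))
        return noisy


# Two well-separated classes as the initial labeled set.
seed = [((0.0, 0.0), "a"), ((0.1, 0.0), "a"), ((0.0, 0.1), "a"),
        ((5.0, 5.0), "b"), ((5.1, 5.0), "b"), ((5.0, 5.1), "b")]
# The second stream sample sits in the "a" region but is labeled "b".
stream = [((0.05, 0.05), "a"), ((0.05, 0.02), "b"), ((5.05, 5.05), "b")]

noise_filter = SlidingWindowNoiseFilter(k=3, window_size=50, seed=seed)
flags = [noise_filter.process(x, y) for x, y in stream]
print(flags)  # [False, True, False]
```

Forgetting through the bounded window is one simple way to let such a filter follow concept drift. Note that a plain majority test like this one tends to reject the first samples of a genuinely new class unless the window already holds examples of it, so a drift-aware method needs some additional mechanism beyond this sketch.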
