
Computación y Sistemas

Print version ISSN 1405-5546

Comp. y Sist. vol. 18 no. 1, México, Jan./Mar. 2014

http://dx.doi.org/10.13053/CyS-18-1-2014-025 

Articles

Noise Detection and Learning Based on Current Information

(Spanish title: Detección de ruido y aprendizaje basado en información actual)

Damaris Pascual González¹, Fernando Daniel Vázquez Mesa¹, and Jorge Luis Toro Pozo²

¹ Facultad de Ciencias Económicas y Empresariales, Universidad de Oriente, Santiago de Cuba, Cuba. dpascual@eco.uo.edu.cu, fvazquez@eco.uo.edu.cu

² Facultad de Matemática y Computación, Universidad de Oriente, Santiago de Cuba, Cuba. jorgetp@ult.edu.cu


Abstract

Noise-cleaning methods are highly significant in classification tasks and in semi-supervised learning, because classifying new patterns depends on having well-labeled samples (prototypes). In this work, we present a new neighborhood-based algorithm for detecting noise in data streams that takes into account changes in concepts over time (concept drift), and we apply it to the automatic construction of training sets. Our experiments used both synthetic and real datasets, the latter taken from the UCI repository. The results support our noise-detection strategy in data streams and in classification processes.

Keywords: Noise cleansing, data streams, semi-supervised learning, concept drift.
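The abstract only names the ingredients (a neighborhood criterion and sensitivity to concept drift), so the following is a minimal sketch under our own assumptions, not the authors' algorithm. It applies Wilson's edited nearest-neighbor rule (a labeled sample is flagged as noise when its k nearest neighbors outvote its label) over a sliding window of recent samples, so older data are forgotten as concepts drift. The class name `WindowedNoiseFilter`, the parameters `k` and `window_size`, the trusted `initial` seed set, and the Euclidean distance (`math.dist`, Python 3.8+) are all hypothetical choices.

```python
import heapq
import math
from collections import Counter, deque


class WindowedNoiseFilter:
    """Hypothetical sketch: Wilson-style k-NN noise flagging over a sliding
    window of recent samples. Not the algorithm described in the paper."""

    def __init__(self, initial, k=3, window_size=100):
        self.k = k
        # Bounded window: once full, appending a sample silently drops the
        # oldest one, so decisions rely only on recent ("current") data.
        self.window = deque(((tuple(x), y) for x, y in initial), maxlen=window_size)

    def is_noise(self, x, y):
        """True if another label outvotes y among the k nearest window samples."""
        if len(self.window) < self.k:
            return False  # too little context to judge; accept by default
        neighbors = heapq.nsmallest(self.k, self.window,
                                    key=lambda s: math.dist(s[0], x))
        votes = Counter(label for _, label in neighbors)
        # Ties favor keeping the sample: y must be strictly outvoted.
        return votes[y] < max(votes.values())

    def update(self, x, y):
        """Process one labeled stream sample; return True if it was accepted."""
        if self.is_noise(x, y):
            return False  # flagged as noise: not added to the training window
        self.window.append((tuple(x), y))
        return True


# Usage: two well-separated classes, then one mislabeled point.
initial = [((0.0, 0.0), "a"), ((0.1, 0.0), "a"), ((0.0, 0.1), "a"),
           ((5.0, 5.0), "b"), ((5.1, 5.0), "b"), ((5.0, 5.1), "b")]
f = WindowedNoiseFilter(initial, k=3, window_size=50)
stream = [((0.05, 0.05), "a"),   # consistent with its neighborhood
          ((5.05, 5.05), "b"),   # consistent with its neighborhood
          ((0.02, 0.03), "b")]   # label "b" inside the "a" cluster
accepted = [f.update(x, y) for x, y in stream]
print(accepted)  # [True, True, False]
```

Seeding the window with a small trusted labeled set matters in this sketch: starting from an empty window, the first samples of a second class would be outvoted by whichever class arrived first. Accepted samples grow the window, which is one simple way such a filter could feed an automatically built training set.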

 



All the contents of this journal, except where otherwise noted, are licensed under a Creative Commons Attribution License.