Computación y Sistemas
On-line version ISSN 2007-9737; Print version ISSN 1405-5546
Comp. y Sist. vol.13 n.3 Ciudad de México Jan./Mar. 2010
Artículos
Assessing Data Quality of Integrated Data by Quality Aggregation of its Ancestors
Evaluación de Calidad de Datos Integrados por Agregación de Calidad de sus Ancestros
Maria del Pilar Angeles1 and Lachlan Mhor MacKinnon2
1 Facultad de Ingeniería, División de Ingeniería Eléctrica, Departamento Computación, UNAM. Edificio "Bernardo Quintana" 2do. Piso, CU., C.P. 04510, México D.F. Tel. 56223012. pilar@macs.hw.ac.uk
2 Computing & Creative Technologies, University of Abertay Dundee, Dundee DD1 HG. Tel. 01382308601. mackinnon@abertay.ac.uk
Article received on April 29, 2008
Accepted on January 05, 2009
Resumen
La calidad de los datos se degrada durante el proceso de extracción y fusión de datos a partir de múltiples fuentes de datos heterogéneas. Además, los usuarios no tienen información acerca de la calidad de los datos que accesan.
Este documento presenta los métodos utilizados para la evaluación de la calidad de datos a múltiples niveles de granularidad, incluyendo datos derivados no atómicos teniendo en cuenta la proveniencia de los datos. El prototipo del Manejador de Calidad de Datos ha sido implementado para poder probar dicha evaluación.
Palabras clave: Calidad de Datos, Datos derivados, Integración de Datos, Evaluación de Calidad de Datos.
Abstract
Data quality degrades when data are extracted and merged from multiple heterogeneous sources. Moreover, users have no information about the quality of the data they access.
This document presents the methods used to assess data quality at multiple levels of granularity, including derived non-atomic data, taking data provenance into account. A Data Quality Manager prototype has been implemented in order to test this assessment.
Keywords: Data Quality, Derived Data, Data Integration, Assessment of Data Quality.