Computación y Sistemas

On-line version ISSN 2007-9737
Print version ISSN 1405-5546

Comp. y Sist. vol.13 n.3 Ciudad de México Jan./Mar. 2010

 

Articles

 

Assessing Data Quality of Integrated Data by Quality Aggregation of its Ancestors

 

Maria del Pilar Angeles¹ and Lachlan Mhor MacKinnon²

 

¹ Facultad de Ingeniería, División de Ingeniería Eléctrica, Departamento Computación, UNAM. Edificio "Bernardo Quintana" 2do. Piso, CU., C.P. 04510 México D.F. Tel. 56223012. pilar@macs.hw.ac.uk

² Computing & Creative Technologies, University of Abertay Dundee, Dundee DD1 HG. Tel. 01382308601. mackinnon@abertay.ac.uk

 

Article received on April 29, 2008
Accepted on January 05, 2009

 

Abstract

Data quality degrades during the extraction and merging of data from multiple heterogeneous sources. Moreover, users have no information about the quality of the data they access.

This document presents the methods used to assess data quality at multiple levels of granularity, including derived non-atomic data, taking data provenance into account. A Data Quality Manager prototype has been implemented and tested to validate this assessment.

Keywords: Data Quality, Derived Data, Data Integration, Assessment of Data Quality.

 


All content of this journal, except where otherwise noted, is licensed under a Creative Commons Attribution License.