Polibits

On-line version ISSN 1870-9044

Polibits  n.51 México Jan./Jun. 2015

https://doi.org/10.17562/PB-51-3 

Integración de fuentes heterogéneas de datos textuales

 

Integration of Heterogeneous Textual Data Sources

 

Benina Velázquez Ordoñez1, Jesús Manuel Olivares Ceja2, Miguel Patiño Ortiz3, Julián Patiño Ortíz3, Adolfo Guzmán Arenas2

 

1 Instituto Politécnico Nacional (IPN), Escuela Superior de Ingeniería Mecánica y Eléctrica (ESIME), DF, Mexico (e-mail: bvelazquez@ipn.mx).

2 IPN, Centro de Investigación en Computación (CIC), DF, Mexico (e-mail: jesus@cic.ipn.mx, a.guzman@ieee.org).

3 IPN-ESIME, DF, Mexico (e-mail: mpatino2002@ipn.mx, jpatinoo@ipn.mx).

 

Manuscript received June 19, 2014.
Accepted for publication July 10, 2014.
Published June 15, 2015.

 

Summary

In some applications that integrate information from several data sources, inconsistencies can occur in some cases, and in others the repository lacks an entity in which to store the data. Some inconsistencies arise because the data are expressed in a language different from the one used in the repository, or because different units of measure are used. The proposal in this article applies rules during data integration that try to preserve consistency; in other cases the rules imply modifications to the schema. The object-oriented model was selected because its characteristics facilitate class reuse. The example database uses data obtained from heterogeneous Web sources in the domain of computer equipment. Integration involves entities, attributes, values, and units of measure. The proposal focuses on content, as an alternative to the integration of data schemas.

Keywords: Data integration, shared information, information exchange, object-oriented databases.

 

Abstract

This paper proposes an alternative approach to integrating data from heterogeneous sources or databases. In some cases inconsistencies may occur, and in others the schema lacks an attribute or entity in which to store the data. Some inconsistencies arise because the data are expressed in a language different from the one used in the schema definition; others are due to the use of different units of measure. The object-oriented model provides characteristics that facilitate class reuse and extension. The samples are obtained from heterogeneous Web sources in the domain of computer equipment. Integration involves entities, attributes, values, and units of measure.

Key words: Data integration, information sharing, information exchange, object-oriented databases.
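
The kind of rule-based integration the abstract describes can be illustrated with a minimal sketch. It is not the authors' implementation: the synonym table, the unit table, the choice of gigabytes as the canonical unit, and the names `normalize_attribute`, `normalize_value`, and `integrate` are all assumptions made for illustration. The sketch maps attribute names written in another language onto the schema's names, converts units of measure, extends the schema when an entity or attribute is missing, and reports a conflict instead of overwriting a stored value.

```python
# Hypothetical rule tables; none of these entries come from the paper.
ATTRIBUTE_SYNONYMS = {
    "memoria": "memory",       # Spanish name mapped to the schema's name
    "ram": "memory",
    "disco duro": "hard_disk",
    "hard drive": "hard_disk",
}

# Conversion factors to an assumed canonical unit (gigabytes).
UNIT_TO_GB = {"mb": 1 / 1024, "gb": 1.0, "tb": 1024.0}


def normalize_attribute(name):
    """Map a source attribute name to the schema's name, if a rule exists."""
    key = name.strip().lower()
    return ATTRIBUTE_SYNONYMS.get(key, key)


def normalize_value(value, unit):
    """Convert a value to gigabytes; leave it unchanged if the unit is unknown."""
    factor = UNIT_TO_GB.get(unit.strip().lower())
    if factor is None:
        return value, unit
    return value * factor, "GB"


def integrate(repository, schema, entity, attribute, value, unit):
    """Integrate one (entity, attribute, value, unit) fact into the repository.

    Returns "added", "unchanged", or "conflict". On a conflict the stored
    value is kept, so the repository stays consistent.
    """
    attr = normalize_attribute(attribute)
    normalized = normalize_value(value, unit)

    # Schema-modification rule: create the entity or attribute if it is missing.
    schema.setdefault(entity, set()).add(attr)
    stored = repository.setdefault(entity, {})

    if attr not in stored:
        stored[attr] = normalized
        return "added"
    return "unchanged" if stored[attr] == normalized else "conflict"


repository, schema = {}, {}
print(integrate(repository, schema, "laptop", "Memoria", 4096, "MB"))
print(integrate(repository, schema, "laptop", "RAM", 4, "GB"))
print(integrate(repository, schema, "laptop", "ram", 8, "GB"))
print(repository)
```

Keeping the rules in plain lookup tables means a new language or unit can be supported by adding an entry rather than changing the integration logic; a fuller design would also need per-attribute canonical units, which this sketch omits.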

 


 


All the contents of this journal, except where otherwise noted, are licensed under a Creative Commons Attribution License.