
Journal of applied research and technology

On-line version ISSN 2448-6736; Print version ISSN 1665-6423

J. appl. res. technol vol.10 n.3 Ciudad de México Dec. 2012

 

Automatic Building of an Ontology from a Corpus of Text Documents Using Data Mining Tools

 

J. I. Toledo-Alvarado*, A. Guzmán-Arenas, G. L. Martínez-Luna

 

Centro de Investigación en Computación (CIC), Instituto Politécnico Nacional (IPN) Av. Juan de Dios Bátiz esquina con calle Miguel Othón de Mendizabal, 07738 México, D.F., México. E-mail: jitoledo@ipn.mx

 

ABSTRACT

In this paper we show a procedure to automatically build an ontology from a corpus of text documents without external help such as dictionaries or thesauri. The proposed method finds relevant concepts in the corpus, in the form of multi-word terms, and non-hierarchical relations between them, in an unsupervised manner.

Keywords: Ontology learning, Data Mining, Machine Learning, Apriori algorithm
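
The keywords name the Apriori algorithm, but the abstract does not say how it is applied. The following is only a minimal sketch of generic Apriori frequent-itemset mining, under the assumption that each document is treated as a set of terms (a "transaction"). The function name `apriori`, the sample `docs`, and the `min_support` threshold are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset that occurs
    in at least `min_support` transactions (generic Apriori sketch)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        candidates = set()
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count the support of each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy corpus: each document reduced to a set of terms.
docs = [
    {"ontology", "learning", "text"},
    {"ontology", "learning", "corpus"},
    {"data", "mining", "text"},
    {"ontology", "learning", "data", "mining"},
]

frequent = apriori(docs, min_support=2)
for itemset, support in sorted(frequent.items(),
                               key=lambda kv: (-len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

On this toy corpus the only term pairs that reach the threshold are {"ontology", "learning"} and {"data", "mining"}. Term sets that co-occur this often are one plausible starting point for multi-word concept candidates, though the paper itself may use Apriori differently.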

 

RESUMEN (translated from Spanish)

In this article we show a procedure to automatically build an ontology from a corpus of text documents without external help such as dictionaries or thesauri. The proposed method finds relevant concepts in the form of thematic phrases in the document corpus, and non-hierarchical relations between them, in an unsupervised manner.

 


 

All contents of this journal, except where otherwise noted, are licensed under a Creative Commons Attribution License.