SciELO - Scientific Electronic Library Online




On-line version ISSN 1870-9044

Polibits no. 43, México, Jan./Jun. 2011


Contextual Analysis of Mathematical Expressions for Advanced Mathematical Search


Keisuke Yokoi1, Minh–Quoc Nghiem2, Yuichiroh Matsubayashi3, and Akiko Aizawa4


1 Department of Computer Science, University of Tokyo, Hongo 7–3–1, Bunkyo–ku, Tokyo, Japan (e–mail: kei–

2 Department of Informatics, The Graduate University for Advanced Studies, Tokyo, Japan (e–mail:

3 National Institute of Informatics, Tokyo, Japan (e–mail: y–

4 Department of Computer Science, University of Tokyo, Hongo 7–3–1, Bunkyo–ku, Tokyo, Japan and with National Institute of Informatics, Tokyo, Japan (e–mail:


Manuscript received November 12, 2010.
Manuscript accepted for publication January 10, 2011.



We found a way to use mathematical search to provide better navigation for reading papers on computers. Because the surface form of a mathematical expression is ambiguous, it is necessary to consider not only the expression itself but also the text around it. We present how to extract natural language descriptions that refer to mathematical expressions, such as variable names or function definitions, and report various experimental results. We first define the extraction task and manually construct a reference dataset from 100 Japanese scientific papers. We then propose two methods for the extraction task, one based on pattern matching and one based on machine learning. Experiments on the reference dataset show the effectiveness of the proposed methods.

Key words: Natural language processing, mathematical expressions, pattern matching, machine learning.
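As an illustration of the pattern matching idea mentioned in the abstract, the following is a minimal sketch and not the method evaluated in the paper. It assumes English text in which each mathematical expression has already been replaced by a placeholder token of the form `MATH_n`; the placeholder convention, the two patterns, and the function name `extract_descriptions` are all hypothetical. The paper itself works on Japanese papers, and its actual patterns are not given here.

```python
import re

# Hypothetical patterns for English text (assumption: the source does not
# list its patterns, and its data is Japanese). MATH_n is an assumed
# placeholder token that stands in for one mathematical expression.
PATTERNS = [
    # Definition style: "MATH_1 is the learning rate"
    re.compile(
        r"(?P<math>MATH_\d+) (?:is|denotes|represents) (?:the|a|an) "
        r"(?P<desc>[A-Za-z][A-Za-z -]*?)(?=[,.;]|$)"
    ),
    # Apposition style: "the loss function MATH_3"
    re.compile(
        r"\b(?:the|a|an) (?P<desc>[A-Za-z][A-Za-z -]*?) (?P<math>MATH_\d+)"
    ),
]


def extract_descriptions(sentence):
    """Return (math_token, description) pairs matched by any pattern."""
    pairs = []
    for pattern in PATTERNS:
        for match in pattern.finditer(sentence):
            pairs.append((match.group("math"), match.group("desc").strip()))
    return pairs


if __name__ == "__main__":
    text = "where MATH_1 is the learning rate, and MATH_2 denotes the weight vector."
    for token, description in extract_descriptions(text):
        print(token, "->", description)
```

Hand-written patterns of this kind are easy to inspect but only cover the phrasings they anticipate, which is one reason a learned classifier over candidate phrases is a natural complement to them.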






Creative Commons License: All content of this journal, except where otherwise noted, is licensed under a Creative Commons License.