On-line version ISSN 1870-9044
Polibits no. 43, México, Jan./Jun. 2011
Contextual Analysis of Mathematical Expressions for Advanced Mathematical Search
Keisuke Yokoi¹, Minh-Quoc Nghiem², Yuichiroh Matsubayashi³, and Akiko Aizawa⁴
¹ Department of Computer Science, University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo, Japan (email: firstname.lastname@example.org).
² Department of Informatics, The Graduate University for Advanced Studies, Tokyo, Japan (email: email@example.com).
³ National Institute of Informatics, Tokyo, Japan (email: firstname.lastname@example.org).
⁴ Department of Computer Science, University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo, Japan, and National Institute of Informatics, Tokyo, Japan (email: email@example.com).
Manuscript received November 12, 2010.
Manuscript accepted for publication January 10, 2011.
We propose a way to use mathematical search to provide better navigation when reading papers on computers. Because the surface form of a mathematical expression is ambiguous, it is necessary to consider not only the expression itself but also the text around it. We present a method for extracting natural language descriptions that refer to mathematical expressions, such as variable names or function definitions, and report experimental results. We first define the extraction task and manually construct a reference dataset from 100 Japanese scientific papers. We then propose two methods for the task, one based on pattern matching and one based on machine learning. Experiments on the reference dataset show the effectiveness of the proposed methods.
Key words: Natural language processing, mathematical expressions, pattern matching, machine learning.
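As a rough illustration of the pattern-matching idea mentioned in the abstract, the following minimal sketch pairs an inline expression with a nearby natural language description. It is not the authors' implementation: the English patterns, the `$...$` delimiters, and the function name `extract_descriptions` are all assumptions made for this example, whereas the reference dataset described above consists of Japanese papers.

```python
import re

# Assumption: inline expressions are delimited by dollar signs, e.g. "$x$".
MATH = r"(?P<expr>\$[^$]+\$)"

# Hypothetical English patterns, chosen only for illustration.
PATTERNS = [
    # "<expr> denotes/represents/is <description>"
    re.compile(MATH + r"\s+(?:denotes|represents|is)\s+(?P<desc>(?:the|a|an)\s+[^.,;]+)"),
    # "Let <expr> be <description>"
    re.compile(r"[Ll]et\s+" + MATH + r"\s+be\s+(?P<desc>(?:the|a|an)\s+[^.,;]+)"),
]


def extract_descriptions(sentence):
    """Return (expression, description) pairs in order of appearance."""
    hits = []
    for pattern in PATTERNS:
        for match in pattern.finditer(sentence):
            hits.append((match.start(), match.group("expr"), match.group("desc").strip()))
    hits.sort()  # order by position of the match in the sentence
    return [(expr, desc) for _, expr, desc in hits]


if __name__ == "__main__":
    text = "Let $n$ be the number of documents, where $w$ denotes a weight vector."
    for expr, desc in extract_descriptions(text):
        print(expr, "->", desc)
```

Hand-written patterns of this kind are easy to inspect but cover only the phrasings they anticipate, which is presumably one motivation for comparing them against a machine learning based method.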