<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>1405-5546</journal-id>
<journal-title><![CDATA[Computación y Sistemas]]></journal-title>
<abbrev-journal-title><![CDATA[Comp. y Sist.]]></abbrev-journal-title>
<issn>1405-5546</issn>
<publisher>
<publisher-name><![CDATA[Instituto Politécnico Nacional, Centro de Investigación en Computación]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S1405-55462017000200253</article-id>
<article-id pub-id-type="doi">10.13053/cys-21-2-2743</article-id>
<title-group>
<article-title xml:lang="en"><![CDATA[MathIRs: Retrieval System for Scientific Documents]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Pathak]]></surname>
<given-names><![CDATA[Amarnath]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Pakray]]></surname>
<given-names><![CDATA[Partha]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Sarkar]]></surname>
<given-names><![CDATA[Sandip]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Das]]></surname>
<given-names><![CDATA[Dipankar]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Gelbukh]]></surname>
<given-names><![CDATA[Alexander]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
</contrib-group>
<aff id="Af1">
<institution><![CDATA[National Institute of Technology Mizoram]]></institution>
<addr-line><![CDATA[Aizawl]]></addr-line>
<country>India</country>
</aff>
<aff id="Af2">
<institution><![CDATA[Hijli College]]></institution>
<addr-line><![CDATA[Kharagpur]]></addr-line>
<country>India</country>
</aff>
<aff id="Af3">
<institution><![CDATA[Jadavpur University]]></institution>
<addr-line><![CDATA[Kolkata]]></addr-line>
<country>India</country>
</aff>
<aff id="Af4">
<institution><![CDATA[Instituto Politécnico Nacional]]></institution>
<addr-line><![CDATA[Mexico City]]></addr-line>
<country>Mexico</country>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>06</month>
<year>2017</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>06</month>
<year>2017</year>
</pub-date>
<volume>21</volume>
<numero>2</numero>
<fpage>253</fpage>
<lpage>265</lpage>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_arttext&amp;pid=S1405-55462017000200253&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_abstract&amp;pid=S1405-55462017000200253&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_pdf&amp;pid=S1405-55462017000200253&amp;lng=en&amp;nrm=iso"></self-uri><abstract abstract-type="short" xml:lang="en"><p><![CDATA[Effective retrieval of mathematical content from a vast corpus of scientific documents requires enhancements to conventional indexing and searching mechanisms. The indexing mechanism and the choice of semantic similarity measures largely determine the quality of the results returned by a Math Information Retrieval system (MathIRs). Tokenization and formula unification are among the distinguishing features of the indexing mechanism used in MathIRs; together they enable sub-formula search and similarity search. Moreover, both the scientific documents and the user queries in MathIRs contain math as well as text, and matching such content requires three modules: Text-Text Similarity (TS), Math-Math Similarity (MS), and Text-Math Similarity (TMS). In this paper, we propose MathIRs, comprising these modules together with a substitution-tree-based mechanism for indexing mathematical expressions. We also present experimental results for similarity search and argue that MathIRs will ease the task of scientific document retrieval.]]></p></abstract>
<kwd-group>
<kwd lng="en"><![CDATA[Natural language processing]]></kwd>
<kwd lng="en"><![CDATA[information retrieval]]></kwd>
<kwd lng="en"><![CDATA[MathIRs]]></kwd>
<kwd lng="en"><![CDATA[indexing]]></kwd>
</kwd-group>
</article-meta>
</front><back>
<ref-list>
<ref id="B1">
<label>1</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Formánek]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Lí&#353;ka]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[R&#365;&#382;i&#269;ka]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Sojka]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
</person-group>
<source><![CDATA[Normalization of digital mathematics library content]]></source>
<year>2012</year>
<conf-name><![CDATA[ Proceedings of the Conference on Intelligent Computer Mathematics (CICM)]]></conf-name>
<conf-loc>Bremen, Germany </conf-loc>
<page-range>91&#8211;103</page-range></nlm-citation>
</ref>
<ref id="B2">
<label>2</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Graf]]></surname>
<given-names><![CDATA[P]]></given-names>
</name>
</person-group>
<source><![CDATA[Substitution tree indexing]]></source>
<year>1995</year>
<conf-name><![CDATA[ Proceedings of the International Conference on Rewriting Techniques and Applications]]></conf-name>
<conf-loc>Springer, Kaiserslautern, Germany </conf-loc>
<page-range>117&#8211;131</page-range></nlm-citation>
</ref>
<ref id="B3">
<label>3</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Kohlhase]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Sucan]]></surname>
<given-names><![CDATA[I.]]></given-names>
</name>
</person-group>
<source><![CDATA[A search engine for mathematical formulae]]></source>
<year>2006</year>
<conf-name><![CDATA[ Proceedings of the International Conference on Artificial Intelligence and Symbolic Computation]]></conf-name>
<conf-loc>Springer, Beijing, China </conf-loc>
<page-range>241&#8211;253</page-range></nlm-citation>
</ref>
<ref id="B4">
<label>4</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Lavie]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Agarwal]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
</person-group>
<source><![CDATA[METEOR: An automatic metric for MT evaluation with high levels of correlation with human judgments]]></source>
<year>2007</year>
<conf-name><![CDATA[ Proceedings of the Second Workshop on Statistical Machine Translation]]></conf-name>
<conf-loc>Stroudsburg, PA, USA </conf-loc>
<page-range>228&#8211;231</page-range></nlm-citation>
</ref>
<ref id="B5">
<label>5</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Lí&#353;ka]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
</person-group>
<source><![CDATA[Mathematical indexing and querying]]></source>
<year>2010</year>
</nlm-citation>
</ref>
<ref id="B6">
<label>6</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Lynum]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Pakray]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Gambäck]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
<name>
<surname><![CDATA[Jimenez]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
</person-group>
<source><![CDATA[NTNU: measuring semantic similarity with sublexical feature representations and soft cardinality]]></source>
<year>2014</year>
<conf-name><![CDATA[ Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval)]]></conf-name>
<conf-loc>Dublin, Ireland </conf-loc>
<page-range>448&#8211;453</page-range></nlm-citation>
</ref>
<ref id="B7">
<label>7</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Mikolov]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Sutskever]]></surname>
<given-names><![CDATA[I.]]></given-names>
</name>
<name>
<surname><![CDATA[Chen]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[Corrado]]></surname>
<given-names><![CDATA[G. S.]]></given-names>
</name>
<name>
<surname><![CDATA[Dean]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Distributed representations of words and phrases and their compositionality]]></article-title>
<source><![CDATA[Advances in neural information processing systems]]></source>
<year>2013</year>
<page-range>3111&#8211;3119</page-range></nlm-citation>
</ref>
<ref id="B8">
<label>8</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Miner]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Munavalli]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[An approach to mathematical search through query formulation and data normalization]]></article-title>
<source><![CDATA[Towards Mechanized Mathematical Assistants]]></source>
<year>2007</year>
<page-range>342&#8211;355</page-range><publisher-name><![CDATA[Springer]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B9">
<label>9</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Mi&#353;utka]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Galambo&#353;]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Extending full text search engine for mathematical content]]></article-title>
<source><![CDATA[Towards Digital Mathematics Library]]></source>
<year>2008</year>
<page-range>55&#8211;67</page-range></nlm-citation>
</ref>
<ref id="B10">
<label>10</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Pakray]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Bandyopadhyay]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Gelbukh]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Textual entailment using lexical and syntactic similarity]]></article-title>
<source><![CDATA[International Journal of Artificial Intelligence and Applications]]></source>
<year>2011</year>
<volume>2</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>43&#8211;58</page-range></nlm-citation>
</ref>
<ref id="B11">
<label>11</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Pakray]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Sojka]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[An architecture for scientific document retrieval using textual and math entailment modules]]></article-title>
<source><![CDATA[Recent Advances in Slavonic Natural Language Processing]]></source>
<year>2014</year>
<page-range>107&#8211;117</page-range></nlm-citation>
</ref>
<ref id="B12">
<label>12</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Pennington]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Socher]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Manning]]></surname>
<given-names><![CDATA[C. D.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[GloVe: Global vectors for word representation]]></article-title>
<source><![CDATA[EMNLP]]></source>
<year>2014</year>
<volume>14</volume>
<page-range>1532&#8211;1543</page-range></nlm-citation>
</ref>
<ref id="B13">
<label>13</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[R&#365;&#382;i&#269;ka]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Sojka]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Lí&#353;ka]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<source><![CDATA[Math indexer and searcher under the hood: History and development of a winning strategy]]></source>
<year>2014</year>
<conf-name><![CDATA[ Proceedings of the 11th NTCIR Conference on Evaluation of Information Access Technologies]]></conf-name>
<conf-loc>Tokyo, Japan </conf-loc>
<page-range>127&#8211;134</page-range></nlm-citation>
</ref>
<ref id="B14">
<label>14</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[R&#365;&#382;i&#269;ka]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Sojka]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Lí&#353;ka]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<source><![CDATA[Math indexer and searcher under the hood: Fine-tuning query expansion and unification strategies]]></source>
<year>2016</year>
<conf-name><![CDATA[ Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies]]></conf-name>
<conf-loc>Tokyo, Japan </conf-loc>
<page-range>7&#8211;10</page-range></nlm-citation>
</ref>
<ref id="B15">
<label>15</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sarkar]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Das]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Pakray]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Gelbukh]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
</person-group>
<source><![CDATA[JUNITMZ at SemEval-2016 task 1: Identifying semantic similarity using Levenshtein ratio]]></source>
<year>2016</year>
<conf-name><![CDATA[ Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval)]]></conf-name>
<conf-loc>San Diego, California </conf-loc>
<page-range>702&#8211;705</page-range></nlm-citation>
</ref>
<ref id="B16">
<label>16</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sarkar]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Pakray]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Das]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Gelbukh]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
</person-group>
<source><![CDATA[Regression based approaches for detecting and measuring textual similarity]]></source>
<year>2016</year>
<conf-name><![CDATA[ Mining Intelligence and Knowledge Exploration: 4th International Conference (MIKE)]]></conf-name>
<conf-loc>Mexico City </conf-loc>
<page-range>144&#8211;152</page-range></nlm-citation>
</ref>
<ref id="B17">
<label>17</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sarkar]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Saha]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Bentham]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Pakray]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Das]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Gelbukh]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
</person-group>
<source><![CDATA[NLP-NITMZ@DPIL-FIRE2016: language independent paraphrases detection]]></source>
<year>2016</year>
<conf-name><![CDATA[ Shared task on detecting paraphrases in Indian languages (DPIL), Forum for Information Retrieval Evaluation (FIRE)]]></conf-name>
<conf-loc>Kolkata, India </conf-loc>
<page-range>256&#8211;259</page-range></nlm-citation>
</ref>
<ref id="B18">
<label>18</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Schellenberg]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
</person-group>
<source><![CDATA[Layout-based substitution tree indexing and retrieval for mathematical expressions]]></source>
<year>2011</year>
</nlm-citation>
</ref>
<ref id="B19">
<label>19</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sidorov]]></surname>
<given-names><![CDATA[G]]></given-names>
</name>
</person-group>
<source><![CDATA[Non-linear construction of n-grams in computational linguistics: syntactic, filtered, and generalized n-grams]]></source>
<year>2013</year>
</nlm-citation>
</ref>
<ref id="B20">
<label>20</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sidorov]]></surname>
<given-names><![CDATA[G]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Should syntactic n-grams contain names of syntactic relations?]]></article-title>
<source><![CDATA[International Journal of Computational Linguistics and Applications]]></source>
<year>2014</year>
<volume>5</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>139&#8211;158</page-range></nlm-citation>
</ref>
<ref id="B21">
<label>21</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sidorov]]></surname>
<given-names><![CDATA[G.]]></given-names>
</name>
<name>
<surname><![CDATA[Gelbukh]]></surname>
<given-names><![CDATA[A. F.]]></given-names>
</name>
<name>
<surname><![CDATA[Gómez-Adorno]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Pinto]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Soft similarity and soft cosine measure: Similarity of features in vector space model]]></article-title>
<source><![CDATA[Computación y Sistemas]]></source>
<year>2014</year>
<volume>18</volume>
<numero>3</numero>
<issue>3</issue>
<page-range>491&#8211;504</page-range></nlm-citation>
</ref>
<ref id="B22">
<label>22</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sidorov]]></surname>
<given-names><![CDATA[G.]]></given-names>
</name>
<name>
<surname><![CDATA[Velasquez]]></surname>
<given-names><![CDATA[F.]]></given-names>
</name>
<name>
<surname><![CDATA[Stamatatos]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
<name>
<surname><![CDATA[Gelbukh]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Chanona-Hernández]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Syntactic n-grams as machine learning features for natural language processing]]></article-title>
<source><![CDATA[Expert Systems with Applications]]></source>
<year>2014</year>
<volume>41</volume>
<numero>3</numero>
<issue>3</issue>
<page-range>853&#8211;860</page-range></nlm-citation>
</ref>
<ref id="B23">
<label>23</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sojka]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Lí&#353;ka]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<source><![CDATA[The art of mathematics retrieval]]></source>
<year>2011</year>
<conf-name><![CDATA[ Proceedings of the 11th ACM Symposium on Document Engineering]]></conf-name>
<conf-loc>Mountain View, California </conf-loc>
<page-range>57&#8211;60</page-range></nlm-citation>
</ref>
<ref id="B24">
<label>24</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sojka]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Lí&#353;ka]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<source><![CDATA[Indexing and searching mathematics in digital libraries]]></source>
<year>2011</year>
<conf-name><![CDATA[ Proceedings of the International Conference on Intelligent Computer Mathematics]]></conf-name>
<conf-loc>Bertinoro, Italy </conf-loc>
<page-range>228&#8211;243</page-range></nlm-citation>
</ref>
</ref-list>
</back>
</article>
