<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>1405-5546</journal-id>
<journal-title><![CDATA[Computación y Sistemas]]></journal-title>
<abbrev-journal-title><![CDATA[Comp. y Sist.]]></abbrev-journal-title>
<issn>1405-5546</issn>
<publisher>
<publisher-name><![CDATA[Instituto Politécnico Nacional, Centro de Investigación en Computación]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S1405-55462016000300355</article-id>
<article-id pub-id-type="doi">10.13053/cys-20-3-2455</article-id>
<title-group>
<article-title xml:lang="en"><![CDATA[Tokenizer Adapted for the Nasa Yuwe Language]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Sierra Martínez]]></surname>
<given-names><![CDATA[Luz Marina]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Cobos Lozada]]></surname>
<given-names><![CDATA[Carlos Alberto]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Corrales]]></surname>
<given-names><![CDATA[Juan Carlos]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
</contrib-group>
<aff id="Af1">
<institution><![CDATA[University of Cauca]]></institution>
<addr-line><![CDATA[Popayán ]]></addr-line>
<country>Colombia</country>
</aff>
<aff id="Af2">
<institution><![CDATA[University of Cauca]]></institution>
<addr-line><![CDATA[Popayán ]]></addr-line>
<country>Colombia</country>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>09</month>
<year>2016</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>09</month>
<year>2016</year>
</pub-date>
<volume>20</volume>
<numero>3</numero>
<fpage>355</fpage>
<lpage>364</lpage>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_arttext&amp;pid=S1405-55462016000300355&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_abstract&amp;pid=S1405-55462016000300355&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_pdf&amp;pid=S1405-55462016000300355&amp;lng=en&amp;nrm=iso"></self-uri><abstract abstract-type="short" xml:lang="en"><p><![CDATA[In Colombia, the government recognizes ethnic and cultural diversity as a social right. This diversity is expressed, among other ways, in a large number of indigenous languages that have been kept alive for centuries. However, efforts to conserve and preserve these languages have generally fallen short. This is the case for Nasa Yuwe, the endangered language spoken by the Nasa, or Páez, indigenous community. In this situation, technology offers a strategic opportunity for the adaptation, ownership, and development of Nasa Yuwe within the social and cultural environment of the Nasa people. This technology includes computational techniques that allow information to be exchanged through information retrieval (IR) activities, which open new possibilities for the Nasa people to interact in Nasa Yuwe. It has therefore become necessary to adapt the stages of the IR process to this language. This paper presents a process for adapting a tokenizer to texts written in Nasa Yuwe, using the precision-recall curve as the evaluation and comparison measure. The results cover the stages of adapting the standard tokenizer to produce the Nasa version, the Nasa tokenizer and its output on texts written in Nasa Yuwe, and a comparison of the precision-recall curve of the baseline with that of the Nasa tokenizer.]]></p></abstract>
<kwd-group>
<kwd lng="en"><![CDATA[Nasa indigenous community]]></kwd>
<kwd lng="en"><![CDATA[Nasa Yuwe language]]></kwd>
<kwd lng="en"><![CDATA[tokenizer for Nasa Yuwe]]></kwd>
<kwd lng="en"><![CDATA[information retrieval for texts written in Nasa Yuwe]]></kwd>
</kwd-group>
</article-meta>
</front><back>
<ref-list>
<ref id="B1">
<nlm-citation citation-type="">
<collab>National Constitutional Assembly of the Republic of Colombia, Bank of the Republic</collab>
<source><![CDATA[]]></source>
<year>1990</year>
</nlm-citation>
</ref>
<ref id="B2">
<nlm-citation citation-type="book">
<collab>University of Cauca</collab>
<collab>CRIC</collab>
<collab>PEBI</collab>
<collab>General Language Commission</collab>
<source><![CDATA[Sociolinguistic study in preliminary phase]]></source>
<year>2008</year>
<publisher-loc><![CDATA[Popayán ]]></publisher-loc>
<publisher-name><![CDATA[Nasa Yuwe and Namtrik language]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B3">
<nlm-citation citation-type="journal">
<article-title xml:lang=""><![CDATA[Challenges in the Interaction of Information Retrieval and Natural Language Processing]]></article-title>
<person-group person-group-type="author">
<name>
<surname><![CDATA[Baeza-Yates]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
</person-group>
<source><![CDATA[Computational Linguistics and Intelligent Text Processing]]></source>
<year>2004</year>
<volume>2945</volume>
<page-range>445-56</page-range><publisher-loc><![CDATA[Berlin ]]></publisher-loc>
<publisher-name><![CDATA[Springer]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B4">
<nlm-citation citation-type="book">
<article-title xml:lang=""><![CDATA[Esbozo Gramatical de la lengua Nasa (lengua Páez)]]></article-title>
<person-group person-group-type="author">
<name>
<surname><![CDATA[Rojas]]></surname>
<given-names><![CDATA[T.E.]]></given-names>
</name>
</person-group>
<source><![CDATA[El Lenguaje en Colombia. Tomo I: Realidad Lingüística de Colombia]]></source>
<year>2012</year>
<publisher-loc><![CDATA[Bogotá ]]></publisher-loc>
<publisher-name><![CDATA[UNICEF]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B5">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Manning]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
<name>
<surname><![CDATA[Raghavan]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Schütze]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
</person-group>
<source><![CDATA[An Introduction to Information Retrieval]]></source>
<year>2009</year>
<publisher-name><![CDATA[Cambridge University Press]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B6">
<nlm-citation citation-type="journal">
<article-title xml:lang=""><![CDATA[Light Stemming for Arabic Information Retrieval]]></article-title>
<person-group person-group-type="author">
<name>
<surname><![CDATA[Larkey]]></surname>
<given-names><![CDATA[L.S.]]></given-names>
</name>
<name>
<surname><![CDATA[Ballesteros]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Connell]]></surname>
<given-names><![CDATA[M.E.]]></given-names>
</name>
</person-group>
<source><![CDATA[Arabic Computational Morphology. Text, Speech and Language Technology]]></source>
<year>2007</year>
<volume>38</volume>
<page-range>221-43</page-range><publisher-name><![CDATA[Springer]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B7">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Tolosa]]></surname>
<given-names><![CDATA[G.H.]]></given-names>
</name>
<name>
<surname><![CDATA[Bordignon]]></surname>
<given-names><![CDATA[F.]]></given-names>
</name>
</person-group>
<source><![CDATA[Introducción a la Recuperación de Información]]></source>
<year>2005</year>
<publisher-loc><![CDATA[Argentina ]]></publisher-loc>
<publisher-name><![CDATA[Universidad Nacional de Luján]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B8">
<nlm-citation citation-type="book">
<article-title xml:lang=""><![CDATA[Within-Language Information Retrieval]]></article-title>
<person-group person-group-type="author">
<name>
<surname><![CDATA[Peters]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
<name>
<surname><![CDATA[Braschler]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Clough]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
</person-group>
<source><![CDATA[Multilingual Information Retrieval]]></source>
<year>2012</year>
<page-range>17-55</page-range><publisher-loc><![CDATA[Berlin ]]></publisher-loc>
<publisher-name><![CDATA[Springer]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B9">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Zhang]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
<name>
<surname><![CDATA[Zhan]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
</person-group>
<source><![CDATA[Research and Implementation of Full-Text Retrieval]]></source>
<year>2012</year>
<volume>181</volume>
<conf-name><![CDATA[2012 International Conference on Communication, Electronics and Automation Engineering]]></conf-name>
<conf-loc>Berlin Heidelberg </conf-loc>
<page-range>349-56</page-range></nlm-citation>
</ref>
<ref id="B10">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Borges]]></surname>
<given-names><![CDATA[E.N.]]></given-names>
</name>
<name>
<surname><![CDATA[Pereira]]></surname>
<given-names><![CDATA[I.A.]]></given-names>
</name>
<name>
<surname><![CDATA[Tomas]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
</person-group>
<source><![CDATA[ARGOSearch: an Information Retrieval System based on text similarity and extensible relevance criteria]]></source>
<year>2012</year>
<conf-name><![CDATA[31st International Conference of the Chilean Computer Science Society]]></conf-name>
<conf-loc>Valparaiso </conf-loc>
</nlm-citation>
</ref>
<ref id="B11">
<nlm-citation citation-type="journal">
<article-title xml:lang=""><![CDATA[Research of Information Search Engine in Forestry Based on the Lucene]]></article-title>
<person-group person-group-type="author">
<name>
<surname><![CDATA[Cui]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Chen]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Li]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<source><![CDATA[Advances in Automation and Robotics, LNEE]]></source>
<year>2011</year>
<volume>2</volume>
<page-range>603-9</page-range><publisher-loc><![CDATA[Berlin ]]></publisher-loc>
<publisher-name><![CDATA[Springer]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B12">
<nlm-citation citation-type="journal">
<article-title xml:lang=""><![CDATA[Monolingual Document Retrieval for European Languages]]></article-title>
<person-group person-group-type="author">
<name>
<surname><![CDATA[Hollink]]></surname>
<given-names><![CDATA[V.]]></given-names>
</name>
<name>
<surname><![CDATA[Kamps]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Monz]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
<name>
<surname><![CDATA[De Rijke]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<source><![CDATA[J. Information Retrieval]]></source>
<year>2004</year>
<volume>7</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>33-52</page-range></nlm-citation>
</ref>
<ref id="B13">
<nlm-citation citation-type="journal">
<article-title xml:lang=""><![CDATA[Methods and algorithms for automatic text analysis]]></article-title>
<person-group person-group-type="author">
<name>
<surname><![CDATA[Yatsko]]></surname>
<given-names><![CDATA[V.A.]]></given-names>
</name>
</person-group>
<source><![CDATA[J. Automatic Documentation and Mathematical Linguistics]]></source>
<year>2011</year>
<volume>45</volume>
<numero>5</numero>
<issue>5</issue>
<page-range>224-31</page-range></nlm-citation>
</ref>
<ref id="B14">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sproat]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Linguistic Processing for Speech Synthesis]]></article-title>
<person-group person-group-type="editor">
<name>
<surname><![CDATA[Benesty]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Sondhi]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Huang]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
</person-group>
<source><![CDATA[Springer Handbook of Speech Processing]]></source>
<year>2008</year>
<page-range>457-70</page-range><publisher-loc><![CDATA[Berlin ]]></publisher-loc>
<publisher-name><![CDATA[Springer]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B15">
<nlm-citation citation-type="journal">
<article-title xml:lang=""><![CDATA[You Don't Have to Think Twice if You Carefully Tokenize]]></article-title>
<person-group person-group-type="author">
<name>
<surname><![CDATA[Klatt]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Bohnet]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
</person-group>
<source><![CDATA[Natural Language Processing - IJCNLP]]></source>
<year>2004</year>
<volume>3248</volume>
<page-range>299-309</page-range><publisher-loc><![CDATA[Berlin ]]></publisher-loc>
<publisher-name><![CDATA[LNCS]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B16">
<nlm-citation citation-type="journal">
<article-title xml:lang=""><![CDATA[Exploring and exploiting a historical corpus for Arabic]]></article-title>
<person-group person-group-type="author">
<name>
<surname><![CDATA[Hammo]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
<name>
<surname><![CDATA[Yagi]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Ismail]]></surname>
<given-names><![CDATA[O.]]></given-names>
</name>
<name>
<surname><![CDATA[AbuShariah]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<source><![CDATA[Language Resources and Evaluation]]></source>
<year>2015</year>
<page-range>1-23</page-range></nlm-citation>
</ref>
<ref id="B17">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Jamil]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
<name>
<surname><![CDATA[Jamaludin]]></surname>
<given-names><![CDATA[N.A.]]></given-names>
</name>
<name>
<surname><![CDATA[Abdul Rahman]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
<name>
<surname><![CDATA[Sabari]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
</person-group>
<source><![CDATA[Implementation of Vector-Space Online Document Retrieval System Using Open Source Technology]]></source>
<year>2011</year>
<conf-name><![CDATA[Conference on Open Systems (ICOS)]]></conf-name>
<conf-loc>Langkawi </conf-loc>
</nlm-citation>
</ref>
<ref id="B18">
<nlm-citation citation-type="journal">
<article-title xml:lang=""><![CDATA[Lessons from building a Persian written corpus Peykare]]></article-title>
<person-group person-group-type="author">
<name>
<surname><![CDATA[Bijankhan]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Sheykhzadegan]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Bahrani]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Ghayoomi]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<source><![CDATA[Language Resources and Evaluation]]></source>
<year>2011</year>
<volume>45</volume>
<numero>2</numero>
<issue>2</issue>
<page-range>143-64</page-range></nlm-citation>
</ref>
<ref id="B19">
<nlm-citation citation-type="journal">
<article-title xml:lang=""><![CDATA[An empirical study of tokenization strategies]]></article-title>
<person-group person-group-type="author">
<name>
<surname><![CDATA[Jiang]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Zhai]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
</person-group>
<source><![CDATA[Information Retrieval]]></source>
<year>2007</year>
<volume>10</volume>
<numero>4</numero>
<issue>4</issue>
<page-range>341-63</page-range></nlm-citation>
</ref>
<ref id="B20">
<nlm-citation citation-type="book">
<collab>Instituto Colombiano de Cultura Hispánica, Geografía Humana de Colombia, Región Andina Central</collab>
<source><![CDATA[]]></source>
<year>2000</year>
<publisher-loc><![CDATA[Bogotá ]]></publisher-loc>
<publisher-name><![CDATA[Banco de la República]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B21">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Rojas]]></surname>
<given-names><![CDATA[T.E.]]></given-names>
</name>
</person-group>
<source><![CDATA[Lengua Páez, Una visión de su gramática]]></source>
<year>1998</year>
<publisher-loc><![CDATA[Bogotá ]]></publisher-loc>
<publisher-name><![CDATA[Ministerio de Cultura]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B22">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Jung]]></surname>
<given-names><![CDATA[I.]]></given-names>
</name>
</person-group>
<source><![CDATA[Gramática del Páez o Nasa Yuwe. Descripción de una Lengua Indígena de Colombia]]></source>
<year>1984</year>
<publisher-name><![CDATA[LINOM GmbH]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B23">
<nlm-citation citation-type="">
<collab>CRIC</collab>
<collab>Programa de Desarrollo Rural en la Región de Tierradentro Cxhab Wala</collab>
<collab>PT/CW</collab>
<source><![CDATA[Diccionario Nasa Yuwe - Castellano]]></source>
<year>2005</year>
<publisher-loc><![CDATA[Popayán ]]></publisher-loc>
</nlm-citation>
</ref>
<ref id="B24">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Rojas]]></surname>
<given-names><![CDATA[T.C.]]></given-names>
</name>
<name>
<surname><![CDATA[Perdomo Dizú]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Corrales]]></surname>
<given-names><![CDATA[M.H.]]></given-names>
</name>
</person-group>
<source><![CDATA[Una Mirada al Nasa Yuwe de Novirao]]></source>
<year>2009</year>
<publisher-loc><![CDATA[Popayán ]]></publisher-loc>
<publisher-name><![CDATA[Sello Editorial Universidad del Cauca]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B25">
<nlm-citation citation-type="journal">
<article-title xml:lang=""><![CDATA[Building a Nasa Yuwe Test Collection]]></article-title>
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sierra Martínez]]></surname>
<given-names><![CDATA[L.M.]]></given-names>
</name>
<name>
<surname><![CDATA[Cobos Lozada]]></surname>
<given-names><![CDATA[C.A.]]></given-names>
</name>
<name>
<surname><![CDATA[Corrales]]></surname>
<given-names><![CDATA[J.C.]]></given-names>
</name>
<name>
<surname><![CDATA[Rojas Curieux]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
</person-group>
<source><![CDATA[Computational Linguistics and Intelligent Text Processing]]></source>
<year>2015</year>
<volume>9041</volume>
<page-range>112-23</page-range><publisher-loc><![CDATA[Cairo, Egypt ]]></publisher-loc>
<publisher-name><![CDATA[LNCS]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B26">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Tan]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
</person-group>
<source><![CDATA[Lucene tutorial.com]]></source>
<year>2015</year>
</nlm-citation>
</ref>
<ref id="B27">
<nlm-citation citation-type="">
<collab>Lucene Apache</collab>
<source><![CDATA[]]></source>
<year>2016</year>
</nlm-citation>
</ref>
</ref-list>
</back>
</article>
