<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>1405-5546</journal-id>
<journal-title><![CDATA[Computación y Sistemas]]></journal-title>
<abbrev-journal-title><![CDATA[Comp. y Sist.]]></abbrev-journal-title>
<issn>1405-5546</issn>
<publisher>
<publisher-name><![CDATA[Instituto Politécnico Nacional, Centro de Investigación en Computación]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S1405-55462022000301323</article-id>
<article-id pub-id-type="doi">10.13053/cys-26-3-4353</article-id>
<title-group>
<article-title xml:lang="en"><![CDATA[A Feature-Rich Vietnamese Named Entity Recognition Model]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Nhat Minh]]></surname>
<given-names><![CDATA[Pham Quang]]></given-names>
</name>
<xref ref-type="aff" rid="Af1"/>
</contrib>
</contrib-group>
<aff id="Af1">
<institution><![CDATA[Alt Vietnam Co.]]></institution>
<addr-line><![CDATA[ ]]></addr-line>
<country>Vietnam</country>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>09</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>09</month>
<year>2022</year>
</pub-date>
<volume>26</volume>
<numero>3</numero>
<fpage>1323</fpage>
<lpage>1331</lpage>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_arttext&amp;pid=S1405-55462022000301323&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_abstract&amp;pid=S1405-55462022000301323&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_pdf&amp;pid=S1405-55462022000301323&amp;lng=en&amp;nrm=iso"></self-uri><abstract abstract-type="short" xml:lang="en"><p><![CDATA[In this paper, we present a feature-based named entity recognition (NER) model that achieves state-of-the-art accuracy for the Vietnamese language. We combine word and word-shape features, PoS and chunk tags, Brown-cluster-based features, and word-embedding-based features in a Conditional Random Fields (CRF) model. We also examine how the word segmentation, PoS tagging, and chunking output of many popular Vietnamese NLP toolkits affects the accuracy of the proposed feature-based NER model. To our knowledge, this is the first work to systematically perform an extrinsic evaluation of basic Vietnamese NLP toolkits on the downstream NER task. Experimental results show that automatically generated word segmentation is useful, whereas the PoS and chunking information generated by Vietnamese NLP tools does not benefit the proposed feature-based NER model.]]></p></abstract>
<kwd-group>
<kwd lng="en"><![CDATA[Feature selection]]></kwd>
<kwd lng="en"><![CDATA[Vietnamese]]></kwd>
<kwd lng="en"><![CDATA[named entity recognition]]></kwd>
</kwd-group>
</article-meta>
</front><back>
<ref-list>
<ref id="B1">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Brown]]></surname>
<given-names><![CDATA[P. F.]]></given-names>
</name>
<name>
<surname><![CDATA[deSouza]]></surname>
<given-names><![CDATA[P. V.]]></given-names>
</name>
<name>
<surname><![CDATA[Mercer]]></surname>
<given-names><![CDATA[R. L.]]></given-names>
</name>
<name>
<surname><![CDATA[Della Pietra]]></surname>
<given-names><![CDATA[V. J.]]></given-names>
</name>
<name>
<surname><![CDATA[Lai]]></surname>
<given-names><![CDATA[J. C.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Class-based n-gram models of natural language]]></article-title>
<source><![CDATA[Comput. Linguist.]]></source>
<year>1992</year>
<volume>18</volume>
<numero>4</numero>
<issue>4</issue>
<page-range>467-79</page-range></nlm-citation>
</ref>
<ref id="B2">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Chiu]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Nichols]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Named entity recognition with bidirectional LSTM-CNNs]]></article-title>
<source><![CDATA[Transactions of the Association for Computational Linguistics]]></source>
<year>2016</year>
<volume>4</volume>
<page-range>357-70</page-range></nlm-citation>
</ref>
<ref id="B3">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Florian]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Ittycheriah]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Jing]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Zhang]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
</person-group>
<source><![CDATA[Named entity recognition through classifier combination]]></source>
<year>2003</year>
<volume>4</volume>
<conf-name><![CDATA[ seventh conference on Natural language learning at HLT-NAACL]]></conf-name>
<conf-date>2003</conf-date>
<conf-loc> </conf-loc>
<page-range>168-71</page-range><publisher-name><![CDATA[Association for Computational Linguistics]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B4">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Huyen]]></surname>
<given-names><![CDATA[N. T. M.]]></given-names>
</name>
<name>
<surname><![CDATA[Luong]]></surname>
<given-names><![CDATA[V. X.]]></given-names>
</name>
</person-group>
<source><![CDATA[VLSP 2016 shared task: Named entity recognition]]></source>
<year>2016</year>
<conf-name><![CDATA[ Vietnamese Speech and Language Processing (VLSP)]]></conf-name>
<conf-loc> </conf-loc>
</nlm-citation>
</ref>
<ref id="B5">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Koo]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Carreras]]></surname>
<given-names><![CDATA[X.]]></given-names>
</name>
<name>
<surname><![CDATA[Collins]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<source><![CDATA[Simple semi-supervised dependency parsing]]></source>
<year>2008</year>
<conf-name><![CDATA[ ACL-08: HLT, Association for Computational Linguistics]]></conf-name>
<conf-loc>Columbus, Ohio </conf-loc>
<page-range>595-603</page-range></nlm-citation>
</ref>
<ref id="B6">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Lafferty]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[McCallum]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Pereira]]></surname>
<given-names><![CDATA[F.]]></given-names>
</name>
</person-group>
<source><![CDATA[Conditional random fields: Probabilistic models for segmenting and labeling sequence data]]></source>
<year>2001</year>
<page-range>282-9</page-range><publisher-name><![CDATA[ICML]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B7">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Le]]></surname>
<given-names><![CDATA[H. P.]]></given-names>
</name>
</person-group>
<source><![CDATA[Vietnamese named entity recognition using token regular expressions and bidirectional inference]]></source>
<year>2016</year>
<volume>abs/1610.05652</volume>
<publisher-name><![CDATA[CoRR]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B8">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Le-Hong]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Pham]]></surname>
<given-names><![CDATA[Q. N. M.]]></given-names>
</name>
<name>
<surname><![CDATA[Pham]]></surname>
<given-names><![CDATA[T. H.]]></given-names>
</name>
<name>
<surname><![CDATA[Tran]]></surname>
<given-names><![CDATA[T. A.]]></given-names>
</name>
<name>
<surname><![CDATA[Nguyen]]></surname>
<given-names><![CDATA[D. M.]]></given-names>
</name>
</person-group>
<source><![CDATA[An empirical study of discriminative sequence labeling models for Vietnamese text processing]]></source>
<year>2017</year>
<conf-name><![CDATA[ 9th International Conference on Knowledge and Systems Engineering]]></conf-name>
<conf-loc> </conf-loc>
<page-range>88-93</page-range></nlm-citation>
</ref>
<ref id="B9">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Liang]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
</person-group>
<source><![CDATA[Semi-supervised learning for natural language]]></source>
<year>2005</year>
<publisher-name><![CDATA[Massachusetts Institute of Technology]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B10">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Miller]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Guinness]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Zamanian]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
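</person-group>
<person-group person-group-type="editor">
<name>
<surname><![CDATA[Dumais]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Marcu]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Roukos]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
</person-group>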
<source><![CDATA[Name tagging with word clusters and discriminative training]]></source>
<year>2004</year>
<conf-name><![CDATA[ HLT-NAACL 2004: Main Proceedings]]></conf-name>
<conf-loc>Boston, Massachusetts, USA </conf-loc>
<page-range>337-42</page-range></nlm-citation>
</ref>
<ref id="B11">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Nguyen]]></surname>
<given-names><![CDATA[D. Q.]]></given-names>
</name>
<name>
<surname><![CDATA[Nguyen]]></surname>
<given-names><![CDATA[D. Q.]]></given-names>
</name>
<name>
<surname><![CDATA[Vu]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Dras]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Johnson]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<source><![CDATA[A Fast and Accurate Vietnamese Word Segmenter]]></source>
<year>2018</year>
<conf-name><![CDATA[ 11th International Conference on Language Resources and Evaluation]]></conf-name>
<conf-date>2018</conf-date>
<conf-loc> </conf-loc>
</nlm-citation>
</ref>
<ref id="B12">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Nguyen]]></surname>
<given-names><![CDATA[D. Q.]]></given-names>
</name>
<name>
<surname><![CDATA[Vu]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Nguyen]]></surname>
<given-names><![CDATA[D. Q.]]></given-names>
</name>
<name>
<surname><![CDATA[Dras]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Johnson]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<source><![CDATA[From Word Segmentation to POS Tagging for Vietnamese]]></source>
<year>2017</year>
<conf-name><![CDATA[ Australasian Language Technology Association Workshop]]></conf-name>
<conf-date>2017</conf-date>
<conf-loc> </conf-loc>
</nlm-citation>
</ref>
<ref id="B13">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Nguyen]]></surname>
<given-names><![CDATA[T. P.]]></given-names>
</name>
<name>
<surname><![CDATA[Le]]></surname>
<given-names><![CDATA[A. C.]]></given-names>
</name>
</person-group>
<source><![CDATA[A hybrid approach to Vietnamese word segmentation]]></source>
<year>2016</year>
<conf-name><![CDATA[ Computing &amp; Communication Technologies, Research, Innovation, and Vision for the Future (RIVF), 2016 IEEE RIVF International Conference on]]></conf-name>
<conf-loc> </conf-loc>
<page-range>114-9</page-range></nlm-citation>
</ref>
<ref id="B14">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Okazaki]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
</person-group>
<source><![CDATA[CRFsuite: A fast implementation of conditional random fields (CRFs)]]></source>
<year>2007</year>
</nlm-citation>
</ref>
<ref id="B15">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Pennington]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Socher]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Manning]]></surname>
<given-names><![CDATA[C. D.]]></given-names>
</name>
</person-group>
<source><![CDATA[GloVe: Global vectors for word representation]]></source>
<year>2014</year>
<conf-name><![CDATA[ Empirical Methods in Natural Language Processing]]></conf-name>
<conf-loc> </conf-loc>
<page-range>1532-43</page-range></nlm-citation>
</ref>
<ref id="B16">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Pham]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Khoai]]></surname>
<given-names><![CDATA[P. X.]]></given-names>
</name>
<name>
<surname><![CDATA[Nguyen]]></surname>
<given-names><![CDATA[T. A.]]></given-names>
</name>
<name>
<surname><![CDATA[Le-Hong]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
</person-group>
<source><![CDATA[NNVLP: A neural network-based Vietnamese language processing toolkit]]></source>
<year>2017</year>
<conf-name><![CDATA[ IJCNLP 2017, System Demonstrations]]></conf-name>
<conf-loc> </conf-loc>
<page-range>37-40</page-range></nlm-citation>
</ref>
<ref id="B17">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Pham]]></surname>
<given-names><![CDATA[T. H.]]></given-names>
</name>
<name>
<surname><![CDATA[Le]]></surname>
<given-names><![CDATA[H. P.]]></given-names>
</name>
</person-group>
<source><![CDATA[The importance of automatic syntactic features in Vietnamese named entity recognition]]></source>
<year>2017</year>
<volume>abs/1705.10610</volume>
<publisher-name><![CDATA[CoRR]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B18">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Tjong Kim Sang]]></surname>
<given-names><![CDATA[E. F.]]></given-names>
</name>
</person-group>
<source><![CDATA[Introduction to the CoNLL-2002 shared task: Language-independent named entity recognition]]></source>
<year>2002</year>
<volume>cs.CL/0209010</volume>
<publisher-name><![CDATA[CoRR]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B19">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Tjong Kim Sang]]></surname>
<given-names><![CDATA[E. F.]]></given-names>
</name>
<name>
<surname><![CDATA[De Meulder]]></surname>
<given-names><![CDATA[F.]]></given-names>
</name>
</person-group>
<source><![CDATA[Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition]]></source>
<year>2003</year>
<publisher-name><![CDATA[CoNLL]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B20">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sundheim]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
</person-group>
<source><![CDATA[Overview of results of the MUC-6 evaluation]]></source>
<year>1995</year>
<publisher-name><![CDATA[MUC]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B21">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Turian]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Ratinov]]></surname>
<given-names><![CDATA[L.-A.]]></given-names>
</name>
<name>
<surname><![CDATA[Bengio]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
</person-group>
<source><![CDATA[Word representations: A simple and general method for semi-supervised learning]]></source>
<year>2010</year>
<conf-name><![CDATA[ 48th Annual Meeting of the Association for Computational Linguistics]]></conf-name>
<conf-loc>Uppsala, Sweden </conf-loc>
<page-range>384-94</page-range></nlm-citation>
</ref>
</ref-list>
</back>
</article>
