<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>1405-5546</journal-id>
<journal-title><![CDATA[Computación y Sistemas]]></journal-title>
<abbrev-journal-title><![CDATA[Comp. y Sist.]]></abbrev-journal-title>
<issn>1405-5546</issn>
<publisher>
<publisher-name><![CDATA[Instituto Politécnico Nacional, Centro de Investigación en Computación]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S1405-55462024000402005</article-id>
<article-id pub-id-type="doi">10.13053/cys-28-4-5225</article-id>
<title-group>
<article-title xml:lang="en"><![CDATA[Word Embeddings: A Comprehensive Survey]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Pak]]></surname>
<given-names><![CDATA[Alexandr]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Ziyaden]]></surname>
<given-names><![CDATA[Atabay]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Saparov]]></surname>
<given-names><![CDATA[Timur]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Akhmetov]]></surname>
<given-names><![CDATA[Iskander]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Gelbukh]]></surname>
<given-names><![CDATA[Alexander]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
</contrib-group>
<aff id="Af1">
<institution><![CDATA[Institute of Informational and Computational Technologies]]></institution>
<addr-line><![CDATA[Almaty]]></addr-line>
<country>Kazakhstan</country>
</aff>
<aff id="Af2">
<institution><![CDATA[Kazakh-British Technical University]]></institution>
<addr-line><![CDATA[Almaty]]></addr-line>
<country>Kazakhstan</country>
</aff>
<aff id="Af3">
<institution><![CDATA[Instituto Politécnico Nacional, Centro de Investigación en Computación]]></institution>
<addr-line><![CDATA[Mexico City]]></addr-line>
<country>Mexico</country>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>12</month>
<year>2024</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>12</month>
<year>2024</year>
</pub-date>
<volume>28</volume>
<numero>4</numero>
<fpage>2005</fpage>
<lpage>2029</lpage>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_arttext&amp;pid=S1405-55462024000402005&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_abstract&amp;pid=S1405-55462024000402005&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_pdf&amp;pid=S1405-55462024000402005&amp;lng=en&amp;nrm=iso"></self-uri><abstract abstract-type="short" xml:lang="en"><p><![CDATA[Abstract: This article is a systematic review of studies on word embeddings, with an emphasis on classical matrix factorization techniques and contemporary neural word embedding algorithms such as Word2Vec, GloVe, and BERT. The efficiency and effectiveness of these methods in capturing semantic and lexical relationships are evaluated in detail, together with an analysis of the topology of these techniques. In addition, this approach demonstrates a model accuracy of 77%, which is 3% below the best human performance. At the same time, the study exposes weaknesses of some models, such as BERT, whose unrealistically high accuracy stems from spurious correlations in the datasets. We identify three bottlenecks for the further development of NLP algorithms: assimilation of inductive bias, embedding of common sense, and the generalization problem. The outcomes of this research help to enhance the robustness and applicability of word embeddings in natural language processing tasks.]]></p></abstract>
<kwd-group>
<kwd lng="en"><![CDATA[language models]]></kwd>
<kwd lng="en"><![CDATA[distributive semantics]]></kwd>
<kwd lng="en"><![CDATA[word embeddings]]></kwd>
<kwd lng="en"><![CDATA[natural language processing]]></kwd>
<kwd lng="en"><![CDATA[deep learning]]></kwd>
</kwd-group>
</article-meta>
</front><back>
<ref-list>
<ref id="B1">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Belinkov]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Bisk]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
</person-group>
<source><![CDATA[Synthetic and natural noise both break neural machine translation]]></source>
<year>2017</year>
<conf-name><![CDATA[International Conference on Learning Representations]]></conf-name>
<conf-loc> </conf-loc>
<page-range>1-13</page-range></nlm-citation>
</ref>
<ref id="B2">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Beltagy]]></surname>
<given-names><![CDATA[I.]]></given-names>
</name>
<name>
<surname><![CDATA[Peters]]></surname>
<given-names><![CDATA[M. E.]]></given-names>
</name>
<name>
<surname><![CDATA[Cohan]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
</person-group>
<source><![CDATA[Longformer: The long-document transformer]]></source>
<year>2020</year>
</nlm-citation>
</ref>
<ref id="B3">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Bender]]></surname>
<given-names><![CDATA[E. M.]]></given-names>
</name>
<name>
<surname><![CDATA[Gebru]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[McMillan-Major]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Shmitchell]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
</person-group>
<source><![CDATA[On the dangers of stochastic parrots: Can language models be too big?]]></source>
<year>2021</year>
<conf-name><![CDATA[Conference on Fairness, Accountability, and Transparency]]></conf-name>
<conf-loc> </conf-loc>
<page-range>610-23</page-range></nlm-citation>
</ref>
<ref id="B4">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Bengio]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Ducharme]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Vincent]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Jauvin]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[A neural probabilistic language model]]></article-title>
<source><![CDATA[The Journal of Machine Learning Research]]></source>
<year>2003</year>
<volume>3</volume>
<page-range>1137-55</page-range></nlm-citation>
</ref>
<ref id="B5">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Blei]]></surname>
<given-names><![CDATA[D. M.]]></given-names>
</name>
<name>
<surname><![CDATA[Ng]]></surname>
<given-names><![CDATA[A. Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Jordan]]></surname>
<given-names><![CDATA[M. I.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Latent Dirichlet allocation]]></article-title>
<source><![CDATA[The Journal of Machine Learning Research]]></source>
<year>2003</year>
<volume>3</volume>
<page-range>993-1022</page-range></nlm-citation>
</ref>
<ref id="B6">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Bojanowski]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Grave]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
<name>
<surname><![CDATA[Joulin]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Mikolov]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Enriching word vectors with subword information]]></article-title>
<source><![CDATA[Transactions of the Association for Computational Linguistics]]></source>
<year>2017</year>
<volume>5</volume>
<page-range>135-46</page-range></nlm-citation>
</ref>
<ref id="B7">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Brown]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Mann]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
<name>
<surname><![CDATA[Ryder]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
<name>
<surname><![CDATA[Subbiah]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Kaplan]]></surname>
<given-names><![CDATA[J. D.]]></given-names>
</name>
<name>
<surname><![CDATA[Dhariwal]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Neelakantan]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Shyam]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Sastry]]></surname>
<given-names><![CDATA[G.]]></given-names>
</name>
<name>
<surname><![CDATA[Askell]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Agarwal]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Herbert-Voss]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Krueger]]></surname>
<given-names><![CDATA[G.]]></given-names>
</name>
<name>
<surname><![CDATA[Henighan]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Child]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Ramesh]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Ziegler]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Wu]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Winter]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
<name>
<surname><![CDATA[Hesse]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Language models are few-shot learners]]></article-title>
<source><![CDATA[Advances in Neural Information Processing Systems]]></source>
<year>2020</year>
<volume>33</volume>
<page-range>1877-901</page-range></nlm-citation>
</ref>
<ref id="B8">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Cho]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[van Merriënboer]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
<name>
<surname><![CDATA[Gulcehre]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
<name>
<surname><![CDATA[Bahdanau]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Bougares]]></surname>
<given-names><![CDATA[F.]]></given-names>
</name>
<name>
<surname><![CDATA[Schwenk]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Bengio]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
</person-group>
<source>Learning phrase representations using RNN encoder&#8211;decoder for statistical machine translation</source>
<year>2014</year>
<conf-name><![CDATA[Conference on Empirical Methods in Natural Language Processing]]></conf-name>
<conf-loc> </conf-loc>
<page-range>1724-34</page-range></nlm-citation>
</ref>
<ref id="B9">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Cho]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[van Merriënboer]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
<name>
<surname><![CDATA[Gülçehre]]></surname>
<given-names><![CDATA[Ç.]]></given-names>
</name>
<name>
<surname><![CDATA[Bougares]]></surname>
<given-names><![CDATA[F.]]></given-names>
</name>
<name>
<surname><![CDATA[Schwenk]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Bengio]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Learning phrase representations using RNN encoder-decoder for statistical machine translation]]></article-title>
<source><![CDATA[CoRR]]></source>
<year>2014</year>
<volume>abs/1406.1078</volume>
</nlm-citation>
</ref>
<ref id="B10">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Clark]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[Luong]]></surname>
<given-names><![CDATA[M. T.]]></given-names>
</name>
<name>
<surname><![CDATA[Le]]></surname>
<given-names><![CDATA[Q. V.]]></given-names>
</name>
<name>
<surname><![CDATA[Manning]]></surname>
<given-names><![CDATA[C. D.]]></given-names>
</name>
</person-group>
<source><![CDATA[ELECTRA: Pre-training text encoders as discriminators rather than generators]]></source>
<year>2020</year>
<conf-name><![CDATA[International Conference on Learning Representations]]></conf-name>
<conf-loc> </conf-loc>
</nlm-citation>
</ref>
<ref id="B11">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Devlin]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Chang]]></surname>
<given-names><![CDATA[M. W.]]></given-names>
</name>
<name>
<surname><![CDATA[Lee]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[Toutanova]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
</person-group>
<source><![CDATA[BERT: Pre-training of deep bidirectional transformers for language understanding]]></source>
<year>2019</year>
<volume>1</volume>
<conf-name><![CDATA[Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies]]></conf-name>
<conf-loc> </conf-loc>
<page-range>4171-86</page-range></nlm-citation>
</ref>
<ref id="B12">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Eckart]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
<name>
<surname><![CDATA[Young]]></surname>
<given-names><![CDATA[G.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[The approximation of one matrix by another of lower rank]]></article-title>
<source><![CDATA[Psychometrika]]></source>
<year>1936</year>
<volume>1</volume>
<numero>3</numero>
<issue>3</issue>
<page-range>211-8</page-range></nlm-citation>
</ref>
<ref id="B13">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Firth]]></surname>
<given-names><![CDATA[J. R.]]></given-names>
</name>
</person-group>
<source><![CDATA[A synopsis of linguistic theory, 1930-55: Studies in linguistic analysis]]></source>
<year>1957</year>
<publisher-name><![CDATA[Blackwell]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B14">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Gomez]]></surname>
<given-names><![CDATA[A. N.]]></given-names>
</name>
<name>
<surname><![CDATA[Ren]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Urtasun]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Grosse]]></surname>
<given-names><![CDATA[R. B.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[The reversible residual network: Backpropagation without storing activations]]></article-title>
<source><![CDATA[Advances in Neural Information Processing Systems]]></source>
<year>2017</year>
<volume>30</volume>
<page-range>1-11</page-range><publisher-name><![CDATA[Curran Associates, Inc]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B15">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Habernal]]></surname>
<given-names><![CDATA[I.]]></given-names>
</name>
<name>
<surname><![CDATA[Wachsmuth]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Gurevych]]></surname>
<given-names><![CDATA[I.]]></given-names>
</name>
<name>
<surname><![CDATA[Stein]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
</person-group>
<source><![CDATA[SemEval-2018 task 12: The argument reasoning comprehension task]]></source>
<year>2018</year>
<conf-name><![CDATA[12th International Workshop on Semantic Evaluation]]></conf-name>
<conf-loc> </conf-loc>
<page-range>763-72</page-range></nlm-citation>
</ref>
<ref id="B16">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Hochreiter]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Schmidhuber]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Long short-term memory]]></article-title>
<source><![CDATA[Neural Computation]]></source>
<year>1997</year>
<volume>9</volume>
<numero>8</numero>
<issue>8</issue>
<page-range>1735-80</page-range></nlm-citation>
</ref>
<ref id="B17">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Hofmann]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
</person-group>
<source><![CDATA[Probabilistic latent semantic analysis]]></source>
<year>2013</year>
</nlm-citation>
</ref>
<ref id="B18">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Idel]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<source><![CDATA[Gematria and prognostication]]></source>
<year>2020</year>
<page-range>785-7</page-range></nlm-citation>
</ref>
<ref id="B19">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Iyyer]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Wieting]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Gimpel]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[Zettlemoyer]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
</person-group>
<source><![CDATA[Adversarial example generation with syntactically controlled paraphrase networks]]></source>
<year>2018</year>
<volume>1</volume>
<page-range>1875-85</page-range></nlm-citation>
</ref>
<ref id="B20">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Jia]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Liang]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
</person-group>
<source><![CDATA[Adversarial examples for evaluating reading comprehension systems]]></source>
<year>2017</year>
<conf-name><![CDATA[Conference on Empirical Methods in Natural Language Processing]]></conf-name>
<conf-loc> </conf-loc>
<page-range>2021-31</page-range></nlm-citation>
</ref>
<ref id="B21">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Joulin]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Grave]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
<name>
<surname><![CDATA[Bojanowski]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Douze]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Jégou]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Mikolov]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
</person-group>
<source><![CDATA[FastText.zip: Compressing text classification models]]></source>
<year>2016</year>
</nlm-citation>
</ref>
<ref id="B22">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Joulin]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Grave]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
<name>
<surname><![CDATA[Bojanowski]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Mikolov]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
</person-group>
<source><![CDATA[Bag of tricks for efficient text classification]]></source>
<year>2017</year>
<volume>2</volume>
<page-range>427-31</page-range></nlm-citation>
</ref>
<ref id="B23">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Kitaev]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
<name>
<surname><![CDATA[Kaiser]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Levskaya]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
</person-group>
<source><![CDATA[Reformer: The efficient transformer]]></source>
<year>2020</year>
</nlm-citation>
</ref>
<ref id="B24">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Mikolov]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Chen]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[Corrado]]></surname>
<given-names><![CDATA[G.]]></given-names>
</name>
<name>
<surname><![CDATA[Dean]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<source><![CDATA[Efficient estimation of word representations in vector space]]></source>
<year>2013</year>
</nlm-citation>
</ref>
<ref id="B25">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Mikolov]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Sutskever]]></surname>
<given-names><![CDATA[I.]]></given-names>
</name>
<name>
<surname><![CDATA[Chen]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[Corrado]]></surname>
<given-names><![CDATA[G. S.]]></given-names>
</name>
<name>
<surname><![CDATA[Dean]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Distributed representations of words and phrases and their compositionality]]></article-title>
<source><![CDATA[Advances in Neural Information Processing Systems]]></source>
<year>2013</year>
<volume>26</volume>
<page-range>3111-9</page-range></nlm-citation>
</ref>
<ref id="B26">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Niven]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Kao]]></surname>
<given-names><![CDATA[H. Y.]]></given-names>
</name>
</person-group>
<source><![CDATA[Probing neural network comprehension of natural language arguments]]></source>
<year>2019</year>
<conf-name><![CDATA[57th Annual Meeting of the Association for Computational Linguistics]]></conf-name>
<conf-loc> </conf-loc>
<page-range>4658-64</page-range></nlm-citation>
</ref>
<ref id="B27">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Pennington]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Socher]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Manning]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
</person-group>
<source><![CDATA[GloVe: Global vectors for word representation]]></source>
<year>2014</year>
<conf-name><![CDATA[2014 Conference on Empirical Methods in Natural Language Processing]]></conf-name>
<conf-loc> </conf-loc>
<page-range>1532-43</page-range></nlm-citation>
</ref>
<ref id="B28">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Peters]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Neumann]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Iyyer]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Gardner]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Clark]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
<name>
<surname><![CDATA[Lee]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[Zettlemoyer]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
</person-group>
<source><![CDATA[Deep contextualized word representations]]></source>
<year>2018</year>
<volume>1</volume>
<conf-name><![CDATA[Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies]]></conf-name>
<conf-loc> </conf-loc>
</nlm-citation>
</ref>
<ref id="B29">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Peters]]></surname>
<given-names><![CDATA[M. E.]]></given-names>
</name>
<name>
<surname><![CDATA[Neumann]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Zettlemoyer]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Yih]]></surname>
<given-names><![CDATA[W. T.]]></given-names>
</name>
</person-group>
<source><![CDATA[Dissecting contextual word embeddings: Architecture and representation]]></source>
<year>2018</year>
<conf-name><![CDATA[Conference on Empirical Methods in Natural Language Processing]]></conf-name>
<conf-loc> </conf-loc>
<page-range>1499-509</page-range></nlm-citation>
</ref>
<ref id="B30">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Radford]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Kim]]></surname>
<given-names><![CDATA[J. W.]]></given-names>
</name>
<name>
<surname><![CDATA[Hallacy]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
<name>
<surname><![CDATA[Ramesh]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Goh]]></surname>
<given-names><![CDATA[G.]]></given-names>
</name>
<name>
<surname><![CDATA[Agarwal]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Sastry]]></surname>
<given-names><![CDATA[G.]]></given-names>
</name>
<name>
<surname><![CDATA[Askell]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Mishkin]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Clark]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Krueger]]></surname>
<given-names><![CDATA[G.]]></given-names>
</name>
<name>
<surname><![CDATA[Sutskever]]></surname>
<given-names><![CDATA[I.]]></given-names>
</name>
</person-group>
<source><![CDATA[Learning transferable visual models from natural language supervision]]></source>
<year>2021</year>
<volume>139</volume>
<conf-name><![CDATA[38th International Conference on Machine Learning]]></conf-name>
<conf-loc> </conf-loc>
<page-range>1-16</page-range></nlm-citation>
</ref>
<ref id="B31">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Radford]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Narasimhan]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
</person-group>
<source><![CDATA[Improving language understanding by generative pre-training]]></source>
<year>2018</year>
</nlm-citation>
</ref>
<ref id="B32">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Radford]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Wu]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Child]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Luan]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Amodei]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Sutskever]]></surname>
<given-names><![CDATA[I.]]></given-names>
</name>
</person-group>
<source><![CDATA[Language models are unsupervised multitask learners]]></source>
<year>2019</year>
</nlm-citation>
</ref>
<ref id="B33">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Röder]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Both]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Hinneburg]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
</person-group>
<source><![CDATA[Exploring the space of topic coherence measures]]></source>
<year>2015</year>
<conf-name><![CDATA[8th ACM International Conference on Web Search and Data Mining]]></conf-name>
<conf-loc> </conf-loc>
<page-range>399-408</page-range></nlm-citation>
</ref>
<ref id="B34">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sanh]]></surname>
<given-names><![CDATA[V.]]></given-names>
</name>
<name>
<surname><![CDATA[Debut]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Chaumond]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Wolf]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
</person-group>
<source><![CDATA[DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter]]></source>
<year>2019</year>
</nlm-citation>
</ref>
<ref id="B35">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sanzhar]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Pak]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Bulatovna]]></surname>
<given-names><![CDATA[J. A.]]></given-names>
</name>
</person-group>
<source><![CDATA[The estimation of stability of semantic space generated by word embedding algorithms]]></source>
<year>2018</year>
<conf-name><![CDATA[International Joint Symposium on Artificial Intelligence and Natural Language Processing]]></conf-name>
<conf-loc> </conf-loc>
<page-range>1-5</page-range></nlm-citation>
</ref>
<ref id="B36">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Tay]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Bahri]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Metzler]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Juan]]></surname>
<given-names><![CDATA[D. C.]]></given-names>
</name>
<name>
<surname><![CDATA[Zhao]]></surname>
<given-names><![CDATA[Z.]]></given-names>
</name>
<name>
<surname><![CDATA[Zheng]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
</person-group>
<source><![CDATA[Synthesizer: Rethinking self-attention for transformer models]]></source>
<year>2021</year>
<volume>139</volume>
<conf-name><![CDATA[38th International Conference on Machine Learning]]></conf-name>
<conf-loc> </conf-loc>
<page-range>10183-92</page-range></nlm-citation>
</ref>
<ref id="B37">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Vaswani]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Shazeer]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
<name>
<surname><![CDATA[Parmar]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
<name>
<surname><![CDATA[Uszkoreit]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Jones]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Gomez]]></surname>
<given-names><![CDATA[A. N.]]></given-names>
</name>
<name>
<surname><![CDATA[Kaiser]]></surname>
<given-names>&#321;.</given-names>
</name>
<name>
<surname><![CDATA[Polosukhin]]></surname>
<given-names><![CDATA[I.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Attention is all you need]]></article-title>
<source><![CDATA[Advances in Neural Information Processing Systems]]></source>
<year>2017</year>
<page-range>5998-6008</page-range></nlm-citation>
</ref>
<ref id="B38">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Wang]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Li]]></surname>
<given-names><![CDATA[B. Z.]]></given-names>
</name>
<name>
<surname><![CDATA[Khabsa]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Fang]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Ma]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
</person-group>
<source><![CDATA[Linformer: Self-attention with linear complexity]]></source>
<year>2020</year>
</nlm-citation>
</ref>
<ref id="B39">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Winata]]></surname>
<given-names><![CDATA[G. I.]]></given-names>
</name>
<name>
<surname><![CDATA[Madotto]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Lin]]></surname>
<given-names><![CDATA[Z.]]></given-names>
</name>
<name>
<surname><![CDATA[Liu]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Yosinski]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Fung]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
</person-group>
<source><![CDATA[Language models are few-shot multilingual learners]]></source>
<year>2021</year>
<conf-name><![CDATA[1st Workshop on Multilingual Representation Learning]]></conf-name>
<conf-loc> </conf-loc>
<page-range>1-15</page-range></nlm-citation>
</ref>
<ref id="B40">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Yang]]></surname>
<given-names><![CDATA[Z.]]></given-names>
</name>
<name>
<surname><![CDATA[Dai]]></surname>
<given-names><![CDATA[Z.]]></given-names>
</name>
<name>
<surname><![CDATA[Yang]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Carbonell]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Salakhutdinov]]></surname>
<given-names><![CDATA[R. R.]]></given-names>
</name>
<name>
<surname><![CDATA[Le]]></surname>
<given-names><![CDATA[Q. V.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[XLNet: Generalized autoregressive pretraining for language understanding]]></article-title>
<source><![CDATA[Advances in Neural Information Processing Systems]]></source>
<year>2019</year>
<page-range>5753-5763</page-range></nlm-citation>
</ref>
<ref id="B41">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Zaheer]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Guruganesh]]></surname>
<given-names><![CDATA[G.]]></given-names>
</name>
<name>
<surname><![CDATA[Dubey]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Ainslie]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Alberti]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
<name>
<surname><![CDATA[Ontanon]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Pham]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Ravula]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Wang]]></surname>
<given-names><![CDATA[Q.]]></given-names>
</name>
<name>
<surname><![CDATA[Yang]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Ahmed]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
</person-group>
<source><![CDATA[Big Bird: Transformers for longer sequences]]></source>
<year>2020</year>
<conf-name><![CDATA[34th International Conference on Neural Information Processing Systems]]></conf-name>
<conf-loc> </conf-loc>
<page-range>17283-17297</page-range></nlm-citation>
</ref>
</ref-list>
</back>
</article>
