<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>1405-5546</journal-id>
<journal-title><![CDATA[Computación y Sistemas]]></journal-title>
<abbrev-journal-title><![CDATA[Comp. y Sist.]]></abbrev-journal-title>
<issn>1405-5546</issn>
<publisher>
<publisher-name><![CDATA[Instituto Politécnico Nacional, Centro de Investigación en Computación]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S1405-55462019000300923</article-id>
<article-id pub-id-type="doi">10.13053/cys-23-3-3265</article-id>
<title-group>
<article-title xml:lang="en"><![CDATA[Promoting the Knowledge of Source Syntax in Transformer NMT Is Not Needed]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Pham]]></surname>
<given-names><![CDATA[Thuong-Hai]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Machá&#269;ek]]></surname>
<given-names><![CDATA[Dominik]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Bojar]]></surname>
<given-names><![CDATA[Ond&#345;ej]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
</contrib-group>
<aff id="Af1">
<institution><![CDATA[,Charles University Faculty of Mathematics and Physics Institute of Formal and Applied Linguistics]]></institution>
<addr-line><![CDATA[Prague ]]></addr-line>
<country>Czech Republic (the)</country>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>09</month>
<year>2019</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>09</month>
<year>2019</year>
</pub-date>
<volume>23</volume>
<numero>3</numero>
<fpage>923</fpage>
<lpage>934</lpage>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_arttext&amp;pid=S1405-55462019000300923&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_abstract&amp;pid=S1405-55462019000300923&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_pdf&amp;pid=S1405-55462019000300923&amp;lng=en&amp;nrm=iso"></self-uri><abstract abstract-type="short" xml:lang="en"><p><![CDATA[Abstract The utility of linguistic annotation in neural machine translation seemed to had been established in past papers. The experiments were however limited to recurrent sequence-to-sequence architectures and relatively small data settings. We focus on the state-of-the-art Transformer model and use comparably larger corpora. Specifically, we try to promote the knowledge of source-side syntax using multi-task learning either through simple data manipulation techniques or through a dedicated model component. In particular, we train one of Transformer attention heads to produce source-side dependency tree. Overall, our results cast some doubt on the utility of multi-task setups with linguistic information. The data manipulation techniques, recommended in previous works, prove ineffective in large data settings. The treatment of self-attention as dependencies seems much more promising: it helps in translation and reveals that Transformer model can very easily grasp the syntactic structure. An important but curious result is, however, that identical gains are obtained by using trivial "linear trees" instead of true dependencies. The reason for the gain thus may not be coming from the added linguistic knowledge but from some simpler regularizing effect we induced on self-attention matrices.]]></p></abstract>
<kwd-group>
<kwd lng="en"><![CDATA[Syntax]]></kwd>
<kwd lng="en"><![CDATA[Transformer NMT]]></kwd>
<kwd lng="en"><![CDATA[Multi-Task NMT]]></kwd>
</kwd-group>
</article-meta>
</front><back>
<ref-list>
<ref id="B1">
<label>1</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Aharoni]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Goldberg]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Towards string-to-tree neural machine translation]]></article-title>
<source><![CDATA[Proceedings of the 55th ACL Meeting, Volume 2: Short Papers]]></source>
<year>2017</year>
<page-range>132-40</page-range></nlm-citation>
</ref>
<ref id="B2">
<label>2</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Bahdanau]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Cho]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[Bengio]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Neural machine translation by jointly learning to align and translate]]></article-title>
<source><![CDATA[Proceedings of ICLR]]></source>
<year>2015</year>
</nlm-citation>
</ref>
<ref id="B3">
<label>3</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Barrault]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Bojar]]></surname>
<given-names><![CDATA[O.]]></given-names>
</name>
<name>
<surname><![CDATA[Costa-jussa]]></surname>
<given-names><![CDATA[M. R.]]></given-names>
</name>
<name>
<surname><![CDATA[Federmann]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
<name>
<surname><![CDATA[Fishel]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Graham]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Haddow]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
<name>
<surname><![CDATA[Huck]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Koehn]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Malmasi]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Monz]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
<name>
<surname><![CDATA[Muller]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Pal]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Post]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Zampieri]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Findings of the 2019 Conference on Machine Translation (WMT19)]]></article-title>
<source><![CDATA[Proceedings of the Fourth Conference on Machine Translation]]></source>
<year>2019</year>
<publisher-loc><![CDATA[Florence, Italy ]]></publisher-loc>
<publisher-name><![CDATA[Association for Computational Linguistics]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B4">
<label>4</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Bojar]]></surname>
<given-names><![CDATA[O.]]></given-names>
</name>
<name>
<surname><![CDATA[Du&#353;ek]]></surname>
<given-names><![CDATA[O.]]></given-names>
</name>
<name>
<surname><![CDATA[Kocmi]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Libovický]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Novák]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Popel]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Sudarikov]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Vari&#353;]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[CzEng 1.6: Enlarged Czech-English Parallel Corpus with Processing Tools Dockered]]></article-title>
<person-group person-group-type="editor">
<name>
<surname><![CDATA[Sojka]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Horák]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Kope&#269;ek]]></surname>
<given-names><![CDATA[I.]]></given-names>
</name>
<name>
<surname><![CDATA[Pala]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
</person-group>
<source><![CDATA[Text, Speech, and Dialogue: 19th International Conference, TSD 2016]]></source>
<year>2016</year>
<numero>9924</numero>
<issue>9924</issue>
<page-range>231-8</page-range></nlm-citation>
</ref>
<ref id="B5">
<label>5</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Bojar]]></surname>
<given-names><![CDATA[O.]]></given-names>
</name>
<name>
<surname><![CDATA[Federmann]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
<name>
<surname><![CDATA[Fishel]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Graham]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Haddow]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
<name>
<surname><![CDATA[Huck]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Koehn]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Monz]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Findings of the 2018 conference on machine translation (wmt18)]]></article-title>
<source><![CDATA[Proceedings of the Third Conference on Machine Translation, Volume 2: Shared Task Papers]]></source>
<year>2018</year>
<page-range>272-307</page-range><publisher-loc><![CDATA[Belgium, Brussels ]]></publisher-loc>
<publisher-name><![CDATA[Association for Computational Linguistics]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B6">
<label>6</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Cho]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[van Merrienboer]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
<name>
<surname><![CDATA[Gulcehre]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
<name>
<surname><![CDATA[Bahdanau]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Bougares]]></surname>
<given-names><![CDATA[F.]]></given-names>
</name>
<name>
<surname><![CDATA[Schwenk]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Bengio]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Learning phrase representations using rnn encoder-decoder for statistical machine translation]]></article-title>
<source><![CDATA[Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)]]></source>
<year>2014</year>
<page-range>1724-34</page-range></nlm-citation>
</ref>
<ref id="B7">
<label>7</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Dozat]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Manning]]></surname>
<given-names><![CDATA[C. D.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Deep biaffine attention for neural dependency parsing]]></article-title>
<source><![CDATA[CoRR]]></source>
<year>2016</year>
<volume>abs/1611.01734</volume>
</nlm-citation>
</ref>
<ref id="B8">
<label>8</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Dyer]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
<name>
<surname><![CDATA[Kuncoro]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Ballesteros]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Smith]]></surname>
<given-names><![CDATA[N. A.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Recurrent neural network grammars]]></article-title>
<source><![CDATA[HLT-NAACL]]></source>
<year>2016</year>
<page-range>199-209</page-range></nlm-citation>
</ref>
<ref id="B9">
<label>9</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Eriguchi]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Tsuruoka]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Cho]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Learning to parse and translate improves neural machine translation]]></article-title>
<source><![CDATA[Proceedings of the 55th ACL Meeting, Volume 2: Short Papers]]></source>
<year>2017</year>
<page-range>72-8</page-range></nlm-citation>
</ref>
<ref id="B10">
<label>10</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Ha]]></surname>
<given-names><![CDATA[T.-L.]]></given-names>
</name>
<name>
<surname><![CDATA[Niehues]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Waibel]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Toward Multilingual Neural Machine Translation with Universal Encoder and Decoder]]></article-title>
<source><![CDATA[Proceedings of the International Workshop on Spoken Language Translation]]></source>
<year>2016</year>
<publisher-loc><![CDATA[Seattle, USA ]]></publisher-loc>
</nlm-citation>
</ref>
<ref id="B11">
<label>11</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Haji&#269;]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Panevová]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Haji&#269;ová]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
<name>
<surname><![CDATA[Sgall]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Pajas]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[&#352;t&#277;pánek]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Havelka]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Mikulová]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[&#381;abokrtský]]></surname>
<given-names><![CDATA[Z.]]></given-names>
</name>
<name>
<surname><![CDATA[&#352;ev&#269;iková Razímová]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<source><![CDATA[Prague Dependency Treebank 2.0.]]></source>
<year>2006</year>
</nlm-citation>
</ref>
<ref id="B12">
<label>12</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Johnson]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Schuster]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Le]]></surname>
<given-names><![CDATA[Q. V.]]></given-names>
</name>
<name>
<surname><![CDATA[Krikun]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Wu]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Chen]]></surname>
<given-names><![CDATA[Z.]]></given-names>
</name>
<name>
<surname><![CDATA[Thorat]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
<name>
<surname><![CDATA[Viegas]]></surname>
<given-names><![CDATA[F. B.]]></given-names>
</name>
<name>
<surname><![CDATA[Wattenberg]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Corrado]]></surname>
<given-names><![CDATA[G.]]></given-names>
</name>
<name>
<surname><![CDATA[Hughes]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Dean]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Google's multilingual neural machine translation system: Enabling zero-shot translation]]></article-title>
<source><![CDATA[CoRR]]></source>
<year>2016</year>
<volume>abs/1611.04558</volume>
</nlm-citation>
</ref>
<ref id="B13">
<label>13</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Kiperwasser]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
<name>
<surname><![CDATA[Ballesteros]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Scheduled Multi-Task Learning: From Syntax to Translation]]></article-title>
<source><![CDATA[Transactions of the Association for Computational Linguistics]]></source>
<year>2018</year>
<volume>6</volume>
<page-range>225-40</page-range></nlm-citation>
</ref>
<ref id="B14">
<label>14</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Klejch]]></surname>
<given-names><![CDATA[O.]]></given-names>
</name>
<name>
<surname><![CDATA[Avramidis]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
<name>
<surname><![CDATA[Burchardt]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Popel]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[MT-ComparEval: Graphical evaluation interface for Machine Translation development]]></article-title>
<source><![CDATA[The Prague Bulletin of Mathematical Linguistics]]></source>
<year>2015</year>
<volume>104</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>63-74</page-range></nlm-citation>
</ref>
<ref id="B15">
<label>15</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Koehn]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Statistical significance tests for machine translation evaluation]]></article-title>
<source><![CDATA[Proceedings of EMNLP]]></source>
<year>2004</year>
<volume>4</volume>
<page-range>388-95</page-range></nlm-citation>
</ref>
<ref id="B16">
<label>16</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Koehn]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Europarl: A Parallel Corpus for Statistical Machine Translation]]></article-title>
<source><![CDATA[Conference Proceedings: the tenth Machine Translation Summit]]></source>
<year>2005</year>
<publisher-loc><![CDATA[Phuket, Thailand ]]></publisher-loc>
<publisher-name><![CDATA[AAMT]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B17">
<label>17</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Koehn]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Knowles]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Six challenges for neural machine translation]]></article-title>
<source><![CDATA[Proceedings of the First Workshop on Neural Machine Translation]]></source>
<year>2017</year>
<page-range>28-39</page-range><publisher-loc><![CDATA[Vancouver ]]></publisher-loc>
<publisher-name><![CDATA[Association for Computational Linguistics]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B18">
<label>18</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Kondratyuk]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Cardenas]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Bojar]]></surname>
<given-names><![CDATA[O.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Replacing Linguists with Dummies: A Serious Need for Trivial Baselines in Multi-Task Neural Machine Translation]]></article-title>
<source><![CDATA[The Prague Bulletin of Mathematical Linguistics]]></source>
<year>2019</year>
<volume>113</volume>
<numero>1</numero>
<issue>1</issue>
</nlm-citation>
</ref>
<ref id="B19">
<label>19</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Le]]></surname>
<given-names><![CDATA[A. N.]]></given-names>
</name>
<name>
<surname><![CDATA[Martinez]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Yoshimoto]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Matsumoto]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Improving sequence to sequence neural machine translation by utilizing syntactic dependency information]]></article-title>
<source><![CDATA[Proceedings of the Eighth International Joint Conference on Natural Language Processing, IJCNLP, Volume 1: Long Papers]]></source>
<year>2017</year>
<page-range>21-9</page-range></nlm-citation>
</ref>
<ref id="B20">
<label>20</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Luong]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Le]]></surname>
<given-names><![CDATA[Q. V.]]></given-names>
</name>
<name>
<surname><![CDATA[Sutskever]]></surname>
<given-names><![CDATA[I.]]></given-names>
</name>
<name>
<surname><![CDATA[Vinyals]]></surname>
<given-names><![CDATA[O.]]></given-names>
</name>
<name>
<surname><![CDATA[Kaiser]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Multi-task sequence to sequence learning]]></article-title>
<source><![CDATA[CoRR]]></source>
<year>2015</year>
<volume>abs/1511.06114</volume>
</nlm-citation>
</ref>
<ref id="B21">
<label>21</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Machá&#269;ek]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Vidra]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Bojar]]></surname>
<given-names><![CDATA[O.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Morphological and Language-Agnostic Word Segmentation for NMT]]></article-title>
<source><![CDATA[Text, Speech, and Dialogue: TSD 2018]]></source>
<year>2018</year>
<numero>11107</numero>
<issue>11107</issue>
<page-range>277-84</page-range><publisher-loc><![CDATA[Cham ]]></publisher-loc>
<publisher-name><![CDATA[Masaryk University]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B22">
<label>22</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Nadejde]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Reddy]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Sennrich]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Dwojak]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Junczys-Dowmunt]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Koehn]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Birch]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Predicting target language ccg supertags improves neural machine translation]]></article-title>
<source><![CDATA[Proceedings of the Second Conference on Machine Translation, Volume 1: Research Paper]]></source>
<year>2017</year>
<page-range>68-79</page-range></nlm-citation>
</ref>
<ref id="B23">
<label>23</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Niehues]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Cho]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Exploiting linguistic resources for neural machine translation using multitask learning]]></article-title>
<source><![CDATA[WMT]]></source>
<year>2017</year>
</nlm-citation>
</ref>
<ref id="B24">
<label>24</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Nivre]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Agic]]></surname>
<given-names><![CDATA[&#381;.]]></given-names>
</name>
<name>
<surname><![CDATA[Ahrenberg]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Antonsen]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
</person-group>
<source><![CDATA[Universal dependencies 2.0]]></source>
<year>2017</year>
</nlm-citation>
</ref>
<ref id="B25">
<label>25</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Nivre]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Hall]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Kübler]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[McDonald]]></surname>
<given-names><![CDATA[R. T.]]></given-names>
</name>
<name>
<surname><![CDATA[Nilsson]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Riedel]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Yuret]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[The conll 2007 shared task on dependency parsing]]></article-title>
<source><![CDATA[EMNLP-CoNLL 2007, Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, June 28-30, 2007, Prague, Czech Republic]]></source>
<year>2007</year>
<page-range>915-32</page-range></nlm-citation>
</ref>
<ref id="B26">
<label>26</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Papineni]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[Roukos]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Ward]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Zhu]]></surname>
<given-names><![CDATA[W.-J.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[BLEU: A method for automatic evaluation of machine translation]]></article-title>
<source><![CDATA[Proceedings of the 40th Annual Meeting on Association for Computational Linguistics]]></source>
<year>2002</year>
<page-range>311-8</page-range><publisher-loc><![CDATA[Stroudsburg, PA, USA ]]></publisher-loc>
<publisher-name><![CDATA[Association for Computational Linguistics]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B27">
<label>27</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Popel]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[CUNI Transformer Neural MT System for WMT18]]></article-title>
<source><![CDATA[Proceedings of the Third Conference on Machine Translation]]></source>
<year>2018</year>
<page-range>486-91</page-range><publisher-loc><![CDATA[Belgium, Brussels ]]></publisher-loc>
<publisher-name><![CDATA[Association for Computational Linguistics]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B28">
<label>28</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Popel]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Bojar]]></surname>
<given-names><![CDATA[O.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Training tips for the transformer model]]></article-title>
<source><![CDATA[The Prague Bulletin of Mathematical Linguistics]]></source>
<year>2018</year>
<volume>110</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>43-70</page-range></nlm-citation>
</ref>
<ref id="B29">
<label>29</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Popel]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[&#381;abokrtský]]></surname>
<given-names><![CDATA[Z.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[TectoMT: Modular NLP framework]]></article-title>
<person-group person-group-type="editor">
<name>
<surname><![CDATA[Loftsson]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Rögnvaldsson]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
<name>
<surname><![CDATA[Helgadottir]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
</person-group>
<source><![CDATA[Proceedings of the 7th International Conference on Advances in Natural Language Processing (IceTAL 2010), volume 6233 of Lecture Notes in Computer Science]]></source>
<year>2010</year>
<page-range>293-304</page-range><publisher-name><![CDATA[Springer]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B30">
<label>30</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Popovic]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[chrf: character n-gram f-score for automatic MT evaluation]]></article-title>
<source><![CDATA[Proceedings of the Tenth Workshop on Statistical Machine Translation, WMT@EMNLP 2015, Lisbon, Portugal]]></source>
<year>2015</year>
<page-range>392-5</page-range></nlm-citation>
</ref>
<ref id="B31">
<label>31</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Shi]]></surname>
<given-names><![CDATA[X.]]></given-names>
</name>
<name>
<surname><![CDATA[Padhi]]></surname>
<given-names><![CDATA[I.]]></given-names>
</name>
<name>
<surname><![CDATA[Knight]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Does string-based neural MT learn source syntax?]]></article-title>
<person-group person-group-type="editor">
<name>
<surname><![CDATA[Su]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Carreras]]></surname>
<given-names><![CDATA[X.]]></given-names>
</name>
<name>
<surname><![CDATA[Duh]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
</person-group>
<source><![CDATA[Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP, 2016]]></source>
<year>2016</year>
<page-range>1526-34</page-range><publisher-name><![CDATA[The Association for Computational Linguistics]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B32">
<label>32</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Stanojevi&#263;]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Sima'an]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Fitting sentence level translation evaluation with many dense features]]></article-title>
<source><![CDATA[Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)]]></source>
<year>2014</year>
<page-range>202-6</page-range><publisher-loc><![CDATA[Doha, Qatar ]]></publisher-loc>
<publisher-name><![CDATA[Association for Computational Linguistics]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B33">
<label>33</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Steedman]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<source><![CDATA[The Syntactic Process]]></source>
<year>2000</year>
<publisher-loc><![CDATA[Cambridge, MA, USA ]]></publisher-loc>
<publisher-name><![CDATA[MIT Press]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B34">
<label>34</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Straka]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Straková]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Tokenizing, pos tagging, lemmatizing and parsing ud 2.0 with udpipe]]></article-title>
<source><![CDATA[Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies]]></source>
<year>2017</year>
<page-range>88-99</page-range><publisher-loc><![CDATA[Vancouver, Canada ]]></publisher-loc>
<publisher-name><![CDATA[Association for Computational Linguistics]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B35">
<label>35</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Strubell]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
<name>
<surname><![CDATA[Verga]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Andor]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Weiss]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[McCallum]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Linguistically-informed self-attention for semantic role labeling]]></article-title>
<source><![CDATA[Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing]]></source>
<year>2018</year>
<page-range>5027-38</page-range><publisher-loc><![CDATA[Brussels, Belgium ]]></publisher-loc>
<publisher-name><![CDATA[Association for Computational Linguistics]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B36">
<label>36</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sutskever]]></surname>
<given-names><![CDATA[I.]]></given-names>
</name>
<name>
<surname><![CDATA[Vinyals]]></surname>
<given-names><![CDATA[O.]]></given-names>
</name>
<name>
<surname><![CDATA[Le]]></surname>
<given-names><![CDATA[Q. V.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Sequence to sequence learning with neural networks]]></article-title>
<source><![CDATA[NIPS]]></source>
<year>2014</year>
<page-range>3104-12</page-range></nlm-citation>
</ref>
<ref id="B37">
<label>37</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Tamchyna]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Weller-Di Marco]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Fraser]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Modeling target-side inflection in neural machine translation]]></article-title>
<source><![CDATA[Proceedings of the Second Conference on Machine Translation, Volume 1: Research Paper]]></source>
<year>2017</year>
<page-range>32-42</page-range></nlm-citation>
</ref>
<ref id="B38">
<label>38</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Tiedemann]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[News from OPUS - A collection of multilingual parallel corpora with tools and interfaces]]></article-title>
<source><![CDATA[Proc. of RANLP]]></source>
<year>2009</year>
<volume>V</volume>
<page-range>237-48</page-range><publisher-loc><![CDATA[Borovets, Bulgaria ]]></publisher-loc>
</nlm-citation>
</ref>
<ref id="B39">
<label>39</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Tran]]></surname>
<given-names><![CDATA[K. M.]]></given-names>
</name>
<name>
<surname><![CDATA[Bisazza]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Monz]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[The importance of being recurrent for modeling hierarchical structure]]></article-title>
<source><![CDATA[CoRR]]></source>
<year>2018</year>
<volume>abs/1803.03585</volume>
</nlm-citation>
</ref>
<ref id="B40">
<label>40</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Vaswani]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Shazeer]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
<name>
<surname><![CDATA[Parmar]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
<name>
<surname><![CDATA[Uszkoreit]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Jones]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Gomez]]></surname>
<given-names><![CDATA[A. N.]]></given-names>
</name>
<name>
<surname><![CDATA[Kaiser]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Polosukhin]]></surname>
<given-names><![CDATA[I.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Attention is all you need]]></article-title>
<source><![CDATA[Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA]]></source>
<year>2017</year>
<page-range>6000-10</page-range></nlm-citation>
</ref>
<ref id="B41">
<label>41</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Wang]]></surname>
<given-names><![CDATA[W.]]></given-names>
</name>
<name>
<surname><![CDATA[Peter]]></surname>
<given-names><![CDATA[J.-T.]]></given-names>
</name>
<name>
<surname><![CDATA[Rosendahl]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Ney]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Character: Translation edit rate on character level]]></article-title>
<source><![CDATA[Proceedings of the First Conference on Machine Translation]]></source>
<year>2016</year>
<page-range>505-10</page-range></nlm-citation>
</ref>
<ref id="B42">
<label>42</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Zaremoodi]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Haffari]]></surname>
<given-names><![CDATA[G.]]></given-names>
</name>
</person-group>
<source><![CDATA[Neural machine translation for bilingually scarce scenarios: A deep multi-task learning approach]]></source>
<year>2018</year>
</nlm-citation>
</ref>
<ref id="B43">
<label>43</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Zoph]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
<name>
<surname><![CDATA[Knight]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Multi-Source Neural Translation]]></article-title>
<source><![CDATA[Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies]]></source>
<year>2016</year>
<page-range>30-4</page-range><publisher-loc><![CDATA[San Diego, California ]]></publisher-loc>
<publisher-name><![CDATA[Association for Computational Linguistics]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B44">
<label>44</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Zoph]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
<name>
<surname><![CDATA[Yuret]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[May]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Knight]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Transfer learning for low-resource neural machine translation]]></article-title>
<source><![CDATA[Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing]]></source>
<year>2016</year>
<page-range>1568-75</page-range></nlm-citation>
</ref>
</ref-list>
</back>
</article>
