<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>1405-5546</journal-id>
<journal-title><![CDATA[Computación y Sistemas]]></journal-title>
<abbrev-journal-title><![CDATA[Comp. y Sist.]]></abbrev-journal-title>
<issn>1405-5546</issn>
<publisher>
<publisher-name><![CDATA[Instituto Politécnico Nacional, Centro de Investigación en Computación]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S1405-55462020000301327</article-id>
<article-id pub-id-type="doi">10.13053/cys-24-3-3773</article-id>
<title-group>
<article-title xml:lang="en"><![CDATA[Multimodal Learning Based Spatial Relation Identification]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Dash]]></surname>
<given-names><![CDATA[Sandeep Kumar]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Sureshchandra]]></surname>
<given-names><![CDATA[Y. V.]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Mishra]]></surname>
<given-names><![CDATA[Yatharth]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Pakray]]></surname>
<given-names><![CDATA[Partha]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Das]]></surname>
<given-names><![CDATA[Ranjita]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Gelbukh]]></surname>
<given-names><![CDATA[Alexander]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
</contrib-group>
<aff id="Af1">
<institution><![CDATA[,National Insitute of Technology  ]]></institution>
<addr-line><![CDATA[Mizoram ]]></addr-line>
<country>India</country>
</aff>
<aff id="Af2">
<institution><![CDATA[,National Insitute of Technology  ]]></institution>
<addr-line><![CDATA[Silchar ]]></addr-line>
<country>India</country>
</aff>
<aff id="Af3">
<institution><![CDATA[,Instituto Politécnico Nacional  ]]></institution>
<addr-line><![CDATA[ ]]></addr-line>
<country>Mexico</country>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>09</month>
<year>2020</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>09</month>
<year>2020</year>
</pub-date>
<volume>24</volume>
<numero>3</numero>
<fpage>1327</fpage>
<lpage>1335</lpage>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_arttext&amp;pid=S1405-55462020000301327&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_abstract&amp;pid=S1405-55462020000301327&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_pdf&amp;pid=S1405-55462020000301327&amp;lng=en&amp;nrm=iso"></self-uri><abstract abstract-type="short" xml:lang="en"><p><![CDATA[Abstract: Spatial Relation identification is one of the integral parts of Spatial Information Retrieval. It deals with identifying the spatially related objects in view of their physical orientation or placement with respect to each other. The concept is widely used in many fields such as Robotics, Image Caption Generation and many more such areas. In this work the focus is to gather information from multiple modalities such as Image and its corresponding Text so as to strengthen the learning process for the identification of Spatial Relation pairs from a given text. Two different multimodal approaches are proposed in this work. In the first approach, information is explored as a sequential learning process where the individual Spatial Roles are identified as connected entities, which makes the Spatial Relation retrieval easy and efficient enough. To counter the small size of the dataset along with necessity to avoid overfitting, an efficient backward propagation based Neural Network was used to classify the candidate roles and the relations. The feature selection was different for all the classification tasks. Building on the selected feature from the first approach, the second approach uses a transfer learning method that utilizes an existing image caption generation model to retrieve the vital topic based information from image which is then used for the task. Thereby both approaches used information from two modalities which are further used to train the system in the respective approach. The model achieves state-of-the-art performance in terms of Precision for two of the Spatial Roles identification. This validates the advantage of using multimodal learning when compared with other partial-multimodal processes.]]></p></abstract>
<kwd-group>
<kwd lng="en"><![CDATA[Spatial role labeling]]></kwd>
<kwd lng="en"><![CDATA[spatial relation identification]]></kwd>
<kwd lng="en"><![CDATA[multimodal learning]]></kwd>
<kwd lng="en"><![CDATA[multi layer perceptron]]></kwd>
</kwd-group>
</article-meta>
</front><back>
<ref-list>
<ref id="B1">
<label>1</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Charniak]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
<name>
<surname><![CDATA[Johnson]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<source><![CDATA[Coarseto-fine n-best parsing and MaxEnt discriminative reranking]]></source>
<year>2005</year>
<conf-name><![CDATA[ 43rd Annual Meeting of the Association for Computational Linguistics]]></conf-name>
<conf-loc> </conf-loc>
<page-range>173&#8211;180</page-range></nlm-citation>
</ref>
<ref id="B2">
<label>2</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Dash]]></surname>
<given-names><![CDATA[S. K.]]></given-names>
</name>
<name>
<surname><![CDATA[Acharya]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Pakray]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Das]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Gelbukh]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Topic-based image caption generation]]></article-title>
<source><![CDATA[Arabian Journal for Science and Engineering]]></source>
<year>2019</year>
<page-range>1&#8211;10</page-range></nlm-citation>
</ref>
<ref id="B3">
<label>3</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[He]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[Zhang]]></surname>
<given-names><![CDATA[X.]]></given-names>
</name>
<name>
<surname><![CDATA[Ren]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Sun]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<source><![CDATA[Deep residual learning for image recognition]]></source>
<year>2016</year>
<conf-name><![CDATA[ IEEE conference on Computer Vision and Pattern Recognition]]></conf-name>
<conf-loc> </conf-loc>
<page-range>770&#8211;778</page-range></nlm-citation>
</ref>
<ref id="B4">
<label>4</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Jones]]></surname>
<given-names><![CDATA[G. J.]]></given-names>
</name>
<name>
<surname><![CDATA[Lawless]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Gonzalo]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Kelly]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Goeuriot]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Mandl]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Cappellato]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Ferro]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
</person-group>
<source><![CDATA[Experimental IR Meets Multilinguality, Multimodality, and Interaction]]></source>
<year>2017</year>
<volume>10456</volume>
<conf-name><![CDATA[ 8th International Conference of the CLEF Association, CLEF 2017]]></conf-name>
<conf-date>September 11&#8211;14, 2017</conf-date>
<conf-loc>Dublin, Ireland </conf-loc>
<publisher-name><![CDATA[Springer]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B5">
<label>5</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Kordjamshidi]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Bethard]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Moens]]></surname>
<given-names><![CDATA[M. F.]]></given-names>
</name>
</person-group>
<source><![CDATA[SemEval-2012 task 3: Spatial role labeling]]></source>
<year>2012</year>
<conf-name><![CDATA[ First Joint Conference on Lexical and Computational Semantics &#8211; Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation]]></conf-name>
<conf-loc> </conf-loc>
<page-range>365&#8211;373</page-range></nlm-citation>
</ref>
<ref id="B6">
<label>6</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Krishna]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Zhu]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Groth]]></surname>
<given-names><![CDATA[O.]]></given-names>
</name>
<name>
<surname><![CDATA[Johnson]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Hata]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[Kravitz]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Chen]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Kalantidis]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Li]]></surname>
<given-names><![CDATA[L. J.]]></given-names>
</name>
<name>
<surname><![CDATA[Shamma]]></surname>
<given-names><![CDATA[D. A.]]></given-names>
</name>
<name>
<surname><![CDATA[Bernstein]]></surname>
<given-names><![CDATA[M. S.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Visual genome: Connecting language and vision using crowdsourced dense image annotations]]></article-title>
<source><![CDATA[International Journal of Computer Vision]]></source>
<year>2017</year>
<volume>123</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>32&#8211;73</page-range></nlm-citation>
</ref>
<ref id="B7">
<label>7</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Lin]]></surname>
<given-names><![CDATA[T. Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Maire]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Belongie]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Hays]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Perona]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Ramanan]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Dollr]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Zitnick]]></surname>
<given-names><![CDATA[C. L.]]></given-names>
</name>
</person-group>
<source><![CDATA[Microsoft COCO: Common objects in context]]></source>
<year>2014</year>
<conf-name><![CDATA[ European Conference on Computer Vision]]></conf-name>
<conf-loc> </conf-loc>
<page-range>740&#8211;755</page-range></nlm-citation>
</ref>
<ref id="B8">
<label>8</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Ludwig]]></surname>
<given-names><![CDATA[O.]]></given-names>
</name>
<name>
<surname><![CDATA[Liu]]></surname>
<given-names><![CDATA[X.]]></given-names>
</name>
<name>
<surname><![CDATA[Kordjamshidi]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Moens]]></surname>
<given-names><![CDATA[M. F.]]></given-names>
</name>
</person-group>
<source><![CDATA[Deep embedding for spatial role labeling]]></source>
<year>2016</year>
</nlm-citation>
</ref>
<ref id="B9">
<label>9</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Mazalov]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Martins]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
<name>
<surname><![CDATA[Matos]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
</person-group>
<source><![CDATA[Spatial role labeling with convolutional neural networks]]></source>
<year>2015</year>
<conf-name><![CDATA[ 9th Workshop on Geographic Information Retrieval]]></conf-name>
<conf-loc> </conf-loc>
<page-range>1&#8211;7</page-range></nlm-citation>
</ref>
<ref id="B10">
<label>10</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Pustejovsky]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Kordjamshidi]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Moens]]></surname>
<given-names><![CDATA[M. F.]]></given-names>
</name>
<name>
<surname><![CDATA[Levine]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Dworman]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Yocum]]></surname>
<given-names><![CDATA[Z.]]></given-names>
</name>
</person-group>
<source><![CDATA[SemEval-2015 task 8: SpaceEval]]></source>
<year>2015</year>
<conf-name><![CDATA[ 9th International Workshop on Semantic Evaluation (SemEval 2015)]]></conf-name>
<conf-loc> </conf-loc>
<page-range>884&#8211;894</page-range></nlm-citation>
</ref>
<ref id="B11">
<label>11</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Rahgooy]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Manzoor]]></surname>
<given-names><![CDATA[U.]]></given-names>
</name>
<name>
<surname><![CDATA[Kordjamshidi]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
</person-group>
<source><![CDATA[Visually guided spatial relation extraction from text]]></source>
<year>2018</year>
<conf-name><![CDATA[ Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies]]></conf-name>
<conf-date>2018</conf-date>
<conf-loc> </conf-loc>
<page-range>788&#8211;794</page-range></nlm-citation>
</ref>
<ref id="B12">
<label>12</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Roberts]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[Harabagiu]]></surname>
<given-names><![CDATA[S. M.]]></given-names>
</name>
</person-group>
<source><![CDATA[UTD-SpRL: A joint approach to spatial role labeling]]></source>
<year>2012</year>
<conf-name><![CDATA[ First Joint Conference on Lexical and Computational Semantics &#8211; Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation]]></conf-name>
<conf-loc> </conf-loc>
<page-range>419&#8211;424</page-range></nlm-citation>
</ref>
<ref id="B13">
<label>13</label><nlm-citation citation-type="">
<collab>spaCy</collab>
<source><![CDATA[Industrial-strength natural language processing in Python]]></source>
<year></year>
</nlm-citation>
</ref>
<ref id="B14">
<label>14</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Szegedy]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
<name>
<surname><![CDATA[Vanhoucke]]></surname>
<given-names><![CDATA[V.]]></given-names>
</name>
<name>
<surname><![CDATA[Ioffe]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Shlens]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Wojna]]></surname>
<given-names><![CDATA[Z.]]></given-names>
</name>
</person-group>
<source><![CDATA[Rethinking the inception architecture for computer vision]]></source>
<year>2016</year>
<conf-name><![CDATA[ IEEE conference on Computer vision and Pattern Recognition]]></conf-name>
<conf-loc> </conf-loc>
<page-range>2818&#8211;2826</page-range></nlm-citation>
</ref>
<ref id="B15">
<label>15</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Tsikrika]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Popescu]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Kludas]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<source><![CDATA[Overview of the Wikipedia image retrieval task at ImageCLEF 2011]]></source>
<year>2011</year>
<conf-name><![CDATA[ Proceedings of CLEF]]></conf-name>
<conf-loc> </conf-loc>
<page-range>1&#8211;17</page-range></nlm-citation>
</ref>
</ref-list>
</back>
</article>
