<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>1405-5546</journal-id>
<journal-title><![CDATA[Computación y Sistemas]]></journal-title>
<abbrev-journal-title><![CDATA[Comp. y Sist.]]></abbrev-journal-title>
<issn>1405-5546</issn>
<publisher>
<publisher-name><![CDATA[Instituto Politécnico Nacional, Centro de Investigación en Computación]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S1405-55462014000300007</article-id>
<article-id pub-id-type="doi">10.13053/CyS-18-3-2043</article-id>
<title-group>
<article-title xml:lang="en"><![CDATA[Soft Similarity and Soft Cosine Measure: Similarity of Features in Vector Space Model]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Sidorov]]></surname>
<given-names><![CDATA[Grigori]]></given-names>
</name>
<xref ref-type="aff" rid="A01"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Gelbukh]]></surname>
<given-names><![CDATA[Alexander]]></given-names>
</name>
<xref ref-type="aff" rid="A01"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Gómez-Adorno]]></surname>
<given-names><![CDATA[Helena]]></given-names>
</name>
<xref ref-type="aff" rid="A01"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Pinto]]></surname>
<given-names><![CDATA[David]]></given-names>
</name>
<xref ref-type="aff" rid="A02"/>
</contrib>
</contrib-group>
<aff id="A01">
<institution><![CDATA[Instituto Politécnico Nacional, Centro de Investigación en Computación]]></institution>
<addr-line><![CDATA[México Distrito Federal]]></addr-line>
<country>México</country>
</aff>
<aff id="A02">
<institution><![CDATA[Benemérita Universidad Autónoma de Puebla, Facultad de Ciencias de la Computación]]></institution>
<addr-line><![CDATA[Puebla]]></addr-line>
<country>México</country>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>09</month>
<year>2014</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>09</month>
<year>2014</year>
</pub-date>
<volume>18</volume>
<numero>3</numero>
<fpage>491</fpage>
<lpage>504</lpage>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_arttext&amp;pid=S1405-55462014000300007&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_abstract&amp;pid=S1405-55462014000300007&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_pdf&amp;pid=S1405-55462014000300007&amp;lng=en&amp;nrm=iso"></self-uri><abstract abstract-type="short" xml:lang="en"><p><![CDATA[We show how to take similarity between features into account when calculating the similarity of objects in the Vector Space Model (VSM) for machine learning algorithms and other classes of methods that involve similarity between objects. Unlike LSA, we assume that similarity between features is known (say, from a synonym dictionary) and does not need to be learned from the data. We call the proposed similarity measure soft similarity. Similarity between features is common, for example, in natural language processing: words, n-grams, or syntactic n-grams can be somewhat different (which makes them different features) but still have much in common: for example, the words "play" and "game" are different but related. When there is no similarity between features, our soft similarity measure is equal to the standard similarity. To this end, we generalize the well-known cosine similarity measure in VSM by introducing what we call "soft cosine measure". We propose various formulas for exact or approximate calculation of the soft cosine measure. For example, in one of them we consider for VSM a new feature space consisting of pairs of the original features weighted by their similarity. Again, for features that bear no similarity to each other, our formulas reduce to the standard cosine measure. Our experiments show that our soft cosine measure provides better performance in our case study: the entrance exams question answering task at CLEF. In these experiments, we use syntactic n-grams as features and Levenshtein distance as the similarity between n-grams, measured either in characters or in elements of n-grams.]]></p></abstract>
<kwd-group>
<kwd lng="en"><![CDATA[Soft similarity]]></kwd>
<kwd lng="en"><![CDATA[soft cosine measure]]></kwd>
<kwd lng="en"><![CDATA[vector space model]]></kwd>
<kwd lng="en"><![CDATA[similarity between features]]></kwd>
<kwd lng="en"><![CDATA[Levenshtein distance]]></kwd>
<kwd lng="en"><![CDATA[n-grams]]></kwd>
<kwd lng="en"><![CDATA[syntactic n-grams]]></kwd>
</kwd-group>
</article-meta>
</front><body><![CDATA[  	    <p align="justify"><font face="verdana" size="4">Regular Articles</font></p>  	    <p align="justify"><font face="verdana" size="2">&nbsp;</font></p>  	    <p align="center"><font face="verdana" size="4"><b>Soft Similarity and Soft Cosine Measure: Similarity of Features in Vector Space Model</b></font></p>  	    <p align="justify"><font face="verdana" size="2">&nbsp;</font></p>  	    <p align="center"><font face="verdana" size="2"><b>Grigori Sidorov<sup>1</sup>, Alexander Gelbukh<sup>1</sup>, Helena G&oacute;mez&#45;Adorno<sup>1</sup>, and David Pinto<sup>2</sup></b></font></p>  	    <p align="justify"><font face="verdana" size="2">&nbsp;</font></p>  	    <p align="justify"><font face="verdana" size="2"><sup><i>1</i></sup> <i>Centro de Investigaci&oacute;n en Computaci&oacute;n, Instituto Polit&eacute;cnico Nacional, M&eacute;xico D.F., M&eacute;xico</i>. <a href="mailto:sidorov@cic.ipn.mx">sidorov@cic.ipn.mx</a>, <a href="mailto:gelbukh@cic.ipn.mx">gelbukh@cic.ipn.mx</a>, <a href="mailto:helena.adorno@gmail.com">helena.adorno@gmail.com</a></font></p>  	    <p align="justify"><font face="verdana" size="2"><i><sup>2</sup> Facultad de Ciencias de la Computaci&oacute;n, Benem&eacute;rita Universidad Aut&oacute;noma de Puebla, Puebla, M&eacute;xico.</i> <a href="mailto:dpinto@cs.buap.mx">dpinto@cs.buap.mx</a>.</font></p>  	    <p align="justify"><font face="verdana" size="2">&nbsp;</font></p>  	    ]]></body>
<body><![CDATA[<p align="justify"><font face="verdana" size="2">Article received on 25/07/2014.    <br> 	Accepted on 12/09/2014.</font></p>  	    <p align="justify"><font face="verdana" size="2">&nbsp;</font></p>  	    <p align="justify"><font face="verdana" size="2"><b>Abstract</b></font></p>  	    <p align="justify"><font face="verdana" size="2">We show how to take similarity between features into account when calculating the similarity of objects in the Vector Space Model (VSM) for machine learning algorithms and other classes of methods that involve similarity between objects. Unlike LSA, we assume that similarity between features is known (say, from a synonym dictionary) and does not need to be learned from the data. We call the proposed similarity measure <b>soft similarity</b>. Similarity between features is common, for example, in natural language processing: words, n&#45;grams, or syntactic n&#45;grams can be somewhat different (which makes them different features) but still have much in common: for example, the words "play" and "game" are different but related. When there is no similarity between features, our soft similarity measure is equal to the standard similarity. To this end, we generalize the well&#45;known cosine similarity measure in VSM by introducing what we call <b>"soft cosine measure"</b>. We propose various formulas for exact or approximate calculation of the soft cosine measure. For example, in one of them we consider for VSM a new feature space consisting of pairs of the original features weighted by their similarity. Again, for features that bear no similarity to each other, our formulas reduce to the standard cosine measure. Our experiments show that our soft cosine measure provides better performance in our case study: the entrance exams question answering task at CLEF. In these experiments, we use syntactic n&#45;grams as features and Levenshtein distance as the similarity between n&#45;grams, measured either in characters or in elements of n&#45;grams.</font></p>  	    <p align="justify"><font face="verdana" size="2"><b>Keywords:</b> Soft similarity, soft cosine measure, vector space model, similarity between features, Levenshtein distance, n&#45;grams, syntactic n&#45;grams.</font></p>  	    <p align="justify"><font face="verdana" size="2">&nbsp;</font></p>  	    <p align="justify"><font face="verdana" size="2"><a href="/pdf/cys/v18n3/v18n3a7.pdf" target="_blank">Download article in PDF format</a></font></p>  	    <p align="justify"><font face="verdana" size="2">&nbsp;</font></p>  	    <p align="justify"><font face="verdana" size="2"><b>References</b></font></p>  	    ]]></body>
<body><![CDATA[<!-- ref --><p align="justify"><font face="verdana" size="2"><b>1. Bejar, I., Chaffin, R., &amp; Embretson, S. (1991).</b> <i>Cognitive and psychometric analysis of analogical problem solving</i>. Recent research in psychology. Springer&#45;Verlag.<!-- end-ref --></font></p>  	    <!-- ref --><p align="justify"><font face="verdana" size="2"><b>2. Dijkstra, E. W. (1959).</b> A note on two problems in connexion with graphs. <i>Numerische Mathematik</i>, 1(1), 269&ndash;271.<!-- end-ref --></font></p>  	    <!-- ref --><p align="justify"><font face="verdana" size="2"><b>3. G&oacute;mez&#45;Adorno, H., Sidorov, G., Pinto, D., &amp; Gelbukh, A. (2014).</b> Graph&#45;based approach to the question answering task based on entrance exams. <b>Cappellato, L., Ferro, N., Halvey, M., &amp; Kraaij, W.</b>, editors, <i>Notebook for PAN at CLEF 2014. CLEF 2014. CLEF2014 Working Notes</i>, volume 1180 of <i>CEUR Workshop Proceedings</i>, CEUR&#45;WS.org, pp. 1395&ndash;1403.<!-- end-ref --></font></p>  	    <!-- ref --><p align="justify"><font face="verdana" size="2"><b>4. Jimenez, S., Gonzalez, F., &amp; Gelbukh, A. (2010).</b> Text comparison using soft cardinality. <b>Chavez, E. &amp; Lonardi, S.</b>, editors, <i>String Processing and Information Retrieval</i>, volume 6393 of <i>Lecture Notes in Computer Science</i>, Springer, pp. 297&ndash;302.<!-- end-ref --></font></p>  	    <!-- ref --><p align="justify"><font face="verdana" size="2"><b>5. Jimenez Vargas, S. &amp; Gelbukh, A. (2012).</b> Baselines for natural language processing tasks based on soft cardinality spectra. <i>International Journal of Applied and Computational Mathematics</i>, 11(2), 180&ndash;199.<!-- end-ref --></font></p>  	    ]]></body>
<body><![CDATA[<!-- ref --><p align="justify"><font face="verdana" size="2"><b>6. Levenshtein, V. I. (1966).</b> Binary codes capable of correcting deletions, insertions, and reversals. <i>Soviet Physics Doklady</i>, 10(8), 707&ndash;710.<!-- end-ref --></font></p>  	    <!-- ref --><p align="justify"><font face="verdana" size="2"><b>7. Li, B. &amp; Han, L. (2013).</b> Distance weighted cosine similarity measure for text classification. <b>Yin, H., Tang, K., Gao, Y., Klawonn, F., Lee, M., Weise, T., Li, B., &amp; Yao, X.</b>, editors, <i>IDEAL</i>, volume 8206 of <i>Lecture Notes in Computer Science</i>, Springer, pp. 611&ndash;618.<!-- end-ref --></font></p>  	    <!-- ref --><p align="justify"><font face="verdana" size="2"><b>8. Mikawa, K., Ishida, T., &amp; Goto, M. (2011).</b> A proposal of extended cosine measure for distance metric learning in text classification. <i>Systems, Man, and Cybernetics (SMC)</i>, IEEE, pp. 1741&ndash;1746.<!-- end-ref --></font></p>  	    <!-- ref --><p align="justify"><font face="verdana" size="2"><b>9. Miller, G. A. (1995).</b> WordNet: A lexical database for English. <i>Communications of the ACM</i>, 38, 39&ndash;41.<!-- end-ref --></font></p>  	    <!-- ref --><p align="justify"><font face="verdana" size="2"><b>10. Pe&ntilde;as, A., Hovy, E. H., Forner, P., Rodrigo, &Aacute;., Sutcliffe, R. F. E., Forascu, C., &amp; Sporleder, C. (2011).</b> Overview of QA4MRE at CLEF 2011: Question answering for machine reading evaluation. <i>CLEF (Notebook Papers/Labs/Workshop)</i>, pp. 1&ndash;20.<!-- end-ref --></font></p>  	    ]]></body>
<body><![CDATA[<!-- ref --><p align="justify"><font face="verdana" size="2"><b>11. Pe&ntilde;as, A., Hovy, E. H., Forner, P., Rodrigo, &Aacute;., Sutcliffe, R. F. E., Sporleder, C., Forascu, C., Benajiba, Y., &amp; Osenova, P. (2012).</b> Overview of QA4MRE at CLEF 2012: Question answering for machine reading evaluation. <i>CLEF (Online Working Notes/Labs/Workshop)</i>, pp. 1&ndash;24.<!-- end-ref --></font></p>  	    <!-- ref --><p align="justify"><font face="verdana" size="2"><b>12. Pe&ntilde;as, A., Miyao, Y., Forner, P., &amp; Kando, N. (2013).</b> Overview of QA4MRE 2013 entrance exams task. <i>CLEF (Online Working Notes/Labs/Workshop)</i>, pp. 1&ndash;6.<!-- end-ref --></font></p>  	    <!-- ref --><p align="justify"><font face="verdana" size="2"><b>13. Pinto, D., G&oacute;mez&#45;Adorno, H., Ayala, D. V., &amp; Singh, V. K. (2014).</b> A graph&#45;based multi&#45;level linguistic representation for document understanding. <i>Pattern Recognition Letters</i>, 41, 93&ndash;102.<!-- end-ref --></font></p>  	    <!-- ref --><p align="justify"><font face="verdana" size="2"><b>14. Poria, S., Agarwal, B., Gelbukh, A., Hussain, A., &amp; Howard, N. (2014).</b> Dependency&#45;based semantic parsing for concept&#45;level text analysis. <i>15th International Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2014, Part I</i>, number 8403 in Lecture Notes in Computer Science, Springer, pp. 113&ndash;127.<!-- end-ref --></font></p>  	    <!-- ref --><p align="justify"><font face="verdana" size="2"><b>15. Poria, S., Gelbukh, A., Cambria, E., Hussain, A., &amp; Huang, G.&#45;B. (2015).</b> EmoSenticSpace: A novel framework for affective common&#45;sense reasoning. <i>Knowledge&#45;Based Systems</i>.<!-- end-ref --></font></p>  	    ]]></body>
<body><![CDATA[<!-- ref --><p align="justify"><font face="verdana" size="2"><b>16. Poria, S., Gelbukh, A., Hussain, A., Howard, N., Das, D., &amp; Bandyopadhyay, S. (2013).</b> Enhanced SenticNet with affective labels for concept&#45;based opinion mining. <i>IEEE Intelligent Systems</i>, 28, 31&ndash;38.<!-- end-ref --></font></p>  	    <!-- ref --><p align="justify"><font face="verdana" size="2"><b>17. Salton, G., editor (1988).</b> <i>Automatic text processing</i>. Addison&#45;Wesley Longman Publishing Co., Inc., Boston, MA, USA.<!-- end-ref --></font></p>  	    <!-- ref --><p align="justify"><font face="verdana" size="2"><b>18. Sanchez&#45;Perez, M., Sidorov, G., &amp; Gelbukh, A. (2014).</b> The winning approach to text alignment for text reuse detection at PAN 2014. <b>Cappellato, L., Ferro, N., Halvey, M., &amp; Kraaij, W.</b>, editors, <i>Notebook for PAN at CLEF 2014. CLEF 2014. CLEF2014 Working Notes</i>, volume 1180 of <i>CEUR Workshop Proceedings</i>, CEUR&#45;WS.org, pp. 1004&ndash;1011.<!-- end-ref --></font></p>  	    <!-- ref --><p align="justify"><font face="verdana" size="2"><b>19. Sidorov, G. (2013).</b> Syntactic dependency based n&#45;grams in rule based automatic English as second language grammar correction. <i>International Journal of Computational Linguistics and Applications</i>, 4(2), 169&ndash;188. Methods and Applications of Artificial and Computational Intelligence.<!-- end-ref --></font></p>  	    <!-- ref --><p align="justify"><font face="verdana" size="2"><b>20. Sidorov, G. (2014).</b> Should syntactic n&#45;grams contain names of syntactic relations? <i>International Journal of Computational Linguistics and Applications</i>, 5(1), 139&ndash;158.<!-- end-ref --></font></p>  	    ]]></body>
<body><![CDATA[<!-- ref --><p align="justify"><font face="verdana" size="2"><b>21. Sidorov, G., Velasquez, F., Stamatatos, E., Gelbukh, A., &amp; Chanona&#45;Hern&aacute;ndez, L. (2014).</b> Syntactic n&#45;grams as machine learning features for natural language processing. <i>Expert Systems with Applications</i>, 41(3), 853&ndash;860. Methods and Applications of Artificial and Computational Intelligence.<!-- end-ref --></font></p>      ]]></body><back>
<ref-list>
<ref id="B1">
<label>1</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Bejar]]></surname>
<given-names><![CDATA[I.]]></given-names>
</name>
<name>
<surname><![CDATA[Chaffin]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Embretson]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
</person-group>
<source><![CDATA[Cognitive and psychometric analysis of analogical problem solving]]></source>
<year>1991</year>
<publisher-name><![CDATA[Springer-Verlag]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B2">
<label>2</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Dijkstra]]></surname>
<given-names><![CDATA[E. W.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[A note on two problems in connexion with graphs]]></article-title>
<source><![CDATA[Numerische Mathematik]]></source>
<year>1959</year>
<volume>1</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>269-271</page-range></nlm-citation>
</ref>
<ref id="B3">
<label>3</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Gómez-Adorno]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Sidorov]]></surname>
<given-names><![CDATA[G.]]></given-names>
</name>
<name>
<surname><![CDATA[Pinto]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Gelbukh]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Graph-based approach to the question answering task based on entrance exams]]></article-title>
<person-group person-group-type="editor">
<name>
<surname><![CDATA[Cappellato]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Ferro]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
<name>
<surname><![CDATA[Halvey]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Kraaij]]></surname>
<given-names><![CDATA[W.]]></given-names>
</name>
</person-group>
<source><![CDATA[Notebook for PAN at CLEF 2014. CLEF 2014. CLEF2014 Working Notes]]></source>
<year>2014</year>
<volume>1180</volume>
<page-range>1395-1403</page-range><publisher-name><![CDATA[CEUR-WS.org]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B4">
<label>4</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Jimenez]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Gonzalez]]></surname>
<given-names><![CDATA[F.]]></given-names>
</name>
<name>
<surname><![CDATA[Gelbukh]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Text comparison using soft cardinality]]></article-title>
<person-group person-group-type="editor">
<name>
<surname><![CDATA[Chavez]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
<name>
<surname><![CDATA[Lonardi]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
</person-group>
<source><![CDATA[String Processing and Information Retrieval]]></source>
<year>2010</year>
<volume>6393</volume>
<page-range>297-302</page-range><publisher-name><![CDATA[Lecture Notes in Computer Science, Springer]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B5">
<label>5</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Jimenez Vargas]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Gelbukh]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Baselines for natural language processing tasks based on soft cardinality spectra]]></article-title>
<source><![CDATA[International Journal of Applied and Computational Mathematics]]></source>
<year>2012</year>
<volume>11</volume>
<numero>2</numero>
<issue>2</issue>
<page-range>180-199</page-range></nlm-citation>
</ref>
<ref id="B6">
<label>6</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Levenshtein]]></surname>
<given-names><![CDATA[V. I.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Binary codes capable of correcting deletions, insertions, and reversals]]></article-title>
<source><![CDATA[Soviet Physics Doklady]]></source>
<year>1966</year>
<volume>10</volume>
<numero>8</numero>
<issue>8</issue>
<page-range>707-710</page-range></nlm-citation>
</ref>
<ref id="B7">
<label>7</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Li]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
<name>
<surname><![CDATA[Han]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Distance weighted cosine similarity measure for text classification]]></article-title>
<person-group person-group-type="editor">
<name>
<surname><![CDATA[Yin]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Tang]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[Gao]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Klawonn]]></surname>
<given-names><![CDATA[F.]]></given-names>
</name>
<name>
<surname><![CDATA[Lee]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Weise]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Li]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
<name>
<surname><![CDATA[Yao]]></surname>
<given-names><![CDATA[X.]]></given-names>
</name>
</person-group>
<source><![CDATA[IDEAL]]></source>
<year>2013</year>
<volume>8206</volume>
<page-range>611-618</page-range><publisher-name><![CDATA[Springer]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B8">
<label>8</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Mikawa]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[Ishida]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Goto]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<source><![CDATA[A proposal of extended cosine measure for distance metric learning in text classification]]></source>
<year>2011</year>
<page-range>1741-1746</page-range><publisher-name><![CDATA[Systems, Man, and Cybernetics (SMC), IEEE]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B9">
<label>9</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Miller]]></surname>
<given-names><![CDATA[G. A.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[WordNet: A lexical database for English]]></article-title>
<source><![CDATA[Communications of the ACM]]></source>
<year>1995</year>
<volume>38</volume>
<page-range>39-41</page-range></nlm-citation>
</ref>
<ref id="B10">
<label>10</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Peñas]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Hovy]]></surname>
<given-names><![CDATA[E. H.]]></given-names>
</name>
<name>
<surname><![CDATA[Forner]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Rodrigo]]></surname>
<given-names><![CDATA[Á.]]></given-names>
</name>
<name>
<surname><![CDATA[Sutcliffe]]></surname>
<given-names><![CDATA[R. F. E.]]></given-names>
</name>
<name>
<surname><![CDATA[Forascu]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
<name>
<surname><![CDATA[Sporleder]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
</person-group>
<source><![CDATA[Overview of QA4MRE at CLEF 2011: Question answering for machine reading evaluation]]></source>
<year>2011</year>
<page-range>1-20</page-range></nlm-citation>
</ref>
<ref id="B11">
<label>11</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Peñas]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Hovy]]></surname>
<given-names><![CDATA[E. H.]]></given-names>
</name>
<name>
<surname><![CDATA[Forner]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Rodrigo]]></surname>
<given-names><![CDATA[Á.]]></given-names>
</name>
<name>
<surname><![CDATA[Sutcliffe]]></surname>
<given-names><![CDATA[R. F. E.]]></given-names>
</name>
<name>
<surname><![CDATA[Sporleder]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
<name>
<surname><![CDATA[Forascu]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
<name>
<surname><![CDATA[Benajiba]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Osenova]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
</person-group>
<source><![CDATA[Overview of QA4MRE at CLEF 2012: Question answering for machine reading evaluation]]></source>
<year>2012</year>
<page-range>1-24</page-range></nlm-citation>
</ref>
<ref id="B12">
<label>12</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Peñas]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Miyao]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Forner]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Kando]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
</person-group>
<source><![CDATA[Overview of QA4MRE 2013 entrance exams task]]></source>
<year>2013</year>
<page-range>1-6</page-range></nlm-citation>
</ref>
<ref id="B13">
<label>13</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Pinto]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Gómez-Adorno]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Ayala]]></surname>
<given-names><![CDATA[D. V.]]></given-names>
</name>
<name>
<surname><![CDATA[Singh]]></surname>
<given-names><![CDATA[V. K.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[A graph-based multi-level linguistic representation for document understanding]]></article-title>
<source><![CDATA[Pattern Recognition Letters]]></source>
<year>2014</year>
<volume>41</volume>
<page-range>93-102</page-range></nlm-citation>
</ref>
<ref id="B14">
<label>14</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Poria]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Agarwal]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
<name>
<surname><![CDATA[Gelbukh]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Hussain]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Howard]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Dependency-based semantic parsing for concept-level text analysis]]></article-title>
<source><![CDATA[15th International Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2014, Part I]]></source>
<year>2014</year>
<volume>8403</volume>
<page-range>113-127</page-range><publisher-name><![CDATA[Springer]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B15">
<label>15</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Poria]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Gelbukh]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Cambria]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
<name>
<surname><![CDATA[Hussain]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Huang]]></surname>
<given-names><![CDATA[G.-B.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[EmoSenticSpace: A novel framework for affective common-sense reasoning]]></article-title>
<source><![CDATA[Knowledge-Based Systems]]></source>
<year>2015</year>
</nlm-citation>
</ref>
<ref id="B16">
<label>16</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Poria]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Gelbukh]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Hussain]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Howard]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
<name>
<surname><![CDATA[Das]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Bandyopadhyay]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Enhanced SenticNet with affective labels for concept-based opinion mining]]></article-title>
<source><![CDATA[IEEE Intelligent Systems]]></source>
<year>2013</year>
<volume>28</volume>
<page-range>31-38</page-range></nlm-citation>
</ref>
<ref id="B17">
<label>17</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Salton]]></surname>
<given-names><![CDATA[G.]]></given-names>
</name>
</person-group>
<source><![CDATA[Automatic text processing]]></source>
<year>1988</year>
<publisher-loc><![CDATA[Boston, MA]]></publisher-loc>
<publisher-name><![CDATA[Addison-Wesley Longman Publishing Co., Inc.]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B18">
<label>18</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sanchez-Perez]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Sidorov]]></surname>
<given-names><![CDATA[G.]]></given-names>
</name>
<name>
<surname><![CDATA[Gelbukh]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[The winning approach to text alignment for text reuse detection at PAN 2014]]></article-title>
<person-group person-group-type="editor">
<name>
<surname><![CDATA[Cappellato]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Ferro]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
<name>
<surname><![CDATA[Halvey]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Kraaij]]></surname>
<given-names><![CDATA[W.]]></given-names>
</name>
</person-group>
<source><![CDATA[Notebook for PAN at CLEF 2014. CLEF 2014 Working Notes]]></source>
<year>2014</year>
<volume>1180</volume>
<page-range>1004-1011</page-range><publisher-name><![CDATA[CEUR Workshop Proceedings, CEUR-WS.org]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B19">
<label>19</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sidorov]]></surname>
<given-names><![CDATA[G.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Syntactic dependency based n-grams in rule based automatic English as second language grammar correction]]></article-title>
<source><![CDATA[International Journal of Computational Linguistics and Applications]]></source>
<year>2013</year>
<volume>4</volume>
<numero>2</numero>
<issue>2</issue>
<page-range>169-188</page-range></nlm-citation>
</ref>
<ref id="B20">
<label>20</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sidorov]]></surname>
<given-names><![CDATA[G.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Should syntactic n-grams contain names of syntactic relations?]]></article-title>
<source><![CDATA[International Journal of Computational Linguistics and Applications]]></source>
<year>2014</year>
<volume>5</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>139-158</page-range></nlm-citation>
</ref>
<ref id="B21">
<label>21</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sidorov]]></surname>
<given-names><![CDATA[G.]]></given-names>
</name>
<name>
<surname><![CDATA[Velasquez]]></surname>
<given-names><![CDATA[F.]]></given-names>
</name>
<name>
<surname><![CDATA[Stamatatos]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
<name>
<surname><![CDATA[Gelbukh]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Chanona-Hernández]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Syntactic n-grams as machine learning features for natural language processing]]></article-title>
<source><![CDATA[Expert Systems with Applications]]></source>
<year>2014</year>
<volume>41</volume>
<numero>3</numero>
<issue>3</issue>
<page-range>853-860</page-range></nlm-citation>
</ref>
</ref-list>
</back>
</article>
