<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>1405-5546</journal-id>
<journal-title><![CDATA[Computación y Sistemas]]></journal-title>
<abbrev-journal-title><![CDATA[Comp. y Sist.]]></abbrev-journal-title>
<issn>1405-5546</issn>
<publisher>
<publisher-name><![CDATA[Instituto Politécnico Nacional, Centro de Investigación en Computación]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S1405-55462018000300767</article-id>
<article-id pub-id-type="doi">10.13053/cys-22-3-3019</article-id>
<title-group>
<article-title xml:lang="en"><![CDATA[Artificial Method for Building Monolingual Plagiarized Arabic Corpus]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Mahmoud]]></surname>
<given-names><![CDATA[Adnen]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Zrigui]]></surname>
<given-names><![CDATA[Mounir]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
</contrib-group>
<aff id="Af1">
<institution><![CDATA[University of Tunis, LaTICE Laboratory]]></institution>
<addr-line><![CDATA[ ]]></addr-line>
<country>Tunisia</country>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>09</month>
<year>2018</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>09</month>
<year>2018</year>
</pub-date>
<volume>22</volume>
<numero>3</numero>
<fpage>767</fpage>
<lpage>776</lpage>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_arttext&amp;pid=S1405-55462018000300767&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_abstract&amp;pid=S1405-55462018000300767&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_pdf&amp;pid=S1405-55462018000300767&amp;lng=en&amp;nrm=iso"></self-uri><abstract abstract-type="short" xml:lang="en"><p><![CDATA[Plagiarism in textual documents is a widespread problem, given the large digital repositories available on the web. Moreover, solutions are difficult to evaluate and compare because few plagiarized resources in Arabic are publicly available. In this context, this paper describes the automatic construction of a paraphrased corpus to address these issues and support our experiments, as follows. First, we collected a large corpus containing more than 12 million sentences from different resources. Then, we removed unnecessary data from it by applying a set of preprocessing techniques. After that, we used the word2vec algorithm to create a vocabulary from the collected corpus; it efficiently extracted the semantic relationships between words. Subsequently, we replaced each word of the source corpus with the most similar vocabulary word, based on a randomly chosen index, to obtain a suspect corpus. In the experiments, we varied the vector dimensions and window sizes to predict the correct context of words and identify the words semantically closest to the target.]]></p></abstract>
<kwd-group>
<kwd lng="en"><![CDATA[Arabic language]]></kwd>
<kwd lng="en"><![CDATA[automatic creation]]></kwd>
<kwd lng="en"><![CDATA[data collection]]></kwd>
<kwd lng="en"><![CDATA[word embedding]]></kwd>
<kwd lng="en"><![CDATA[paraphrase]]></kwd>
<kwd lng="en"><![CDATA[plagiarism]]></kwd>
<kwd lng="en"><![CDATA[semantic analysis]]></kwd>
</kwd-group>
</article-meta>
</front><back>
<ref-list>
<ref id="B1">
<label>1</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sharjeel]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Rayson]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Nawab]]></surname>
<given-names><![CDATA[R. M. A.]]></given-names>
</name>
</person-group>
<source><![CDATA[UPPC - Urdu Paraphrase Plagiarism Corpus]]></source>
<year>2016</year>
<conf-name><![CDATA[Tenth International Conference on Language Resources and Evaluation (LREC)]]></conf-name>
<conf-loc> </conf-loc>
<page-range>1832-6</page-range></nlm-citation>
</ref>
<ref id="B2">
<label>2</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Mahmoud]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Zrigui]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<source><![CDATA[Semantic similarity analysis for paraphrase identification in Arabic texts]]></source>
<year>2017</year>
<volume>31</volume>
<conf-name><![CDATA[31st Pacific Asia Conference on Language, Information and Computation (PACLIC)]]></conf-name>
<conf-loc>Philippines</conf-loc>
<page-range>274-81</page-range></nlm-citation>
</ref>
<ref id="B3">
<label>3</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Zrigui]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Zouaghi]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Ayadi]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Zrigui]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Zrigui]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[ISAO: An intelligent system of opinion analysis]]></article-title>
<source><![CDATA[Research in Computing Science]]></source>
<year>2016</year>
<volume>110</volume>
<page-range>21-30</page-range></nlm-citation>
</ref>
<ref id="B4">
<label>4</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Saad]]></surname>
<given-names><![CDATA[M. K.]]></given-names>
</name>
<name>
<surname><![CDATA[Ashour]]></surname>
<given-names><![CDATA[W.]]></given-names>
</name>
</person-group>
<source><![CDATA[OSAC: Open Source Arabic Corpora]]></source>
<year>2010</year>
<conf-name><![CDATA[6th International Conference on Electrical and Computer Systems (EECS&#8217;10)]]></conf-name>
<conf-loc>Lefke, North Cyprus</conf-loc>
<page-range>1-6</page-range></nlm-citation>
</ref>
<ref id="B5">
<label>5</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Batita]]></surname>
<given-names><![CDATA[M. A.]]></given-names>
</name>
<name>
<surname><![CDATA[Zrigui]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<source><![CDATA[Derivational relations in Arabic Wordnet]]></source>
<year>2018</year>
<conf-name><![CDATA[9th Global WordNet Conference (GWC)]]></conf-name>
<conf-loc> </conf-loc>
<page-range>137-44</page-range></nlm-citation>
</ref>
<ref id="B6">
<label>6</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Mansouri]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Charhad]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Zrigui]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[A heuristic approach to detect and localize text in Arabic news video]]></article-title>
<source><![CDATA[Computación y Sistemas]]></source>
<year>2018</year>
<volume>23</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>75-82</page-range></nlm-citation>
</ref>
<ref id="B7">
<label>7</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Hkiri]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
<name>
<surname><![CDATA[Mallat]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Zrigui]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<source><![CDATA[Arabic-English text translation leveraging hybrid NER]]></source>
<year>2017</year>
<volume>31</volume>
<conf-name><![CDATA[31st Pacific Asia Conference on Language, Information and Computation (PACLIC)]]></conf-name>
<conf-loc> </conf-loc>
<page-range>124-31</page-range></nlm-citation>
</ref>
<ref id="B8">
<label>8</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Boudhief]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Maraoui]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Zrigui]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<source><![CDATA[Elaboration of a model for an indexed base for teaching Arabic language to disabled people]]></source>
<year>2014</year>
<conf-name><![CDATA[6th International Conference on (CIST)]]></conf-name>
<conf-loc> </conf-loc>
<page-range>110-6</page-range></nlm-citation>
</ref>
<ref id="B9">
<label>9</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Meddeb]]></surname>
<given-names><![CDATA[O.]]></given-names>
</name>
<name>
<surname><![CDATA[Maraoui]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Aljawarneh]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Hybrid modelling of an off line Arabic handwriting recognition system: results and evaluation]]></article-title>
<source><![CDATA[International Journal of Intelligent Enterprise (IJIE)]]></source>
<year>2017</year>
<volume>4</volume>
<numero>1/2</numero>
<issue>1/2</issue>
<page-range>168-89</page-range></nlm-citation>
</ref>
<ref id="B10">
<label>10</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Mahmoud]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Zrigui]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Zrigui]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<source><![CDATA[A text semantic similarity approach for Arabic paraphrase detection]]></source>
<year>2017</year>
<conf-name><![CDATA[International Conference on Computational Linguistics and Intelligent Text Processing (CICLing)]]></conf-name>
<conf-loc> </conf-loc>
</nlm-citation>
</ref>
<ref id="B11">
<label>11</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Ben-Mohamed]]></surname>
<given-names><![CDATA[M. A.]]></given-names>
</name>
<name>
<surname><![CDATA[Mallat]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Nahdi]]></surname>
<given-names><![CDATA[M. A.]]></given-names>
</name>
<name>
<surname><![CDATA[Zrigui]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Exploring the potential of schemes in building NLP tools for Arabic language]]></article-title>
<source><![CDATA[International Arab Journal of Information Technology (IAJIT)]]></source>
<year>2015</year>
<volume>12</volume>
<numero>6</numero>
<issue>6</issue>
<page-range>13-9</page-range></nlm-citation>
</ref>
<ref id="B12">
<label>12</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Siddiqui]]></surname>
<given-names><![CDATA[M. A.]]></given-names>
</name>
<name>
<surname><![CDATA[Khan]]></surname>
<given-names><![CDATA[I. H.]]></given-names>
</name>
<name>
<surname><![CDATA[Jambi]]></surname>
<given-names><![CDATA[K. M.]]></given-names>
</name>
<name>
<surname><![CDATA[Elhaj]]></surname>
<given-names><![CDATA[S. O.]]></given-names>
</name>
<name>
<surname><![CDATA[Bagais]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Developing an Arabic plagiarism detection corpus]]></article-title>
<source><![CDATA[Computer Science &amp; Information Technology (CS &amp; IT)]]></source>
<year>2014</year>
<page-range>261-9</page-range></nlm-citation>
</ref>
<ref id="B13">
<label>13</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sameen]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Sharjeel]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Nawab]]></surname>
<given-names><![CDATA[R. M. A.]]></given-names>
</name>
<name>
<surname><![CDATA[Rayson]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Muneer]]></surname>
<given-names><![CDATA[I.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Measuring short text reuse for the Urdu language]]></article-title>
<source><![CDATA[Language Resources &amp; Evaluation]]></source>
<year>2017</year>
<volume>51</volume>
<numero>3</numero>
<issue>3</issue>
<page-range>777-803</page-range></nlm-citation>
</ref>
<ref id="B14">
<label>14</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Osman]]></surname>
<given-names><![CDATA[A. H.]]></given-names>
</name>
<name>
<surname><![CDATA[Barukab]]></surname>
<given-names><![CDATA[O. M.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[SVM significant role selection method for improving semantic text plagiarism detection]]></article-title>
<source><![CDATA[International Journal of Advanced and Applied Sciences]]></source>
<year>2017</year>
<volume>4</volume>
<numero>8</numero>
<issue>8</issue>
<page-range>112-22</page-range></nlm-citation>
</ref>
<ref id="B15">
<label>15</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Shinyama]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Sekine]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Paraphrase acquisition for information extraction]]></article-title>
<source><![CDATA[Artificial Intelligence Review]]></source>
<year>2014</year>
<volume>42</volume>
<numero>4</numero>
<issue>4</issue>
<page-range>851-94</page-range></nlm-citation>
</ref>
<ref id="B16">
<label>16</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Shenoy]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
<name>
<surname><![CDATA[Potey]]></surname>
<given-names><![CDATA[M. A.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Semantic similarity search model for obfuscated plagiarism detection in Marathi language using Fuzzy and Naïve Bayes approaches]]></article-title>
<source><![CDATA[IOSR Journal of Computer Engineering]]></source>
<year>2016</year>
<volume>18</volume>
<numero>3</numero>
<issue>3</issue>
<page-range>83-8</page-range></nlm-citation>
</ref>
<ref id="B17">
<label>17</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Mohtaj]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Asghari]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Zarrabi]]></surname>
<given-names><![CDATA[V.]]></given-names>
</name>
</person-group>
<source><![CDATA[Developing monolingual English corpus for plagiarism detection using human annotated paraphrase corpus]]></source>
<year>2015</year>
<publisher-name><![CDATA[Notebook for PAN at CLEF'15]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B18">
<label>18</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Bensalem]]></surname>
<given-names><![CDATA[I.]]></given-names>
</name>
<name>
<surname><![CDATA[Boukhalfa]]></surname>
<given-names><![CDATA[I.]]></given-names>
</name>
<name>
<surname><![CDATA[Rosso]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Abouenour]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Darwish]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[Shikhi]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
</person-group>
<source><![CDATA[Overview of the AraPlagDet PAN@FIRE2015 Shared Task on Arabic plagiarism detection]]></source>
<year>2015</year>
<page-range>111-22</page-range><publisher-name><![CDATA[PAN'15]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B19">
<label>19</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Taamallah]]></surname>
<given-names><![CDATA[S. B.]]></given-names>
</name>
</person-group>
<source><![CDATA[Prétraitement de données et création d&#8217;un segmenteur de l&#8217;arabe pour un système de traduction probabiliste vers le français]]></source>
<year>2012</year>
<publisher-name><![CDATA[Université Stendhal Grenoble]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B20">
<label>20</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Alrabiah]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Al-Salman]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Atwell]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
<name>
<surname><![CDATA[Alhelewh]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[KSUCCA: A key to exploring Arabic historical linguistics]]></article-title>
<source><![CDATA[International Journal of Computational Linguistics (IJCL)]]></source>
<year>2014</year>
<volume>5</volume>
<numero>2</numero>
<issue>2</issue>
<page-range>27-36</page-range></nlm-citation>
</ref>
<ref id="B21">
<label>21</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Mikolov]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Sutskever]]></surname>
<given-names><![CDATA[I.]]></given-names>
</name>
<name>
<surname><![CDATA[Chen]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[Corrado]]></surname>
<given-names><![CDATA[G. S.]]></given-names>
</name>
<name>
<surname><![CDATA[Dean]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Distributed representations of words and phrases and their compositionality]]></article-title>
<source><![CDATA[Advances in Neural Information Processing Systems]]></source>
<year>2013</year>
<volume>26</volume>
<page-range>3111-9</page-range></nlm-citation>
</ref>
</ref-list>
</back>
</article>
