<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>1405-5546</journal-id>
<journal-title><![CDATA[Computación y Sistemas]]></journal-title>
<abbrev-journal-title><![CDATA[Comp. y Sist.]]></abbrev-journal-title>
<issn>1405-5546</issn>
<publisher>
<publisher-name><![CDATA[Instituto Politécnico Nacional, Centro de Investigación en Computación]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S1405-55462024000402089</article-id>
<article-id pub-id-type="doi">10.13053/cys-28-4-5306</article-id>
<title-group>
<article-title xml:lang="en"><![CDATA[Combining Embeddings and Domain Knowledge for Job Posting Duplicate Detection]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Engelbach]]></surname>
<given-names><![CDATA[Matthias]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Klau]]></surname>
<given-names><![CDATA[Dennis]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Kintz]]></surname>
<given-names><![CDATA[Maximilien]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Ulrich]]></surname>
<given-names><![CDATA[Alexander]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
</contrib-group>
<aff id="Af1">
<institution><![CDATA[Institute for Industrial Engineering IAO]]></institution>
<addr-line><![CDATA[Fraunhofer ]]></addr-line>
<country>Germany</country>
</aff>
<aff id="Af2">
<institution><![CDATA[University of Stuttgart, Institute of Human Factors and Technology Management IAT]]></institution>
<addr-line><![CDATA[Stuttgart ]]></addr-line>
<country>Germany</country>
</aff>
<aff id="Af3">
<institution><![CDATA[alexander.ulrich@contractor.de]]></institution>
<addr-line><![CDATA[ ]]></addr-line>
<country>Germany</country>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>12</month>
<year>2024</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>12</month>
<year>2024</year>
</pub-date>
<volume>28</volume>
<numero>4</numero>
<fpage>2089</fpage>
<lpage>2101</lpage>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_arttext&amp;pid=S1405-55462024000402089&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_abstract&amp;pid=S1405-55462024000402089&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_pdf&amp;pid=S1405-55462024000402089&amp;lng=en&amp;nrm=iso"></self-uri><abstract abstract-type="short" xml:lang="en"><p><![CDATA[Job descriptions are posted on many online channels, including company websites, job boards, and social media platforms. The same job is usually published with varying text, either because of the requirements of each platform or to target different audiences. For automated recruitment and for assisting people who work with these texts, however, it is helpful to aggregate job postings across platforms and thus to detect duplicate descriptions that refer to the same job. In this work, we propose an approach for detecting duplicates in job descriptions. We show that combining overlap-based character similarity with text embedding and keyword matching methods leads to convincing results. In particular, we show that although no approach individually achieves satisfactory performance, a combination of string comparison, deep textual embeddings, and curated weighted lookup lists for specific skills leads to a significant boost in overall performance. A tool based on our approach is being used in production, and feedback from real-life use confirms our evaluation.]]></p></abstract>
<kwd-group>
<kwd lng="en"><![CDATA[Job posting analysis]]></kwd>
<kwd lng="en"><![CDATA[similarity embeddings]]></kwd>
<kwd lng="en"><![CDATA[domain knowledge]]></kwd>
<kwd lng="en"><![CDATA[duplicate detection]]></kwd>
<kwd lng="en"><![CDATA[deployed application]]></kwd>
</kwd-group>
</article-meta>
</front><back>
<ref-list>
<ref id="B1">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Bilenko]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Mooney]]></surname>
<given-names><![CDATA[R. J.]]></given-names>
</name>
</person-group>
<source><![CDATA[On evaluation and training-set construction for duplicate detection]]></source>
<year>2003</year>
<conf-name><![CDATA[KDD-2003 Workshop on Data Cleaning, Record Linkage, and Object Consolidation]]></conf-name>
<conf-loc> </conf-loc>
<page-range>7-12</page-range></nlm-citation>
</ref>
<ref id="B2">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Burk]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Javed]]></surname>
<given-names><![CDATA[F.]]></given-names>
</name>
<name>
<surname><![CDATA[Balaji]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<source><![CDATA[Apollo: Near-duplicate detection for job ads in the online recruitment domain]]></source>
<year>2017</year>
<conf-name><![CDATA[International Conference on Data Mining Workshops]]></conf-name>
<conf-date>2017</conf-date>
<conf-loc> </conf-loc>
<page-range>177-82</page-range></nlm-citation>
</ref>
<ref id="B3">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Chan]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Hogaboam]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Cao]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
</person-group>
<source><![CDATA[Artificial intelligence in human resources]]></source>
<year>2022</year>
<page-range>139-55</page-range><publisher-name><![CDATA[Springer International Publishing]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B4">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Donovan]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
</person-group>
<source><![CDATA[Automated recruiting and the human factor]]></source>
<year>2017</year>
</nlm-citation>
</ref>
<ref id="B5">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Draisbach]]></surname>
<given-names><![CDATA[U.]]></given-names>
</name>
</person-group>
<source><![CDATA[Efficient duplicate detection and the impact of transitivity]]></source>
<year>2022</year>
<publisher-name><![CDATA[Universität Potsdam]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B6">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Draisbach]]></surname>
<given-names><![CDATA[U.]]></given-names>
</name>
<name>
<surname><![CDATA[Naumann]]></surname>
<given-names><![CDATA[F.]]></given-names>
</name>
</person-group>
<source><![CDATA[On choosing thresholds for duplicate detection]]></source>
<year>2013</year>
<conf-name><![CDATA[18th International Conference on Information Quality]]></conf-name>
<conf-loc> </conf-loc>
</nlm-citation>
</ref>
<ref id="B7">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Ektefa]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Jabar]]></surname>
<given-names><![CDATA[M. A.]]></given-names>
</name>
<name>
<surname><![CDATA[Sidi]]></surname>
<given-names><![CDATA[F.]]></given-names>
</name>
<name>
<surname><![CDATA[Memar]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Ibrahim]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Ramli]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
</person-group>
<source><![CDATA[A threshold-based similarity measure for duplicate detection]]></source>
<year>2011</year>
<conf-name><![CDATA[Conference on Open Systems]]></conf-name>
<conf-date>2011</conf-date>
<conf-loc> </conf-loc>
<page-range>37-41</page-range></nlm-citation>
</ref>
<ref id="B8">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Engelbach]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Klau]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Scheerer]]></surname>
<given-names><![CDATA[F.]]></given-names>
</name>
<name>
<surname><![CDATA[Drawehn]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Kintz]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<source><![CDATA[Fine-tuning and aligning question answering models for complex information extraction tasks]]></source>
<year>2023</year>
<conf-name><![CDATA[KDIR 2023 - 15th International Conference on Knowledge Discovery and Information Retrieval]]></conf-name>
<conf-loc> </conf-loc>
</nlm-citation>
</ref>
<ref id="B9">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Foltýnek]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Meuschke]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
<name>
<surname><![CDATA[Gipp]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Academic plagiarism detection: A systematic literature review]]></article-title>
<source><![CDATA[ACM Computing Surveys]]></source>
<year>2019</year>
<volume>52</volume>
<numero>6</numero>
<issue>6</issue>
<page-range>1-42</page-range></nlm-citation>
</ref>
<ref id="B10">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Gao]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[He]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Zhang]]></surname>
<given-names><![CDATA[X.]]></given-names>
</name>
<name>
<surname><![CDATA[Xia]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
</person-group>
<source><![CDATA[Duplicate short text detection based on Word2vec]]></source>
<year>2017</year>
<conf-name><![CDATA[8th IEEE International Conference on Software Engineering and Service Science]]></conf-name>
<conf-date>2017</conf-date>
<conf-loc> </conf-loc>
<page-range>33-7</page-range></nlm-citation>
</ref>
<ref id="B11">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Ginart]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Zhang]]></surname>
<given-names><![CDATA[M. J.]]></given-names>
</name>
<name>
<surname><![CDATA[Zou]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<source><![CDATA[MLDemon: Deployment monitoring for machine learning systems]]></source>
<year>2022</year>
<volume>151</volume>
<conf-name><![CDATA[The 25th International Conference on Artificial Intelligence and Statistics]]></conf-name>
<conf-loc> </conf-loc>
<page-range>3962-97</page-range></nlm-citation>
</ref>
<ref id="B12">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Gunawan]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Sembiring]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
<name>
<surname><![CDATA[Budiman]]></surname>
<given-names><![CDATA[M. A.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[The implementation of cosine similarity to calculate text relevance between two documents]]></article-title>
<source><![CDATA[Journal of Physics: Conference Series]]></source>
<year>2018</year>
<volume>978</volume>
<page-range>012120</page-range></nlm-citation>
</ref>
<ref id="B13">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Kong]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Xie]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
<name>
<surname><![CDATA[Wang]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Jones]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Ding]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
</person-group>
<source><![CDATA[AI-assisted recruiting technologies: Tools, challenges, and opportunities]]></source>
<year>2021</year>
<conf-name><![CDATA[39th ACM International Conference on Design of Communication]]></conf-name>
<conf-loc> </conf-loc>
<page-range>359-61</page-range></nlm-citation>
</ref>
<ref id="B14">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Leong]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Technology and recruiting 101: how it works and where it&#8217;s going]]></article-title>
<source><![CDATA[Strategic HR Review]]></source>
<year>2018</year>
<volume>17</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>50-2</page-range></nlm-citation>
</ref>
<ref id="B15">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Levenshtein]]></surname>
<given-names><![CDATA[V. I.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Binary codes capable of correcting deletions, insertions, and reversals]]></article-title>
<source><![CDATA[Soviet Physics Doklady]]></source>
<year>1966</year>
<volume>10</volume>
<numero>8</numero>
<issue>8</issue>
<page-range>707-10</page-range></nlm-citation>
</ref>
<ref id="B16">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Maier]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[The complexity of some problems on subsequences and supersequences]]></article-title>
<source><![CDATA[Journal of the ACM]]></source>
<year>1978</year>
<volume>25</volume>
<numero>2</numero>
<issue>2</issue>
<page-range>322-36</page-range></nlm-citation>
</ref>
<ref id="B17">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Mikolov]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Chen]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[Corrado]]></surname>
<given-names><![CDATA[G.]]></given-names>
</name>
<name>
<surname><![CDATA[Dean]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<source><![CDATA[Efficient estimation of word representations in vector space]]></source>
<year>2013</year>
<conf-name><![CDATA[1st International Conference on Learning Representations]]></conf-name>
<conf-loc> </conf-loc>
</nlm-citation>
</ref>
<ref id="B18">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Park]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Lu]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
<name>
<surname><![CDATA[Marion]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Cataloging professionals in the digital environment: A content analysis of job descriptions]]></article-title>
<source><![CDATA[Journal of the American Society for Information Science and Technology]]></source>
<year>2009</year>
<volume>60</volume>
<numero>4</numero>
<issue>4</issue>
<page-range>844-57</page-range></nlm-citation>
</ref>
<ref id="B19">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Rabanser]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Günnemann]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Lipton]]></surname>
<given-names><![CDATA[Z.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Failing loudly: An empirical study of methods for detecting dataset shift]]></article-title>
<source><![CDATA[Advances in Neural Information Processing Systems]]></source>
<year>2019</year>
<volume>32</volume>
</nlm-citation>
</ref>
<ref id="B20">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Ramya]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Venugopal]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[Iyengar]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Patnaik]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Feature extraction and duplicate detection for text mining: A survey]]></article-title>
<source><![CDATA[Global Journal of Computer Science and Technology: C Software and Data Engineering]]></source>
<year>2016</year>
<volume>16</volume>
<numero>05</numero>
<issue>05</issue>
</nlm-citation>
</ref>
<ref id="B21">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Reimers]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
<name>
<surname><![CDATA[Gurevych]]></surname>
<given-names><![CDATA[I.]]></given-names>
</name>
</person-group>
<source><![CDATA[Making monolingual sentence embeddings multilingual using knowledge distillation]]></source>
<year>2020</year>
</nlm-citation>
</ref>
<ref id="B22">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Samsi]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Zhao]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[McDonald]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Li]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
<name>
<surname><![CDATA[Michaleas]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Jones]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Bergeron]]></surname>
<given-names><![CDATA[W.]]></given-names>
</name>
<name>
<surname><![CDATA[Kepner]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Tiwari]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Gadepally]]></surname>
<given-names><![CDATA[V.]]></given-names>
</name>
</person-group>
<source><![CDATA[From words to watts: Benchmarking the energy costs of large language model inference]]></source>
<year>2023</year>
<conf-name><![CDATA[High Performance Extreme Computing Conference]]></conf-name>
<conf-date>2023</conf-date>
<conf-loc> </conf-loc>
<page-range>1-9</page-range></nlm-citation>
</ref>
<ref id="B23">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Singhal]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
</person-group>
<source><![CDATA[Modern information retrieval: A brief overview]]></source>
<year>2001</year>
<volume>24</volume>
<numero>4</numero>
<conf-name><![CDATA[Data Engineering Bulletin]]></conf-name>
<conf-loc> </conf-loc>
<issue>4</issue>
<page-range>35-43</page-range></nlm-citation>
</ref>
<ref id="B24">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sparck-Jones]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[A statistical interpretation of term specificity and its application in retrieval]]></article-title>
<source><![CDATA[Journal of Documentation]]></source>
<year>1972</year>
<volume>28</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>11-21</page-range></nlm-citation>
</ref>
<ref id="B25">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sureka]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Jalote]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
</person-group>
<source><![CDATA[Detecting duplicate bug report using character N-Gram-Based features]]></source>
<year>2010</year>
<conf-name><![CDATA[Asia Pacific Software Engineering Conference]]></conf-name>
<conf-date>2010</conf-date>
<conf-loc> </conf-loc>
<page-range>366-74</page-range></nlm-citation>
</ref>
<ref id="B26">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Tian]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Sun]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
<name>
<surname><![CDATA[Lo]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
</person-group>
<source><![CDATA[Improved duplicate bug report identification]]></source>
<year>2012</year>
<conf-name><![CDATA[16th European Conference on Software Maintenance and Reengineering]]></conf-name>
<conf-date>2012</conf-date>
<conf-loc> </conf-loc>
<page-range>385-90</page-range></nlm-citation>
</ref>
<ref id="B27">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Vo]]></surname>
<given-names><![CDATA[M. T.]]></given-names>
</name>
<name>
<surname><![CDATA[Vo]]></surname>
<given-names><![CDATA[A. H.]]></given-names>
</name>
<name>
<surname><![CDATA[Nguyen]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Sharma]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Le]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Dealing with the class imbalance problem in the detection of fake job descriptions]]></article-title>
<source><![CDATA[Computers, Materials and Continua]]></source>
<year>2021</year>
<volume>68</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>521-35</page-range></nlm-citation>
</ref>
<ref id="B28">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Wang]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Dong]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Measurement of text similarity: A survey]]></article-title>
<source><![CDATA[Information]]></source>
<year>2020</year>
<volume>11</volume>
<numero>9</numero>
<issue>9</issue>
</nlm-citation>
</ref>
<ref id="B29">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Yang]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Cer]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Ahmad]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Guo]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Law]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Constant]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
<name>
<surname><![CDATA[Hernandez-Abrego]]></surname>
<given-names><![CDATA[G.]]></given-names>
</name>
<name>
<surname><![CDATA[Yuan]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Tar]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
<name>
<surname><![CDATA[Sung]]></surname>
<given-names><![CDATA[Y. H.]]></given-names>
</name>
<name>
<surname><![CDATA[Strope]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
<name>
<surname><![CDATA[Kurzweil]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
</person-group>
<source><![CDATA[Multilingual universal sentence encoder for semantic retrieval]]></source>
<year>2019</year>
</nlm-citation>
</ref>
<ref id="B30">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Zhao]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Chen]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Mason]]></surname>
<given-names><![CDATA[C. M.]]></given-names>
</name>
</person-group>
<source><![CDATA[A framework for duplicate detection from online job postings]]></source>
<year>2022</year>
<conf-name><![CDATA[International Conference on Web Intelligence and Intelligent Agent Technology]]></conf-name>
<conf-loc> </conf-loc>
<page-range>249-56</page-range></nlm-citation>
</ref>
</ref-list>
</back>
</article>
