<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>1405-5546</journal-id>
<journal-title><![CDATA[Computación y Sistemas]]></journal-title>
<abbrev-journal-title><![CDATA[Comp. y Sist.]]></abbrev-journal-title>
<issn>1405-5546</issn>
<publisher>
<publisher-name><![CDATA[Instituto Politécnico Nacional, Centro de Investigación en Computación]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S1405-55462024000402251</article-id>
<article-id pub-id-type="doi">10.13053/cys-28-4-5292</article-id>
<title-group>
<article-title xml:lang="en"><![CDATA[LyricScraper: A Dataset of Spanish Song Lyrics Created via Web Scraping and Dual-labeling for LLM Classification]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Alcántara]]></surname>
<given-names><![CDATA[Tania]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[García-Vázquez]]></surname>
<given-names><![CDATA[Omar]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Hernández]]></surname>
<given-names><![CDATA[Mayté]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Calvo]]></surname>
<given-names><![CDATA[Hiram]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Desiderio]]></surname>
<given-names><![CDATA[Alan]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
</contrib-group>
<aff id="Af1">
<institution><![CDATA[,Instituto Politécnico Nacional Centro de Investigación en Computación ]]></institution>
<addr-line><![CDATA[Mexico City ]]></addr-line>
<country>Mexico</country>
</aff>
<aff id="Af2">
<institution><![CDATA[,Instituto Politécnico Nacional Escuela Superior de Ingeniería Mecánica y Eléctrica ]]></institution>
<addr-line><![CDATA[Mexico City ]]></addr-line>
<country>Mexico</country>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>12</month>
<year>2024</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>12</month>
<year>2024</year>
</pub-date>
<volume>28</volume>
<numero>4</numero>
<fpage>2251</fpage>
<lpage>2260</lpage>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_arttext&amp;pid=S1405-55462024000402251&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_abstract&amp;pid=S1405-55462024000402251&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_pdf&amp;pid=S1405-55462024000402251&amp;lng=en&amp;nrm=iso"></self-uri><abstract abstract-type="short" xml:lang="en"><p><![CDATA[Abstract: Songs represent a powerful means of expressing emotions through melody and lyrics. This study focuses on understanding and classifying emotions present in songs, ranging from positive and negative to neutral emotions. This classification and understanding would not be possible without data, which was gathered using a proprietary web scraping algorithm to collect lyrics data online. Subsequently, a pseudo-labeling approach based on BERT was employed to assign sentiment labels to these lyrics, leveraging BERT&#8217;s ability to comprehend context and semantic relationships in language. This process enhanced the dataset&#8217;s quality and contributed to the success of sentiment analysis in songs. The new dataset addressed challenges related to sentence length by providing examples of song lyrics of varying lengths, facilitating more effective model training. Additionally, data imbalance was addressed through careful sample selection, representing a wide range of emotions in songs. This new dataset underwent classification using large-scale language models, achieving promising results. The accuracy metric reached an impressive 97.66% for DistilBERT and 97.83% for the F1 metric, highlighting the effectiveness of this approach in song sentiment analysis. This study underscores the importance of understanding emotions in songs and offers practical solutions to enhance the capabilities of language models in this task.]]></p></abstract>
<kwd-group>
<kwd lng="en"><![CDATA[NLP]]></kwd>
<kwd lng="en"><![CDATA[pseudo labeling]]></kwd>
<kwd lng="en"><![CDATA[web scraping]]></kwd>
<kwd lng="en"><![CDATA[deep learning]]></kwd>
</kwd-group>
</article-meta>
</front><back>
<ref-list>
<ref id="B1">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Alcántara]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Desiderio]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[García-Vázquez]]></surname>
<given-names><![CDATA[O.]]></given-names>
</name>
<name>
<surname><![CDATA[Calvo]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
</person-group>
<source><![CDATA[Corpus &#8221;textos de canciones en español mexicano v1&#8221;]]></source>
<year>2023</year>
<publisher-name><![CDATA[Laboratorio de Ciencias Cognitivas Computacionales, Centro de Investigación en Computación]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B2">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Alcántara]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[García-Vázquez]]></surname>
<given-names><![CDATA[O.]]></given-names>
</name>
<name>
<surname><![CDATA[Calvo]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Sidorov]]></surname>
<given-names><![CDATA[G.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Classification of songs in spanish with LLMs: An analysis of the construction of a dataset, through classification]]></article-title>
<source><![CDATA[Journal of Intelligent and Fuzzy Systems]]></source>
<year>2024</year>
<numero>Special</numero>
<issue>Special</issue>
</nlm-citation>
</ref>
<ref id="B3">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Devlin]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Chang]]></surname>
<given-names><![CDATA[M. W.]]></given-names>
</name>
<name>
<surname><![CDATA[Lee]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[Toutanova]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
</person-group>
<source><![CDATA[BERT: Pre-training of deep bidirectional transformers for language understanding]]></source>
<year>2019</year>
<volume>1</volume>
<page-range>4171-86</page-range></nlm-citation>
</ref>
<ref id="B4">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Edmonds]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Sedoc]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<source><![CDATA[Multi-emotion classification for song lyrics]]></source>
<year>2021</year>
<conf-name><![CDATA[ 11th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis]]></conf-name>
<conf-loc> </conf-loc>
<page-range>221-35</page-range></nlm-citation>
</ref>
<ref id="B5">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Han]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Kong]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Jiayi]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Wang]]></surname>
<given-names><![CDATA[G.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[A survey of music emotion recognition]]></article-title>
<source><![CDATA[Frontiers of Computer Science]]></source>
<year>2022</year>
<volume>16</volume>
</nlm-citation>
</ref>
<ref id="B6">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[He]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Jin]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Xiong]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Chen]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
<name>
<surname><![CDATA[Sun]]></surname>
<given-names><![CDATA[W.]]></given-names>
</name>
<name>
<surname><![CDATA[Zhao]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
</person-group>
<source><![CDATA[Language feature mining for music emotion classification via supervised learning from lyrics]]></source>
<year>2008</year>
<conf-name><![CDATA[ Advances in Computation and Intelligence: Third International Symposium]]></conf-name>
<conf-loc> </conf-loc>
<page-range>426-35</page-range></nlm-citation>
</ref>
<ref id="B7">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Jiménez]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Cardoso-Moreno]]></surname>
<given-names><![CDATA[M. A.]]></given-names>
</name>
<name>
<surname><![CDATA[Macías]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
<name>
<surname><![CDATA[Calvo]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Pseudoetiquetado para el análisis de polaridad en tuits: Un primer acercamiento]]></article-title>
<source><![CDATA[Research in Computing Science]]></source>
<year>2023</year>
<volume>152</volume>
<numero>8</numero>
<issue>8</issue>
<page-range>289-99</page-range></nlm-citation>
</ref>
<ref id="B8">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Lee]]></surname>
<given-names><![CDATA[D. H.]]></given-names>
</name>
</person-group>
<source><![CDATA[Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks]]></source>
<year>2013</year>
<conf-name><![CDATA[ International Conferenceon Machine Learning Workshop: Challenges in Representation Learning]]></conf-name>
<conf-loc> </conf-loc>
</nlm-citation>
</ref>
<ref id="B9">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Liu]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Ott]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Goyal]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
<name>
<surname><![CDATA[Du]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Joshi]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Chen]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Levy]]></surname>
<given-names><![CDATA[O.]]></given-names>
</name>
<name>
<surname><![CDATA[Lewis]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Zettlemoyer]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Stoyanov]]></surname>
<given-names><![CDATA[V.]]></given-names>
</name>
</person-group>
<source><![CDATA[RoBERTa: A robustly optimized BERT pretraining approach]]></source>
<year>2019</year>
<conf-name><![CDATA[ International Conference on Learning Representations]]></conf-name>
<conf-loc> </conf-loc>
<page-range>1-15</page-range></nlm-citation>
</ref>
<ref id="B10">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Maureira-Cid]]></surname>
<given-names><![CDATA[F.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Ser humano: Emociones y lenguaje]]></article-title>
<source><![CDATA[Revista Electrónica de Psicología Iztacala]]></source>
<year>2010</year>
<volume>11</volume>
<numero>2</numero>
<issue>2</issue>
</nlm-citation>
</ref>
<ref id="B11">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Pagallo]]></surname>
<given-names><![CDATA[U.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[The trouble with digital copies: A short km phenomenology]]></article-title>
<source><![CDATA[Ethical Issues and Social Dilemmas in Knowledge Management: Organizational Innovation]]></source>
<year>2011</year>
<page-range>97-112</page-range><publisher-name><![CDATA[IGI Global]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B12">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Pagallo]]></surname>
<given-names><![CDATA[U.]]></given-names>
</name>
<name>
<surname><![CDATA[Ciani]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<source><![CDATA[Anatomy of web data scraping: Ethics, standards, and the troubles of the law]]></source>
<year>2023</year>
</nlm-citation>
</ref>
<ref id="B13">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Pires]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Schlinger]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
<name>
<surname><![CDATA[Garrette]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
</person-group>
<source><![CDATA[How multilingual is multilingual BERT?]]></source>
<year>2019</year>
<conf-name><![CDATA[ 57th Annual Meeting of the Association for Computational Linguistics]]></conf-name>
<conf-loc> </conf-loc>
<page-range>4996-5001</page-range></nlm-citation>
</ref>
<ref id="B14">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Poria]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Cambria]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
<name>
<surname><![CDATA[Hazarika]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Majumder]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
<name>
<surname><![CDATA[Zadeh]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Morency]]></surname>
<given-names><![CDATA[L. P.]]></given-names>
</name>
</person-group>
<source><![CDATA[Context-dependent sentiment analysis in user-generated videos]]></source>
<year>2017</year>
<volume>1</volume>
<conf-name><![CDATA[ 55th Annual Meeting of the Association for Computational Linguistics]]></conf-name>
<conf-loc> </conf-loc>
<page-range>873-83</page-range></nlm-citation>
</ref>
<ref id="B15">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sanh]]></surname>
<given-names><![CDATA[V.]]></given-names>
</name>
<name>
<surname><![CDATA[Debut]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Chaumond]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Wolf]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
</person-group>
<source><![CDATA[DistilBERT, a distilled version of bert: smaller, faster, cheaper and lighter]]></source>
<year>2019</year>
<publisher-name><![CDATA[arXiv]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B16">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sellars]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Twenty years of web scraping and the computer fraud and abuse act]]></article-title>
<source><![CDATA[BU Journal of Science and Technology Law]]></source>
<year>2018</year>
<volume>24</volume>
<page-range>372</page-range></nlm-citation>
</ref>
<ref id="B17">
<nlm-citation citation-type="">
<collab>Sitelabs</collab>
<source><![CDATA[Web scraping: Introduccion y herramientas]]></source>
<year>2017</year>
</nlm-citation>
</ref>
<ref id="B18">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Srinilta]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
<name>
<surname><![CDATA[Sunhem]]></surname>
<given-names><![CDATA[W.]]></given-names>
</name>
<name>
<surname><![CDATA[Tungjitnob]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Thasanthiah]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
</person-group>
<source><![CDATA[Lyric-based sentiment polarity classification of thai songs]]></source>
<year>2017</year>
<volume>1</volume>
<conf-name><![CDATA[ International MultiConference of Engineers and Computer Scientists]]></conf-name>
<conf-loc> </conf-loc>
</nlm-citation>
</ref>
<ref id="B19">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Tula]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Potluri]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Ms]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Doddapaneni]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Sahu]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Sukumaran]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Patwa]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
</person-group>
<source><![CDATA[Bitions @ DravidianLangTech-EACL2021: Ensemble of multilingual language models with pseudo labeling for offence detection in dravidian languages]]></source>
<year>2021</year>
<conf-name><![CDATA[ First Workshop on Speech and Language Technologies for Dravidian Languages]]></conf-name>
<conf-loc> </conf-loc>
<page-range>291-9</page-range></nlm-citation>
</ref>
<ref id="B20">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Yang]]></surname>
<given-names><![CDATA[X.]]></given-names>
</name>
<name>
<surname><![CDATA[Dong]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Li]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Review of data features-based music emotion recognition methods]]></article-title>
<source><![CDATA[Multimedia Systems]]></source>
<year>2017</year>
<volume>24</volume>
<page-range>365-89</page-range></nlm-citation>
</ref>
</ref-list>
</back>
</article>
