<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>1405-5546</journal-id>
<journal-title><![CDATA[Computación y Sistemas]]></journal-title>
<abbrev-journal-title><![CDATA[Comp. y Sist.]]></abbrev-journal-title>
<issn>1405-5546</issn>
<publisher>
<publisher-name><![CDATA[Instituto Politécnico Nacional, Centro de Investigación en Computación]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S1405-55462024000402103</article-id>
<article-id pub-id-type="doi">10.13053/cys-28-4-5305</article-id>
<title-group>
<article-title xml:lang="en"><![CDATA[Topic Modelling and Sentiment Analysis via News Headlines, NLP Methods on Australian Broadcasting Commission]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Gouliev]]></surname>
<given-names><![CDATA[Zaur]]></given-names>
</name>
<xref ref-type="aff" rid="Af1"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Pérez-Téllez]]></surname>
<given-names><![CDATA[Fernando]]></given-names>
</name>
<xref ref-type="aff" rid="Af1"/>
</contrib>
</contrib-group>
<aff id="Af1">
<institution><![CDATA[Technological University Dublin, School of Enterprise Computing, Digital and Data]]></institution>
<addr-line><![CDATA[Dublin ]]></addr-line>
<country>Ireland</country>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>12</month>
<year>2024</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>12</month>
<year>2024</year>
</pub-date>
<volume>28</volume>
<numero>4</numero>
<fpage>2103</fpage>
<lpage>2115</lpage>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_arttext&amp;pid=S1405-55462024000402103&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_abstract&amp;pid=S1405-55462024000402103&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_pdf&amp;pid=S1405-55462024000402103&amp;lng=en&amp;nrm=iso"></self-uri><abstract abstract-type="short" xml:lang="en"><p><![CDATA[This paper provides an overview, implementation and comparison of the main supervised and unsupervised machine learning methods used in natural language processing to extract topics and sentiment from headlines. The supervised methods are logistic regression and a support vector machine (SVM) classifier; the unsupervised methods are K-means clustering and latent Dirichlet allocation (LDA). To demonstrate these methods, we use a dataset of one million news headlines published online by the Australian Broadcasting Commission, covering 17 years of news. Our results show that logistic regression models that use lexicon-based emotion classifiers reach 93% accuracy for sentiment analysis, and that the clustering-based K-means technique scored 75% for topic modelling. A detailed explanation of these methods is given, along with limitations, assumptions, ethical considerations and suggestions for future work.]]></p></abstract>
<kwd-group>
<kwd lng="en"><![CDATA[News headlines]]></kwd>
<kwd lng="en"><![CDATA[machine learning]]></kwd>
<kwd lng="en"><![CDATA[natural language processing]]></kwd>
<kwd lng="en"><![CDATA[sentiment analysis]]></kwd>
</kwd-group>
</article-meta>
</front><back>
<ref-list>
<ref id="B1">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Alharbi]]></surname>
<given-names><![CDATA[A. R.]]></given-names>
</name>
<name>
<surname><![CDATA[Hijji]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Aljaedi]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Enhancing topic clustering for Arabic security news based on k-means and topic modelling]]></article-title>
<source><![CDATA[IET Networks]]></source>
<year>2021</year>
<volume>10</volume>
<numero>6</numero>
<issue>6</issue>
<page-range>278-94</page-range></nlm-citation>
</ref>
<ref id="B2">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Ali]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Omar]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
<name>
<surname><![CDATA[Soulaimane]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Analyzing tourism reviews using an LDA topic-based sentiment analysis approach]]></article-title>
<source><![CDATA[MethodsX]]></source>
<year>2022</year>
<volume>9</volume>
<page-range>101894</page-range></nlm-citation>
</ref>
<ref id="B3">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Chaganti]]></surname>
<given-names><![CDATA[S. Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Nanda]]></surname>
<given-names><![CDATA[I.]]></given-names>
</name>
<name>
<surname><![CDATA[Pandi]]></surname>
<given-names><![CDATA[K. R.]]></given-names>
</name>
<name>
<surname><![CDATA[Prudhvith]]></surname>
<given-names><![CDATA[T. G.]]></given-names>
</name>
<name>
<surname><![CDATA[Kumar]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
</person-group>
<source><![CDATA[Image classification using SVM and CNN]]></source>
<year>2020</year>
<conf-name><![CDATA[International Conference on Computer Science, Engineering and Applications]]></conf-name>
<conf-loc> </conf-loc>
<page-range>1-5</page-range></nlm-citation>
</ref>
<ref id="B4">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Chakraborty]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[Bhatia]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Bhattacharyya]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Platos]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Bag]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Hassanien]]></surname>
<given-names><![CDATA[A. E.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Sentiment analysis of COVID-19 tweets by deep learning classifiers &#8212; A study to show how popularity is affecting accuracy in social media]]></article-title>
<source><![CDATA[Applied Soft Computing]]></source>
<year>2020</year>
<volume>97</volume>
<page-range>106754</page-range></nlm-citation>
</ref>
<ref id="B5">
<nlm-citation citation-type="book">
<collab>Eurostat</collab>
<source><![CDATA[Consumption of online news rises in popularity]]></source>
<year>2022</year>
<publisher-name><![CDATA[Eurostat News]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B6">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Hamborg]]></surname>
<given-names><![CDATA[F.]]></given-names>
</name>
<name>
<surname><![CDATA[Donnay]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[Gipp]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Automated identification of media bias in news articles: An interdisciplinary literature review]]></article-title>
<source><![CDATA[International Journal on Digital Libraries]]></source>
<year>2018</year>
<volume>20</volume>
<numero>4</numero>
<issue>4</issue>
<page-range>391-415</page-range></nlm-citation>
</ref>
<ref id="B7">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Hammad]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Barhoush]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Abed-Alguni]]></surname>
<given-names><![CDATA[B. H.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[A semantic-based approach for managing healthcare big data: A survey]]></article-title>
<source><![CDATA[Journal of Healthcare Engineering]]></source>
<year>2020</year>
<volume>2020</volume>
<page-range>1-12</page-range></nlm-citation>
</ref>
<ref id="B8">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Jiang]]></surname>
<given-names><![CDATA[F.]]></given-names>
</name>
<name>
<surname><![CDATA[Jiang]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Zhi]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Dong]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Li]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Ma]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Wang]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Dong]]></surname>
<given-names><![CDATA[Q.]]></given-names>
</name>
<name>
<surname><![CDATA[Shen]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Wang]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Artificial intelligence in healthcare: Past, present and future]]></article-title>
<source><![CDATA[Stroke and Vascular Neurology]]></source>
<year>2017</year>
<volume>2</volume>
<numero>4</numero>
<issue>4</issue>
<page-range>230-43</page-range></nlm-citation>
</ref>
<ref id="B9">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Khurana]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Koli]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Khatter]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[Singh]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Natural language processing: State of the art, current trends and challenges]]></article-title>
<source><![CDATA[Multimedia Tools and Applications]]></source>
<year>2022</year>
<volume>82</volume>
<numero>3</numero>
<issue>3</issue>
<page-range>3713-44</page-range></nlm-citation>
</ref>
<ref id="B10">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Kirill]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Mihail]]></surname>
<given-names><![CDATA[I. G.]]></given-names>
</name>
<name>
<surname><![CDATA[Sanzhar]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Rustam]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Olga]]></surname>
<given-names><![CDATA[F.]]></given-names>
</name>
<name>
<surname><![CDATA[Ravil]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Propaganda identification using topic modelling]]></article-title>
<source><![CDATA[Procedia Computer Science]]></source>
<year>2020</year>
<volume>178</volume>
<page-range>205-12</page-range></nlm-citation>
</ref>
<ref id="B11">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Kulkarni]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
</person-group>
<source><![CDATA[A million news headlines]]></source>
<year>2022</year>
<publisher-name><![CDATA[Harvard Dataverse]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B12">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Kurenkov]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
</person-group>
<source><![CDATA[Lessons from the PULSE model and discussion]]></source>
<year>2020</year>
<publisher-name><![CDATA[The Gradient]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B13">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Liu]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
</person-group>
<source><![CDATA[Sentiment analysis and opinion mining]]></source>
<year>2012</year>
<publisher-name><![CDATA[Springer International Publishing]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B14">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Mishev]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[Gjorgjevikj]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Vodenska]]></surname>
<given-names><![CDATA[I.]]></given-names>
</name>
<name>
<surname><![CDATA[Chitkushev]]></surname>
<given-names><![CDATA[L. T.]]></given-names>
</name>
<name>
<surname><![CDATA[Trajanov]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Evaluation of sentiment analysis in finance: from lexicons to transformers]]></article-title>
<source><![CDATA[IEEE Access]]></source>
<year>2020</year>
<volume>8</volume>
<page-range>131662-82</page-range></nlm-citation>
</ref>
<ref id="B15">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Mohammad]]></surname>
<given-names><![CDATA[S. M.]]></given-names>
</name>
<name>
<surname><![CDATA[Turney]]></surname>
<given-names><![CDATA[P. D.]]></given-names>
</name>
</person-group>
<source><![CDATA[NRC emotion lexicon]]></source>
<year>2013</year>
<publisher-name><![CDATA[National Research Council Canada Publications Record]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B16">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Muhammad]]></surname>
<given-names><![CDATA[Z.]]></given-names>
</name>
<name>
<surname><![CDATA[Jailani]]></surname>
<given-names><![CDATA[N. A. J.]]></given-names>
</name>
<name>
<surname><![CDATA[Leh]]></surname>
<given-names><![CDATA[N. A. M.]]></given-names>
</name>
<name>
<surname><![CDATA[Hamid]]></surname>
<given-names><![CDATA[S. A.]]></given-names>
</name>
</person-group>
<source><![CDATA[Classification of drinking water quality using support vector machine (SVM) algorithm]]></source>
<year>2022</year>
<conf-name><![CDATA[12th International Conference on Control System, Computing and Engineering]]></conf-name>
<conf-loc> </conf-loc>
<page-range>75-80</page-range></nlm-citation>
</ref>
<ref id="B17">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Nzotta]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
</person-group>
<source><![CDATA[A quick history of natural language processing]]></source>
<year>2023</year>
<publisher-name><![CDATA[Aveni]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B18">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Pang]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
<name>
<surname><![CDATA[Lee]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Opinion mining and sentiment analysis]]></article-title>
<source><![CDATA[Foundations and Trends® in Information Retrieval]]></source>
<year>2008</year>
<volume>2</volume>
<numero>1&#8211;2</numero>
<issue>1&#8211;2</issue>
<page-range>1-135</page-range></nlm-citation>
</ref>
<ref id="B19">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Pang]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
<name>
<surname><![CDATA[Lee]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Vaithyanathan]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
</person-group>
<source><![CDATA[Thumbs up?: Sentiment classification using machine learning techniques]]></source>
<year>2002</year>
<volume>10</volume>
<conf-name><![CDATA[Conference on Empirical Methods in Natural Language Processing]]></conf-name>
<conf-loc> </conf-loc>
<page-range>79-86</page-range></nlm-citation>
</ref>
<ref id="B20">
<nlm-citation citation-type="book">
<collab>Pew Research Center</collab>
<source><![CDATA[Social media use in 2021]]></source>
<year>2021</year>
<publisher-name><![CDATA[Pew Research Center]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B21">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Ponn-Felciah]]></surname>
<given-names><![CDATA[M. L.]]></given-names>
</name>
<name>
<surname><![CDATA[Anbuselvi]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Smartphone product review sentiment analysis using logistic regression]]></article-title>
<source><![CDATA[International Journal of Circuit Theory and Applications]]></source>
<year>2016</year>
<volume>9</volume>
<numero>26</numero>
<issue>26</issue>
<page-range>343-9</page-range></nlm-citation>
</ref>
<ref id="B22">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sharma]]></surname>
<given-names><![CDATA[D. N.]]></given-names>
</name>
<name>
<surname><![CDATA[Shankar]]></surname>
<given-names><![CDATA[D. P.]]></given-names>
</name>
<name>
<surname><![CDATA[Raj]]></surname>
<given-names><![CDATA[M. R.]]></given-names>
</name>
<name>
<surname><![CDATA[Dalwadi]]></surname>
<given-names><![CDATA[M. C.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Sentiment analysis for Amazon product reviews using logistic regression model]]></article-title>
<source><![CDATA[Journal of Development Economics and Management Research Studies]]></source>
<year>2022</year>
<volume>9</volume>
<numero>11</numero>
<issue>11</issue>
<page-range>29-42</page-range></nlm-citation>
</ref>
<ref id="B23">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Singh]]></surname>
<given-names><![CDATA[N. K.]]></given-names>
</name>
<name>
<surname><![CDATA[Tomar]]></surname>
<given-names><![CDATA[D. S.]]></given-names>
</name>
<name>
<surname><![CDATA[Sangaiah]]></surname>
<given-names><![CDATA[A. K.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Sentiment analysis: A review and comparative analysis over social media]]></article-title>
<source><![CDATA[Journal of Ambient Intelligence and Humanized Computing]]></source>
<year>2018</year>
<volume>11</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>97-117</page-range></nlm-citation>
</ref>
<ref id="B24">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Valenti]]></surname>
<given-names><![CDATA[A. P.]]></given-names>
</name>
<name>
<surname><![CDATA[Chita-Tegmark]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Tickle-Degnen]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Bock]]></surname>
<given-names><![CDATA[A. W.]]></given-names>
</name>
<name>
<surname><![CDATA[Scheutz]]></surname>
<given-names><![CDATA[M. J.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Using topic modeling to infer the emotional state of people living with Parkinson&#8217;s disease]]></article-title>
<source><![CDATA[Assistive Technology]]></source>
<year>2019</year>
<volume>33</volume>
<numero>3</numero>
<issue>3</issue>
<page-range>136-45</page-range></nlm-citation>
</ref>
</ref-list>
</back>
</article>
