<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>1405-7743</journal-id>
<journal-title><![CDATA[Ingeniería, investigación y tecnología]]></journal-title>
<abbrev-journal-title><![CDATA[Ing. invest. y tecnol.]]></abbrev-journal-title>
<issn>1405-7743</issn>
<publisher>
<publisher-name><![CDATA[Universidad Nacional Autónoma de México, Facultad de Ingeniería]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S1405-77432020000300002</article-id>
<article-id pub-id-type="doi">10.22201/fi.25940732e.2020.21.3.022</article-id>
<title-group>
<article-title xml:lang="es"><![CDATA[Aplicación de algoritmos Random Forest y XGBoost en una base de solicitudes de tarjetas de crédito]]></article-title>
<article-title xml:lang="en"><![CDATA[Application of Random Forest and XGBoost algorithms based on a credit card applications database]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Espinosa-Zúñiga]]></surname>
<given-names><![CDATA[Javier Jesús]]></given-names>
</name>
<xref ref-type="aff" rid="Af1"/>
</contrib>
</contrib-group>
<aff id="Af1">
<institution><![CDATA[Grupo Financiero Ve por Más S.A. de C.V., Gerencia CRM]]></institution>
<country>México</country>
</aff>
<pub-date pub-type="pub">
<month>09</month>
<year>2020</year>
</pub-date>
<pub-date pub-type="epub">
<month>09</month>
<year>2020</year>
</pub-date>
<volume>21</volume>
<numero>3</numero>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_arttext&amp;pid=S1405-77432020000300002&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_abstract&amp;pid=S1405-77432020000300002&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_pdf&amp;pid=S1405-77432020000300002&amp;lng=en&amp;nrm=iso"></self-uri><abstract abstract-type="short" xml:lang="es"><p><![CDATA[Resumen: Dentro de la gama de algoritmos de aprendizaje automático existentes destacan actualmente dos: Random Forest y XGBoost. Ambos han adquirido gran popularidad. Random Forest es un algoritmo que surgió hace casi veinte años y se utiliza ampliamente por el balance que ofrece entre complejidad y resultados. Por su parte, XGBoost es un algoritmo que ha despertado gran interés, pues aunque es relativamente reciente es considerado actualmente el estado del arte en algoritmos de aprendizaje automático por sus resultados. Uno de los sectores en los que se aplican este tipo de algoritmos es el financiero. Algunos ejemplos de su aplicación en este sector son: segmentación de clientes, detección de fraudes, pronóstico de ventas, autenticación de clientes y análisis de comportamiento de mercados, entre otros. Un área de particular interés en este sector es la identificación de clientes a quienes otorgar una tarjeta de crédito, esto es crítico para las instituciones financieras, pues una selección incorrecta de estos clientes podría derivar en un incremento de su cartera vencida. En el presente estudio se aplicaron los algoritmos Random Forest y XGBoost sobre una base de solicitudes de tarjetas de crédito (donada por un banco australiano para fines de investigación) para identificar las solicitudes con mayor probabilidad de otorgarles una tarjeta. 
Los modelos obtenidos se compararon estadísticamente (donde se seleccionó el modelo con el algoritmo XGBoost) y se presentaron los resultados con gráficas que permiten responder dos preguntas clave desde el enfoque de negocio: ¿Cuáles son las solicitudes a las que hay que otorgar una tarjeta? y ¿Qué resultados esperamos en caso de aplicar el modelo? La aportación más importante del presente estudio es aplicar dos algoritmos muy efectivos sobre esta base de solicitudes de tarjetas de crédito con un enfoque de negocios.]]></p></abstract>
<abstract abstract-type="short" xml:lang="en"><p><![CDATA[Abstract: Among existing machine learning algorithms, two currently stand out: Random Forest and XGBoost. Both have become very popular. Random Forest emerged almost twenty years ago and is widely used for the balance it offers between complexity and results. XGBoost, on the other hand, has attracted great interest because, although it is relatively recent, it is currently considered the state of the art among machine learning algorithms for its results. One of the sectors in which these algorithms are applied is finance, where examples include customer segmentation, fraud detection, sales forecasting, customer authentication and market behavior analysis, among others. An area of particular interest in this sector is identifying the clients to whom a credit card should be granted. This is critical for financial institutions, since an incorrect selection of these clients could increase their past-due portfolio. In the present study, the Random Forest and XGBoost algorithms were applied to a credit card application database (donated by an Australian bank for research purposes) to identify the applications most likely to be granted a credit card. The resulting models were compared statistically, and the XGBoost model was selected. The results are presented with graphs that answer two key business questions: which applications should be granted a card, and what results can be expected if the model is applied? The main contribution of the present study is the application of two highly effective algorithms to this credit card application database with a business focus.]]></p></abstract>
<kwd-group>
<kwd lng="es"><![CDATA[Aprendizaje automático]]></kwd>
<kwd lng="es"><![CDATA[XGBoost]]></kwd>
<kwd lng="es"><![CDATA[Random Forest]]></kwd>
<kwd lng="es"><![CDATA[árbol de decisión]]></kwd>
<kwd lng="es"><![CDATA[hiperparámetro]]></kwd>
<kwd lng="en"><![CDATA[Machine learning]]></kwd>
<kwd lng="en"><![CDATA[XGBoost]]></kwd>
<kwd lng="en"><![CDATA[Random Forest]]></kwd>
<kwd lng="en"><![CDATA[decision tree]]></kwd>
<kwd lng="en"><![CDATA[hyperparameter]]></kwd>
</kwd-group>
</article-meta>
</front><back>
<ref-list>
<ref id="B1">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Parsa]]></surname>
<given-names><![CDATA[A. B.]]></given-names>
</name>
<name>
<surname><![CDATA[Movahedi]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Taghipour]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Derrible]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Mohammadian]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Toward safer highways, application of XGBoost and SHAP for real-time accident detection and feature analysis]]></article-title>
<source><![CDATA[Accident Analysis & Prevention]]></source>
<year>2020</year>
<volume>136</volume>
</nlm-citation>
</ref>
<ref id="B2">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Breiman]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Random Forests]]></article-title>
<source><![CDATA[Machine Learning]]></source>
<year>2001</year>
<volume>45</volume>
<page-range>5-32</page-range></nlm-citation>
</ref>
<ref id="B3">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Cánovas]]></surname>
<given-names><![CDATA[F.]]></given-names>
</name>
<name>
<surname><![CDATA[Alonso]]></surname>
<given-names><![CDATA[F.]]></given-names>
</name>
<name>
<surname><![CDATA[Gomariz]]></surname>
<given-names><![CDATA[F.]]></given-names>
</name>
<name>
<surname><![CDATA[Oñate]]></surname>
<given-names><![CDATA[F.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Modification of the random forest algorithm to avoid statistical dependence problems when classifying remote sensing imagery]]></article-title>
<source><![CDATA[Computers & Geosciences]]></source>
<year>2017</year>
<volume>103</volume>
<page-range>1-11</page-range></nlm-citation>
</ref>
<ref id="B4">
<nlm-citation citation-type="confproc">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Chen]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Guestrin]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[XGBoost: A scalable tree boosting system]]></article-title>
<source><![CDATA[Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16)]]></source>
<year>2016</year>
<page-range>785-794</page-range></nlm-citation>
</ref>
<ref id="B5">
<nlm-citation citation-type="book">
<collab>CRAN</collab>
<source><![CDATA[The comprehensive R archive network]]></source>
<year>2019</year>
<publisher-name><![CDATA[The Comprehensive R Archive Network]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B6">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Fawcett]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[An introduction to ROC analysis]]></article-title>
<source><![CDATA[Pattern Recognition Letters]]></source>
<year>2005</year>
<volume>27</volume>
<numero>8</numero>
<issue>8</issue>
<page-range>861-874</page-range></nlm-citation>
</ref>
<ref id="B7">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Huo]]></surname>
<given-names><![CDATA[X.]]></given-names>
</name>
<name>
<surname><![CDATA[Kim]]></surname>
<given-names><![CDATA[S. B.]]></given-names>
</name>
<name>
<surname><![CDATA[Tsui]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Wang]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[FBP: A Frontier-Based tree-pruning algorithm]]></article-title>
<source><![CDATA[INFORMS Journal on Computing]]></source>
<year>2006</year>
<volume>18</volume>
<numero>4</numero>
<issue>4</issue>
<page-range>407-530</page-range></nlm-citation>
</ref>
<ref id="B8">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Lizares]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<source><![CDATA[]]></source>
<year>2017</year>
<publisher-name><![CDATA[Universidad Nacional Mayor de San Marcos]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B9">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Luckner]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Topolski]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
<name>
<surname><![CDATA[Mazurek]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Application of XGBoost algorithm in fingerprinting localisation task]]></article-title>
<source><![CDATA[16th IFIP TC8 International Conference, CISIM 2017]]></source>
<year>2017</year>
<page-range>661-671</page-range><publisher-loc><![CDATA[Bialystok, Poland]]></publisher-loc>
<publisher-name><![CDATA[CISIM]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B10">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Nobre]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Neves]]></surname>
<given-names><![CDATA[R. F.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Combining principal component analysis, discrete wavelet transform and XGBoost to trade in the financial markets]]></article-title>
<source><![CDATA[Expert Systems with Applications]]></source>
<year>2019</year>
<volume>125</volume>
<page-range>181-194</page-range></nlm-citation>
</ref>
<ref id="B11">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Quinlan]]></surname>
<given-names><![CDATA[J. R.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Simplifying decision trees]]></article-title>
<source><![CDATA[International Journal of Man-Machine Studies]]></source>
<year>1987</year>
<volume>27</volume>
<numero>3</numero>
<issue>3</issue>
<page-range>221-234</page-range></nlm-citation>
</ref>
<ref id="B12">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sandoval]]></surname>
<given-names><![CDATA[L. L.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Machine Learning algorithms for analysis and data prediction]]></article-title>
<source><![CDATA[2017 IEEE 37th Central America and Panama Convention (CONCAPAN XXXVII)]]></source>
<year>2017</year>
<page-range>1-5</page-range><publisher-loc><![CDATA[Managua, Nicaragua]]></publisher-loc>
<publisher-name><![CDATA[IEEE]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B13">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Tumer]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[Ghosh]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Robust combining of disparate classifiers through order statistics]]></article-title>
<source><![CDATA[Pattern Analysis and Applications]]></source>
<year>2002</year>
<volume>5</volume>
<page-range>189-200</page-range></nlm-citation>
</ref>
<ref id="B14">
<nlm-citation citation-type="book">
<collab>UCI</collab>
<source><![CDATA[UCI Machine Learning Repository]]></source>
<year>2020</year>
<publisher-name><![CDATA[Center for Machine Learning and Intelligent Systems]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B15">
<nlm-citation citation-type="">
<collab>University of California</collab>
<source><![CDATA[]]></source>
<year>2019</year>
</nlm-citation>
</ref>
</ref-list>
</back>
</article>
