<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>1405-5546</journal-id>
<journal-title><![CDATA[Computación y Sistemas]]></journal-title>
<abbrev-journal-title><![CDATA[Comp. y Sist.]]></abbrev-journal-title>
<issn>1405-5546</issn>
<publisher>
<publisher-name><![CDATA[Instituto Politécnico Nacional, Centro de Investigación en Computación]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S1405-55462020000200669</article-id>
<article-id pub-id-type="doi">10.13053/cys-24-2-3401</article-id>
<title-group>
<article-title xml:lang="en"><![CDATA[Performance Analysis of Distributed Computing Frameworks for Big Data Analytics: Hadoop Vs Spark]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Ketu]]></surname>
<given-names><![CDATA[Shwet]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Mishra]]></surname>
<given-names><![CDATA[Pramod Kumar]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Agarwal]]></surname>
<given-names><![CDATA[Sonali]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
</contrib-group>
<aff id="Af1">
<institution><![CDATA[Banaras Hindu University, Institute of Science, Department of Computer Science]]></institution>
<addr-line><![CDATA[ ]]></addr-line>
<country>India</country>
</aff>
<aff id="Af2">
<institution><![CDATA[Indian Institute of Information and Technology]]></institution>
<addr-line><![CDATA[Allahabad ]]></addr-line>
<country>India</country>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>06</month>
<year>2020</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>06</month>
<year>2020</year>
</pub-date>
<volume>24</volume>
<numero>2</numero>
<fpage>669</fpage>
<lpage>686</lpage>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_arttext&amp;pid=S1405-55462020000200669&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_abstract&amp;pid=S1405-55462020000200669&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.mx/scielo.php?script=sci_pdf&amp;pid=S1405-55462020000200669&amp;lng=en&amp;nrm=iso"></self-uri><abstract abstract-type="short" xml:lang="en"><p><![CDATA[Abstract: Over the last decade, the tremendous growth in data has made big data storage and management a top priority. To better support software developers dealing with big data problems, new programming platforms are continuously being developed. Hadoop MapReduce was a major game-changer, followed by Spark, which set the world of big data on fire with its processing speed and comfortable APIs. The Hadoop framework emerged as a leading tool based on the MapReduce programming model with a distributed file system. Spark, on the other hand, is a more recently developed big data analysis and management framework used to explore the underlying features of big data. In this work, a comparative analysis of Hadoop MapReduce and Spark is presented based on working principle, performance, cost, ease of use, compatibility, data processing, failure tolerance, and security. An experimental analysis was performed to observe the performance of Hadoop MapReduce and Spark and to establish their suitability under different constraints of the distributed computing environment.]]></p></abstract>
<kwd-group>
<kwd lng="en"><![CDATA[Big data]]></kwd>
<kwd lng="en"><![CDATA[parallel processing]]></kwd>
<kwd lng="en"><![CDATA[distributed environments]]></kwd>
<kwd lng="en"><![CDATA[distributed frameworks]]></kwd>
<kwd lng="en"><![CDATA[Hadoop MapReduce]]></kwd>
<kwd lng="en"><![CDATA[Spark]]></kwd>
<kwd lng="en"><![CDATA[big data analytics]]></kwd>
</kwd-group>
</article-meta>
</front><back>
<ref-list>
<ref id="B1">
<label>1</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Jacobs]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[The pathologies of big data]]></article-title>
<source><![CDATA[Communications of the ACM]]></source>
<year>2009</year>
<volume>52</volume>
<numero>8</numero>
<issue>8</issue>
<page-range>36-44</page-range></nlm-citation>
</ref>
<ref id="B2">
<label>2</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Zikopoulos]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Eaton]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
</person-group>
<source><![CDATA[Understanding big data: Analytics for enterprise class Hadoop and streaming data]]></source>
<year>2011</year>
<publisher-name><![CDATA[McGraw-Hill Osborne Media]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B3">
<label>3</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Kaisler]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Armour]]></surname>
<given-names><![CDATA[F.]]></given-names>
</name>
<name>
<surname><![CDATA[Espinosa]]></surname>
<given-names><![CDATA[J.A.]]></given-names>
</name>
<name>
<surname><![CDATA[Money]]></surname>
<given-names><![CDATA[W.]]></given-names>
</name>
</person-group>
<source><![CDATA[Big data: Issues and challenges moving forward]]></source>
<year>2013</year>
<conf-name><![CDATA[ 46th Hawaii International Conference on System Sciences]]></conf-name>
<conf-loc> </conf-loc>
<page-range>995-1004</page-range></nlm-citation>
</ref>
<ref id="B4">
<label>4</label><nlm-citation citation-type="">
<collab>Spark</collab>
<source><![CDATA[]]></source>
<year>2013</year>
</nlm-citation>
</ref>
<ref id="B5">
<label>5</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Liu]]></surname>
<given-names><![CDATA[X.]]></given-names>
</name>
<name>
<surname><![CDATA[Han]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Zhong]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Han]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
<name>
<surname><![CDATA[He]]></surname>
<given-names><![CDATA[X.]]></given-names>
</name>
</person-group>
<source><![CDATA[Implementing WebGIS on Hadoop: A case study of improving small file I/O performance on HDFS]]></source>
<year>2009</year>
<conf-name><![CDATA[ IEEE International Conference on Cluster Computing and Workshops]]></conf-name>
<conf-loc> </conf-loc>
<page-range>1-8</page-range></nlm-citation>
</ref>
<ref id="B6">
<label>6</label><nlm-citation citation-type="">
<collab>YARN</collab>
<source><![CDATA[]]></source>
<year>2013</year>
</nlm-citation>
</ref>
<ref id="B7">
<label>7</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Jiang]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Li]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
<name>
<surname><![CDATA[Song]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<source><![CDATA[The optimization of HDFS based on small files]]></source>
<year>2010</year>
<conf-name><![CDATA[ 3rd IEEE international conference on broadband network and multimedia technology (IC-BNMT)]]></conf-name>
<conf-loc> </conf-loc>
<page-range>912-5</page-range></nlm-citation>
</ref>
<ref id="B8">
<label>8</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Mackey]]></surname>
<given-names><![CDATA[G.]]></given-names>
</name>
<name>
<surname><![CDATA[Sehrish]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Wang]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<source><![CDATA[Improving metadata management for small files in HDFS]]></source>
<year>2009</year>
<conf-name><![CDATA[ IEEE international conference on cluster computing and workshops]]></conf-name>
<conf-loc> </conf-loc>
<page-range>1-4</page-range></nlm-citation>
</ref>
<ref id="B9">
<label>9</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Xie]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Yin]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Ruan]]></surname>
<given-names><![CDATA[X.]]></given-names>
</name>
<name>
<surname><![CDATA[Ding]]></surname>
<given-names><![CDATA[Z.]]></given-names>
</name>
<name>
<surname><![CDATA[Tian]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Majors]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Manzanares]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Qin]]></surname>
<given-names><![CDATA[X.]]></given-names>
</name>
</person-group>
<source><![CDATA[Improving MapReduce performance through data placement in heterogeneous Hadoop clusters]]></source>
<year>2010</year>
<conf-name><![CDATA[ Processing, Workshops and Phd Forum (IPDPSW)]]></conf-name>
<conf-loc> </conf-loc>
<page-range>1-9</page-range></nlm-citation>
</ref>
<ref id="B10">
<label>10</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Thanh]]></surname>
<given-names><![CDATA[T.D.]]></given-names>
</name>
<name>
<surname><![CDATA[Mohan]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Choi]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
<name>
<surname><![CDATA[Kim]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Kim]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
</person-group>
<source><![CDATA[A taxonomy and survey on distributed file systems]]></source>
<year>2008</year>
<volume>1</volume>
<conf-name><![CDATA[ Fourth International Conference on Networked Computing and Advanced Information Management]]></conf-name>
<conf-loc> </conf-loc>
<page-range>144-9</page-range></nlm-citation>
</ref>
<ref id="B11">
<label>11</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Zaharia]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Chowdhury]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Franklin]]></surname>
<given-names><![CDATA[M.J.]]></given-names>
</name>
<name>
<surname><![CDATA[Shenker]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Stoica]]></surname>
<given-names><![CDATA[I.]]></given-names>
</name>
</person-group>
<source><![CDATA[Spark: Cluster computing with working sets]]></source>
<year>2010</year>
<volume>10</volume>
<conf-name><![CDATA[ HotCloud]]></conf-name>
<conf-loc> </conf-loc>
<page-range>1-7</page-range></nlm-citation>
</ref>
<ref id="B12">
<label>12</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Cito]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Leitner]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Fritz]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Gall]]></surname>
<given-names><![CDATA[H.C.]]></given-names>
</name>
</person-group>
<source><![CDATA[The making of cloud applications: An empirical study on software development for the cloud]]></source>
<year>2015</year>
<conf-name><![CDATA[ 2015 10th Joint Meeting on Foundations of Software Engineering]]></conf-name>
<conf-loc> </conf-loc>
<page-range>393-403</page-range></nlm-citation>
</ref>
<ref id="B13">
<label>13</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Cavallaro]]></surname>
<given-names><![CDATA[G.]]></given-names>
</name>
<name>
<surname><![CDATA[Riedel]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Benediktsson]]></surname>
<given-names><![CDATA[J.A.]]></given-names>
</name>
<name>
<surname><![CDATA[Goetz]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Runarsson]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Jonasson]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[Lippert]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
</person-group>
<source><![CDATA[Smart data analytics methods for remote sensing applications]]></source>
<year>2014</year>
<conf-name><![CDATA[ IEEE geoscience and remote sensing symposium]]></conf-name>
<conf-loc> </conf-loc>
<page-range>1405-8</page-range></nlm-citation>
</ref>
<ref id="B14">
<label>14</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Zaharia]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Chowdhury]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Das]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Dave]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Ma]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[McCauley]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Franklin]]></surname>
<given-names><![CDATA[M.J.]]></given-names>
</name>
<name>
<surname><![CDATA[Shenker]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Stoica]]></surname>
<given-names><![CDATA[I.]]></given-names>
</name>
</person-group>
<source><![CDATA[Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing]]></source>
<year>2012</year>
<conf-name><![CDATA[ 9th USENIX conference on Networked Systems Design and Implementation]]></conf-name>
<conf-loc> </conf-loc>
<page-range>2</page-range></nlm-citation>
</ref>
<ref id="B15">
<label>15</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Vavilapalli]]></surname>
<given-names><![CDATA[V.K.]]></given-names>
</name>
<name>
<surname><![CDATA[Murthy]]></surname>
<given-names><![CDATA[A.C.]]></given-names>
</name>
<name>
<surname><![CDATA[Douglas]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
<name>
<surname><![CDATA[Agarwal]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Konar]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Evans]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Graves]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Lowe]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Shah]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Seth]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Saha]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
</person-group>
<source><![CDATA[Apache Hadoop YARN: Yet another resource negotiator]]></source>
<year>2013</year>
<numero>5</numero>
<conf-name><![CDATA[ 4th Annual Symposium on Cloud Computing]]></conf-name>
<conf-loc> </conf-loc>
<issue>5</issue>
<page-range>1-16</page-range></nlm-citation>
</ref>
<ref id="B16">
<label>16</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Ketu]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Prasad]]></surname>
<given-names><![CDATA[B.R.]]></given-names>
</name>
<name>
<surname><![CDATA[Agarwal]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
</person-group>
<source><![CDATA[Effect of corpus size selection on performance of map-reduce based distributed k-means for big textual data clustering]]></source>
<year>2015</year>
<conf-name><![CDATA[ Sixth International Conference on Computer and Communication Technology]]></conf-name>
<conf-loc> </conf-loc>
<page-range>256-60</page-range></nlm-citation>
</ref>
<ref id="B17">
<label>17</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Nandimath]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Banerjee]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
<name>
<surname><![CDATA[Patil]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Kakade]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Vaidya]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Chaturvedi]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
</person-group>
<source><![CDATA[Big data analysis using Apache Hadoop]]></source>
<year>2013</year>
<conf-name><![CDATA[ IEEE 14th International Conference on Information Reuse &amp; Integration (IRI)]]></conf-name>
<conf-loc> </conf-loc>
<page-range>700-3</page-range></nlm-citation>
</ref>
<ref id="B18">
<label>18</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Gu]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Li]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
</person-group>
<source><![CDATA[Memory or time: Performance evaluation for iterative operation on Hadoop and Spark]]></source>
<year>2013</year>
<conf-name><![CDATA[ IEEE 10th International Conference on High Performance Computing and Communications]]></conf-name>
<conf-loc> </conf-loc>
<page-range>721-7</page-range></nlm-citation>
</ref>
<ref id="B19">
<label>19</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Chen]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Mao]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Liu]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Big data: A survey]]></article-title>
<source><![CDATA[Mobile networks and applications]]></source>
<year>2014</year>
<volume>19</volume>
<numero>2</numero>
<issue>2</issue>
<page-range>171-209</page-range></nlm-citation>
</ref>
<ref id="B20">
<label>20</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Shinnar]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Cunningham]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Saraswat]]></surname>
<given-names><![CDATA[V.]]></given-names>
</name>
<name>
<surname><![CDATA[Herta]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[M3R: increased performance for in-memory Hadoop jobs]]></article-title>
<source><![CDATA[Proceedings of the VLDB Endowment]]></source>
<year>2012</year>
<volume>5</volume>
<numero>12</numero>
<issue>12</issue>
<page-range>1736-47</page-range></nlm-citation>
</ref>
<ref id="B21">
<label>21</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Thusoo]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Sarma]]></surname>
<given-names><![CDATA[J.S.]]></given-names>
</name>
<name>
<surname><![CDATA[Jain]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
<name>
<surname><![CDATA[Shao]]></surname>
<given-names><![CDATA[Z.]]></given-names>
</name>
<name>
<surname><![CDATA[Chakka]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Anthony]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Liu]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Wyckoff]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Murthy]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Hive: a warehousing solution over a map-reduce framework]]></article-title>
<source><![CDATA[Proceedings of the VLDB Endowment]]></source>
<year>2009</year>
<volume>2</volume>
<numero>2</numero>
<issue>2</issue>
<page-range>1626-9</page-range></nlm-citation>
</ref>
<ref id="B22">
<label>22</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Xin]]></surname>
<given-names><![CDATA[R.S.]]></given-names>
</name>
<name>
<surname><![CDATA[Rosen]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Zaharia]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Franklin]]></surname>
<given-names><![CDATA[M.J.]]></given-names>
</name>
<name>
<surname><![CDATA[Shenker]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Stoica]]></surname>
<given-names><![CDATA[I.]]></given-names>
</name>
</person-group>
<source><![CDATA[Shark: SQL and rich analytics at scale]]></source>
<year>2013</year>
<conf-name><![CDATA[ ACM SIGMOD International Conference on Management of data]]></conf-name>
<conf-loc> </conf-loc>
<page-range>13-24</page-range></nlm-citation>
</ref>
<ref id="B23">
<label>23</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Zaharia]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Das]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Li]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Hunter]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Shenker]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Stoica]]></surname>
<given-names><![CDATA[I.]]></given-names>
</name>
</person-group>
<source><![CDATA[Discretized streams: Fault-tolerant streaming computation at scale]]></source>
<year>2013</year>
<conf-name><![CDATA[ Proceedings of the twenty-fourth ACM symposium on operating systems principles]]></conf-name>
<conf-loc> </conf-loc>
<page-range>423-38</page-range></nlm-citation>
</ref>
<ref id="B24">
<label>24</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Marz]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
<name>
<surname><![CDATA[Xu]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Jackson]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<source><![CDATA[Storm]]></source>
<year>2013</year>
</nlm-citation>
</ref>
<ref id="B25">
<label>25</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Garg]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
</person-group>
<source><![CDATA[Apache Kafka]]></source>
<year>2013</year>
<publisher-name><![CDATA[Packt Publishing Ltd]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B26">
<label>26</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Owen]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Anil]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Dunning]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Friedman]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
</person-group>
<source><![CDATA[Mahout in action]]></source>
<year>2011</year>
<publisher-name><![CDATA[Manning Publications]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B27">
<label>27</label><nlm-citation citation-type="">
<collab>WIKIPEDIA</collab>
<source><![CDATA[]]></source>
<year>2013</year>
</nlm-citation>
</ref>
<ref id="B28">
<label>28</label><nlm-citation citation-type="">
<collab>CMU</collab>
<source><![CDATA[]]></source>
<year>2012</year>
</nlm-citation>
</ref>
<ref id="B29">
<label>29</label><nlm-citation citation-type="">
<collab>DBPEDIA.ORG</collab>
<source><![CDATA[]]></source>
<year>2014</year>
</nlm-citation>
</ref>
</ref-list>
</back>
</article>
