On-line version ISSN 1870-9044
Polibits no. 43 México Jan./Jun. 2011
Are my Children Old Enough to Read these Books? Age Suitability Analysis
Franz Wanner*, Johannes Fuchs**, Daniela Oelke***, and Daniel A. Keim****
The authors are with the University of Konstanz, 78457 Konstanz, Germany (email: *firstname.lastname@example.org, **Johannes.Fuchs@uni-konstanz.de, ***email@example.com, ****firstname.lastname@example.org).
Manuscript received October 27, 2010.
Manuscript accepted for publication January 28, 2011.
Books are generally not appropriate for all ages, so the aim of this work was to find an effective method of representing the age suitability of textual documents by means of automatic analysis and visualization. Interviews with experts identified aspects of a text that bear on age suitability (such as 'is it hard to read?'), and a set of properties was devised (such as linguistic complexity, story complexity, and genre) that combine to characterize these age-related aspects. To measure these properties, we map a set of text features onto each one. An evaluation of the measures using Amazon Mechanical Turk showed promising results. Finally, the set of features is visualized in our age-suitability tool, which lets the user explore the results. This supports transparency and traceability and gives the user the opportunity to deal with the limitations of automatic methods and computability issues.
Key words: Information interfaces and presentation, information search and retrieval.