On-line version ISSN 1870-9044
Polibits no. 43 Mexico Jan./Jun. 2011
Semantic Aspect Retrieval for Encyclopedia
Chao Han, Yicheng Liu, Yu Hao, and Xiaoyan Zhu
Department of Computer Science and Technology, Tsinghua University, China (email: firstname.lastname@example.org).
Manuscript received November 1, 2010.
Manuscript accepted for publication December 21, 2010.
With the development of Web 2.0, more and more people contribute their knowledge to the Internet. Many general and domain-specific online encyclopedias have become available, and they are valuable resources for many Natural Language Processing (NLP) applications, such as summarization and question answering. We propose a novel encyclopedia-specific method for retrieving passages that are semantically related to a short query (usually only a single word or phrase) from a given encyclopedia article. The method captures expression word features and categorical word features in the snippets surrounding the aspect words by building massive hybrid language models. In our task, these local models outperform global models such as LSA and ESA.
Key words: Aspect retrieval, online encyclopedia, semantic relatedness.
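As a rough illustration of what ranking an article's passages against a one-word query with a language model can look like, the sketch below scores each passage by smoothed query likelihood, in the spirit of the language modeling approach to retrieval cited in the references. It is not the hybrid model proposed in this paper: the function names `tokenize` and `rank_passages`, the interpolation weight `lam`, and the use of the whole article as the background model are all assumptions made for this example.

```python
import math
from collections import Counter

PUNCT = ".,;:!?()\"'"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]


def rank_passages(query, passages, lam=0.5):
    """Return passage indices ordered by smoothed query log-likelihood.

    lam is a hypothetical Jelinek-Mercer weight that interpolates between
    the passage model and a background model built from all passages.
    """
    passage_tokens = [tokenize(p) for p in passages]
    background = Counter(t for toks in passage_tokens for t in toks)
    total = sum(background.values())
    query_terms = tokenize(query)

    scored = []
    for idx, toks in enumerate(passage_tokens):
        counts = Counter(toks)
        length = len(toks)
        score = 0.0
        for term in query_terms:
            p_local = counts[term] / length if length else 0.0
            p_background = background[term] / total if total else 0.0
            p = lam * p_local + (1 - lam) * p_background
            # Terms absent from the whole article get a tiny floor so log() is defined.
            score += math.log(p if p > 0 else 1e-12)
        scored.append((score, idx))

    # Highest score first; ties keep the original passage order.
    return [idx for _, idx in sorted(scored, key=lambda s: (-s[0], s[1]))]


if __name__ == "__main__":
    article = [
        "The city was founded in 1200 and its history spans centuries.",
        "The local economy relies on trade and industry.",
        "Its climate is mild with wet winters.",
    ]
    print(rank_passages("economy", article))
```

A purely lexical score like this only finds passages that contain the query word itself, which is the limitation that semantic relatedness methods are meant to address.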
J. Lin, D. Quan, V. Sinha, K. Bakshi, D. Huynh, B. Katz, and D. R. Karger, "The role of context in question answering systems," in Proceedings of the 2003 Conference on Human Factors in Computing Systems, 2003.
S. Ye, T. Chua, and J. Lu, "Summarizing Definition from Wikipedia," in Proceedings of the 47th Annual Meeting of the ACL, Singapore, 2009.
C. Li, N. Yan, S. B. Roy, L. Lisham, and G. Das, "Facetedpedia: Dynamic Generation of Query-Dependent Faceted Interfaces for Wikipedia," in Proceedings of International World Wide Web Conference, Raleigh, North Carolina, USA, 2010.
R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, and U. Scheel, "Faceted Wikipedia Search," in 13th International Conference on Business Information Systems (BIS), 2010.
R. B. Yates and B. R. Neto, Modern Information Retrieval, Addison Wesley, New York, NY, 1999.
C. Fellbaum, WordNet: An Electronic Lexical Database, MIT Press, Cambridge, MA, 1998.
A. Budanitsky and G. Hirst, "Evaluating WordNet-based Measures of Lexical Semantic Relatedness," Computational Linguistics, 2006, pp. 13–47.
P. Roget, Roget's Thesaurus of English Words and Phrases, Longman Group Ltd., 1852.
S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman, "Indexing by Latent Semantic Analysis," Journal of the American Society for Information Science, 1990, pp. 391–407.
E. Gabrilovich and S. Markovitch, "Computing Semantic Relatedness Using Wikipedia-based Explicit Semantic Analysis," in Proceedings of IJCAI, 2007, pp. 1606–1611.
E. Hatcher and O. Gospodnetic, Lucene in Action, Manning Publications, 2005.
J. M. Ponte and W. B. Croft, "A Language Modeling Approach to Information Retrieval," in Proceedings of the 21st Intl. ACM SIGIR Conf., 1998, pp. 275–281.