Polibits
On-line version ISSN 1870-9044
Polibits n.40 México Jul./Dec. 2009
Special section: Information Retrieval and Natural Language Processing
TrainQA: a Training Corpus for Corpus-Based Question Answering Systems
David Tomás1, José L. Vicedo1, Empar Bisbal2, and Lidia Moreno2
1 Department of Software and Computing Systems, University of Alicante, Spain. (dtomas@dlsi.ua.es, vicedo@dlsi.ua.es)
2 Department of Information Systems and Computation, Technical University of Valencia, Spain. (ebisbal@dsic.upv.es, lmoreno@dsic.upv.es)
Manuscript received November 23, 2008.
Manuscript accepted for publication August 15, 2009.
Abstract
This paper describes the development of an English corpus of factoid TREC-like question-answer pairs. The resulting corpus consists of more than 70,000 samples, each containing the following information: a question, its question type, an exact answer to the question, the different context levels (sentence, paragraph and document) in which the answer occurs inside a document, and a label indicating whether the answer is correct (a positive sample) or not (a negative sample). For instance, TrainQA can be used to train a binary classifier that decides whether a given answer to a question is correct (positive) or not (negative). To our knowledge, this is the first corpus designed for training every stage of a trainable Question Answering system: question classification, information retrieval, answer extraction and answer validation.
Key words: Question answering, corpus-based systems.
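The sample structure described in the abstract can be sketched as a small record type. This is only an illustration, not the corpus's actual file format: the class name `QASample`, every field name, the helper `to_validation_pairs`, the `[SEP]` separator and the two example samples are assumptions invented here. The sketch shows how positive and negative samples could be turned into labelled inputs for a binary answer-validation classifier.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class QASample:
    """One corpus sample. All field names are hypothetical."""
    question: str
    question_type: str
    answer: str
    sentence: str      # sentence-level context containing the answer
    paragraph: str     # paragraph-level context
    document: str      # document-level context
    is_correct: bool   # True for a positive sample, False for a negative one


def to_validation_pairs(samples: List[QASample]) -> List[Tuple[str, int]]:
    """Build (text, label) pairs for a binary answer validator.

    The text joins question, candidate answer and sentence context with an
    arbitrary separator; the label is 1 for positive and 0 for negative.
    """
    return [
        (f"{s.question} [SEP] {s.answer} [SEP] {s.sentence}", int(s.is_correct))
        for s in samples
    ]


# Two made-up samples, used only to exercise the sketch.
samples = [
    QASample(
        question="Who wrote Hamlet?",
        question_type="PERSON",
        answer="William Shakespeare",
        sentence="Hamlet was written by William Shakespeare.",
        paragraph="Hamlet was written by William Shakespeare. It is a tragedy.",
        document="(full document text)",
        is_correct=True,
    ),
    QASample(
        question="Who wrote Hamlet?",
        question_type="PERSON",
        answer="Christopher Marlowe",
        sentence="Christopher Marlowe was a contemporary of Shakespeare.",
        paragraph="Christopher Marlowe was a contemporary of Shakespeare.",
        document="(full document text)",
        is_correct=False,
    ),
]

pairs = to_validation_pairs(samples)
```

The resulting pairs could be fed to any text classifier. The same records could also serve the other stages named in the abstract, for example by pairing `question` with `question_type` for question classification.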
ACKNOWLEDGEMENT
This work was developed within the framework of the CICYT R2D2 project (TIC2003-07158-C04).