SciELO - Scientific Electronic Library Online

 

Polibits

On-line version ISSN 1870-9044

Abstract

TOMAS, David; VICEDO, José L.; BISBAL, Empar and MORENO, Lidia. TrainQA: a Training Corpus for Corpus-Based Question Answering Systems. Polibits [online]. 2009, n.40, pp.5-11. ISSN 1870-9044.

This paper describes the development of an English corpus of factoid, TREC-like question-answer pairs. The resulting corpus consists of more than 70,000 samples, each containing the following information: a question, its question type, an exact answer to the question, the different context levels (sentence, paragraph and document) in which the answer occurs inside a document, and a label indicating whether the answer is correct (a positive sample) or not (a negative sample). For instance, TrainQA can be used to train a binary classifier that decides whether a given answer to a question is correct (positive) or not (negative). To our knowledge, this is the first corpus aimed at training every stage of a trainable Question Answering system: question classification, information retrieval, answer extraction and answer validation.
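As an illustration of the sample layout described in the abstract, the following minimal Python sketch models one corpus entry and separates positive from negative samples. The class name, field names, helper functions, question-type label and the two example records are all hypothetical; they are not taken from the TrainQA distribution, whose actual file format is not specified here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class QASample:
    """One question-answer sample; all field names are assumptions."""
    question: str
    question_type: str   # hypothetical label set, e.g. "LOCATION"
    answer: str          # exact answer string
    sentence: str        # sentence-level context
    paragraph: str       # paragraph-level context
    document: str        # document-level context
    is_correct: bool     # True = positive sample, False = negative sample


def split_by_label(samples: List[QASample]) -> Tuple[List[QASample], List[QASample]]:
    """Return (positives, negatives), e.g. as input for a binary classifier."""
    positives = [s for s in samples if s.is_correct]
    negatives = [s for s in samples if not s.is_correct]
    return positives, negatives


def answer_in_all_contexts(sample: QASample) -> bool:
    """Check that the answer string occurs at every context level."""
    contexts = (sample.sentence, sample.paragraph, sample.document)
    return all(sample.answer in ctx for ctx in contexts)


# Invented records for illustration only; they are not corpus data.
SENTENCE = "The Eiffel Tower is in Paris."
PARAGRAPH = SENTENCE + " It was completed in 1889."
DOCUMENT = "Landmarks of France. " + PARAGRAPH

samples = [
    QASample("Where is the Eiffel Tower?", "LOCATION", "Paris",
             SENTENCE, PARAGRAPH, DOCUMENT, True),
    QASample("Where is the Eiffel Tower?", "LOCATION", "1889",
             "It was completed in 1889.", PARAGRAPH, DOCUMENT, False),
]

positives, negatives = split_by_label(samples)
```

In this sketch the boolean label is the target an answer-validation classifier would predict, while the question type and the three context levels could serve as training data for the other stages named in the abstract.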

Keywords: Question answering; corpus-based systems.


 

Creative Commons License: All the contents of this journal, except where otherwise noted, are licensed under a Creative Commons Attribution License