
Computación y Sistemas

On-line version ISSN 2007-9737; print version ISSN 1405-5546

Abstract

CALVO, Hiram; SEGURA-OLIVARES, Andrea and GARCIA, Alejandro. Dependency vs. Constituent Based Syntactic N-Grams in Text Similarity Measures for Paraphrase Recognition. Comp. y Sist. [online]. 2014, vol.18, n.3, pp.517-554. ISSN 2007-9737. https://doi.org/10.13053/CyS-18-3-2044.

Paraphrase recognition consists of detecting whether an expression restated as another expression contains the same information. Traditionally, this problem has been addressed with several lexical, syntactic, and semantic techniques. Most works measure word overlap with n-grams; syntactic n-grams, however, have scarcely been explored. We propose using syntactic dependency and constituent n-grams combined with common NLP techniques such as stemming, synonym detection, similarity measures, and linear combination, together with a similarity matrix built in turn from syntactic n-grams. We measure and compare the performance of our system using the Microsoft Research Paraphrase Corpus. We present an in-depth analysis of the strengths and weaknesses of each approach, as well as a common error analysis section. Our main motivation was to determine which syntactic approach performs better on this task: syntactic dependency n-grams or syntactic constituent n-grams. We also compare both approaches with traditional n-grams and with state-of-the-art systems.
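
To make the contrast between traditional and syntactic n-grams concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hand-written dependency analysis (the `heads` lists are illustrative, not parser output), builds dependency n-grams by following head-to-dependent paths, and uses Jaccard overlap as a stand-in similarity measure; the abstract does not say which measures the paper actually uses. All function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Gram = Tuple[str, ...]


def linear_ngrams(tokens: List[str], n: int) -> Set[Gram]:
    """Traditional n-grams: runs of n adjacent tokens in surface order."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def dependency_ngrams(tokens: List[str], heads: List[int], n: int) -> Set[Gram]:
    """Syntactic n-grams: paths of n tokens following head -> dependent arcs.

    heads[i] is the index of the head of token i, or -1 for the root.
    """
    children: Dict[int, List[int]] = {i: [] for i in range(len(tokens))}
    for dep, head in enumerate(heads):
        if head >= 0:
            children[head].append(dep)

    grams: Set[Gram] = set()

    def walk(path: List[int]) -> None:
        if len(path) == n:
            grams.add(tuple(tokens[i] for i in path))
            return
        for child in children[path[-1]]:
            walk(path + [child])

    for start in range(len(tokens)):
        walk([start])
    return grams


def jaccard(a: Set[Gram], b: Set[Gram]) -> float:
    """Jaccard overlap of two n-gram sets (assumed measure, for illustration)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hand-annotated toy sentences (the head indices are an assumption).
s1 = ["the", "cat", "chased", "the", "mouse"]
h1 = [1, 2, -1, 4, 2]
s2 = ["the", "mouse", "was", "chased", "by", "the", "cat"]
h2 = [1, 3, 3, -1, 6, 6, 3]

linear_score = jaccard(linear_ngrams(s1, 2), linear_ngrams(s2, 2))
syntactic_score = jaccard(dependency_ngrams(s1, h1, 2), dependency_ngrams(s2, h2, 2))
print(round(linear_score, 2))     # 0.25
print(round(syntactic_score, 2))  # 0.67
```

In this toy example the dependency bigrams overlap far more than the surface bigrams, because reordering leaves most head-dependent pairs intact. The example also shows a limitation: the two sentences swap agent and patient, so unlabeled overlap alone does not settle whether they are paraphrases.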

Keywords: Paraphrase recognition; Microsoft Research Paraphrase Corpus; similarity measures; syntactic n-grams; constituent analysis; dependency analysis.


 

Creative Commons License: All content of this journal, except where otherwise noted, is licensed under a Creative Commons License.