SciELO - Scientific Electronic Library Online

 
 issue47Automatic WordNet Construction Using Markov Chain Monte CarloTopicSearch-Personalized Web Clustering Engine Using Semantic Query Expansion, Memetic Algorithms and Intelligent Agents author indexsubject indexsearch form
Home Pagealphabetic serial listing  

Services on Demand

Journal

Article

Indicators

Related links

  • Have no similar articlesSimilars in SciELO

Share


Polibits

On-line version ISSN 1870-9044

Abstract

GU, Yanhui; YANG, Zhenglu; NAKANO, Miyuki  and  KITSUREGAWA, Masaru. Exploration on Effectiveness and Efficiency of Similar Sentence Matching. Polibits [online]. 2013, n.47, pp.23-29. ISSN 1870-9044.

Similar sentence matching is an essential issue for many applications, such as text summarization, image extraction, social media retrieval, question-answer model, and so on. A number of studies have investigated this issue in recent years. Most of such techniques focus on effectiveness issues but only a few focus on efficiency issues. In this paper, we address both effectiveness and efficiency in the sentence similarity matching. For a given sentence collection, we determine how to effectively and efficiently identify the top-k semantically similar sentences to a query. To achieve this goal, we first study several representative sentence similarity measurement strategies, based on which we deliberately choose the optimal ones through cross-validation and dynamically weight tuning. The experimental evaluation demonstrates the effectiveness of our strategy. Moreover, from the efficiency aspect, we introduce several optimization techniques to improve the performance of the similarity computation. The trade-off between the effectiveness and efficiency is further explored by conducting extensive experiments.

Keywords : String matching; information retrieval; natural language processing.

        · text in English     · English ( pdf )

 

Creative Commons License All the contents of this journal, except where otherwise noted, is licensed under a Creative Commons Attribution License