Polibits

On-line version ISSN 1870-9044

Polibits no. 37, México, Jan./Jun. 2008

 

Special section: natural language processing

 

Improvement of Queries using a Rule Based Procedure for Inflection of Compounds and Phrases

 

Ranka M. Stanković

 

Faculty of Mining and Geology, University of Belgrade, Dusina 7, 11000 Belgrade, Serbia (phone: +381 11 3219-148; fax: +381 11 3243 978; e-mail: ranka@rgf.bg.ac.yu).

 

Manuscript received on May 9, 2008.
Manuscript accepted for publication June 20, 2008.

 

Abstract

The choice of words for a query is crucial for the quality of the results it returns, and it can be substantially improved by using various lexical resources. For example, morphological dictionaries enable morphological expansion of queries, which is very important in highly inflected languages such as Serbian. This paper discusses issues related to the improvement of queries using a rule-based procedure implemented in WS4LR, a workstation for manipulating heterogeneous lexical resources developed by the Human Language Technology Group at the University of Belgrade. The procedure automatically produces lemmas for a morphological dictionary from a given list of compounds, and it is evaluated on several different data sets. Several examples illustrate how the procedure can be used to improve queries for web search engines. The results obtained for these examples show that this approach can markedly increase the number of documents retrieved by a query.

Key words: Electronic dictionary, inflection, compounds, query expansion.
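
As an illustration of the morphological query expansion described in the abstract, the following is a minimal sketch and not the WS4LR procedure itself. It assumes a small hand-written table of inflected forms for a single Serbian compound ("mobilni telefon", mobile phone) and a generic search syntax in which quoted phrases are joined with OR; the names `INFLECTED_FORMS` and `expand_query` are hypothetical and do not come from the paper.

```python
from typing import Dict, List

# Hypothetical, hand-written sample of inflected forms for one Serbian compound.
# A real system would obtain these forms from a morphological dictionary.
INFLECTED_FORMS: Dict[str, List[str]] = {
    "mobilni telefon": [
        "mobilni telefon",      # nominative singular
        "mobilnog telefona",    # genitive singular
        "mobilnom telefonu",    # dative/locative singular
        "mobilnim telefonom",   # instrumental singular
        "mobilni telefoni",     # nominative plural
        "mobilnih telefona",    # genitive plural
    ],
}


def expand_query(lemma: str, forms: Dict[str, List[str]]) -> str:
    """Build an OR-query of quoted inflected forms for the given lemma.

    If the lemma is not in the table, the query contains only the lemma itself.
    """
    variants = forms.get(lemma, [lemma])
    # Drop duplicate forms while keeping their original order.
    unique = list(dict.fromkeys(variants))
    return " OR ".join(f'"{v}"' for v in unique)


if __name__ == "__main__":
    print(expand_query("mobilni telefon", INFLECTED_FORMS))
```

Submitting the expanded string instead of the single citation form lets a search engine that matches exact phrases also find documents in which the compound appears in another case or number.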

 

 


NOTE

The work presented here was carried out within the Human Language Technology Group, University of Belgrade, Serbia.

Creative Commons License: All content of this journal, except where otherwise noted, is licensed under a Creative Commons License.