Polibits
On-line version ISSN 1870-9044
Polibits no. 37, Mexico, Jan./Jun. 2008
Special section: natural language processing
Improvement of Queries using a Rule Based Procedure for Inflection of Compounds and Phrases
Ranka M. Stanković
Faculty of Mining and Geology, University of Belgrade, Dusina 7, 11000 Belgrade, Serbia (phone: +381 11 3219148; fax: +381 11 3243 978; email: ranka@rgf.bg.ac.yu).
Manuscript received on May 9, 2008.
Manuscript accepted for publication June 20, 2008.
Abstract
The choice of words for a query is crucial to the quality of the results it returns, and it can be substantially improved by using various lexical resources. For example, morphological dictionaries enable morphological expansion of queries, which is very important in highly inflective languages such as Serbian. This paper discusses the improvement of queries using a rule-based procedure implemented in WS4LR, a workstation for manipulating heterogeneous lexical resources developed by the Human Language Technology Group at the University of Belgrade. The procedure automatically produces lemmas for a morphological dictionary from a given list of compounds, and it is evaluated on several different data sets. Several examples illustrate how the procedure can be used to improve queries for web search engines. The results obtained for these examples show that this approach can remarkably increase the number of documents retrieved by a query.
Key words: Electronic dictionary, inflection, compounds, query expansion.
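The morphological query expansion described in the abstract can be pictured with a minimal sketch. It is not the WS4LR procedure: the dictionary below is a hypothetical hand-written toy with a few illustrative case forms of one Serbian compound, and the function name `expand_query` and the `OR` query syntax are assumptions made for this example only.

```python
from typing import Dict, List

# Hypothetical toy dictionary: maps a compound lemma to some of its inflected
# forms. In the paper such forms come from a morphological dictionary of
# compounds; here they are written by hand and are only illustrative.
INFLECTED_FORMS: Dict[str, List[str]] = {
    "rudarski fakultet": [
        "rudarski fakultet",
        "rudarskog fakulteta",
        "rudarskom fakultetu",
        "rudarskim fakultetom",
    ],
}


def expand_query(phrase: str, forms: Dict[str, List[str]]) -> str:
    """Build a disjunctive phrase query covering the known inflected forms.

    A phrase that is not in the dictionary is returned as a single
    quoted phrase, i.e. the query is left unexpanded.
    """
    variants = forms.get(phrase.lower(), [phrase])
    unique: List[str] = []
    for variant in variants:
        if variant not in unique:  # drop duplicates, keep original order
            unique.append(variant)
    return " OR ".join(f'"{variant}"' for variant in unique)


if __name__ == "__main__":
    print(expand_query("rudarski fakultet", INFLECTED_FORMS))
```

A search engine that matches exact phrases would then also retrieve documents containing the compound in an oblique case, which is the effect the abstract reports as an increase in the number of retrieved documents.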
REFERENCES
[1] Krstev, C., Stanković, R., Vitas, D., Obradović, I. (2006). "WS4LR: A Workstation for Lexical Resources". In Proceedings of the 5th International Conference on Language Resources and Evaluation, LREC 2006, Genoa, Italy, May 2006, pp. 1692-1697.
[2] Gelbukh, A., Sidorov, G. "Approach to construction of automatic morphological analysis systems for inflective languages with little effort". LNCS 2588, 2003, pp. 215-220.
[3] Courtois, B., Silberztein, M. (eds.): Dictionnaires électroniques du français. Langue française. 87, Larousse, Paris, 1990.
[4] Krstev, C.: Processing of Serbian: Automata, Texts and Electronic Dictionaries. Faculty of Philology, University of Belgrade, Belgrade, 2008.
[5] Savary, A., Krstev, C., Vitas, D.: "Inflectional non compositionality and variation of compounds in French, Polish and Serbian, and their automatic processing". Bulag - Bulletin de Linguistique Appliquée et Générale. 32, 73-94, 2007.
[6] Krstev, C., Vitas, D., Savary, A.: "Prerequisites for a Comprehensive Dictionary of Serbian Compounds". In: Salakoski, T., Ginter, F., Pyysalo, S., Pahikkala, T. (eds.) FinTAL 2006. LNAI, vol. 4139, pp. 552-564. Springer, Heidelberg, 2006.
[7] Krstev, C., Stanković, R., Vitas, D., Obradović, I.: "The Usage of Various Lexical Resources and Tools to Improve the Performance of Web Search Engines". In: 6th LREC International Conference on Language Resources and Evaluation, Marrakech, Morocco, 2008.
[8] Krstev, C., Pavlović-Lažetić, G., Vitas, D., Obradović, I.: "Using Textual and Lexical Resources in Developing Serbian Wordnet". In Romanian Journal of Information Science and Technology, Romanian Academy, Publishing House of the Romanian Academy, vol. 7, No. 1-2, pp. 147-161, (2004).
[9] Krstev, C., Vitas, D., Maurel, D., Tran, M. (2005). "Multilingual Ontology of Proper Names". In Proc. of Second Language & Technology Conference, Poznan, Poland, April 21-23, Wydawnictwo Poznanskie Sp. z o.o, Poznan.
[10] TMX 1.4b specification, http://www.lisa.org/standards/tmx/tmx.html
NOTE
The presented work was done within the Human Language Technology group, University of Belgrade, Serbia.