SciELO - Scientific Electronic Library Online

 


Polibits

Online version ISSN 1870-9044

Abstract

VONITSANOU, Maria-Alexandra; KOZANIDIS, Lefteris and STAMOU, Sofia. Keywords Identification within Greek URLs. Polibits [online]. 2011, n.43, pp. 75-80. ISSN 1870-9044.

In this paper we propose a method that identifies and extracts keywords within URLs, focusing on the Greek Web and especially on URLs containing Greek terms. Although previous work has addressed the processing of Greek online content, none of it focuses on keyword identification within URLs of the Greek web domain. In addition, there are many known techniques for web page categorization based on URLs, but none addresses the case of URLs containing transliterated Greek terms. The proposed method integrates two components: a URL tokenizer that segments URL tokens into meaningful words, and a Latin-to-Greek script transliteration engine that relies on a dictionary and a set of orthographic and syntactic rules to convert word tokens written in Latin characters into Greek terms. The experimental evaluation of our method against a sample of 1,000 Greek URLs indicates that it can be used effectively for automatic keyword identification within Greek URLs.
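
The two-component pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicon, the stop list of generic URL parts, the function names, and the example URL are all hypothetical. Plain dictionary lookup with greedy longest-match segmentation stands in for the paper's dictionary plus orthographic and syntactic rules.

```python
import re
from urllib.parse import urlparse

# Hypothetical toy lexicon mapping Latin-script (Greeklish) words to Greek terms.
# A real system would need a full dictionary plus orthographic rules.
LEXICON = {
    "nea": "νέα",
    "kairos": "καιρός",
    "athina": "αθήνα",
}

# Assumed stop list of generic URL parts that carry no keyword content.
GENERIC = {"www", "com", "gr", "html", "htm", "php", "index"}


def url_tokens(url):
    """Split the host and path of a URL on every non-letter character."""
    parsed = urlparse(url)
    raw = re.split(r"[^a-z]+", (parsed.netloc + parsed.path).lower())
    return [t for t in raw if t and t not in GENERIC]


def segment(token, lexicon):
    """Greedy longest-match segmentation of a token into lexicon words.

    Returns None when the token cannot be covered completely. Greedy
    matching does not backtrack, so it can miss valid segmentations.
    """
    words = []
    i = 0
    while i < len(token):
        for j in range(len(token), i, -1):
            if token[i:j] in lexicon:
                words.append(token[i:j])
                i = j
                break
        else:
            return None
    return words


def keywords(url, lexicon=LEXICON):
    """Return the Greek terms found in the URL, in order of appearance."""
    found = []
    for tok in url_tokens(url):
        words = segment(tok, lexicon)
        if words:
            found.extend(lexicon[w] for w in words)
    return found


print(keywords("http://www.neakairos.gr/athina/index.html"))
```

In this sketch the concatenated token `neakairos` is first split into `nea` and `kairos`, and each segment is then mapped to its Greek form. Tokens that cannot be fully segmented are skipped instead of guessed.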

Keywords: Greek to Latin character set transliteration; Greeklish to Greek transliteration; keyword extraction; Uniform Resource Locator; word segmentation.
