Polibits

Online version ISSN 1870-9044

Polibits, no. 46, Mexico, July–December 2012


Redes de palabras alineadas como recurso en la extracción de equivalencias léxicas de traducción y su aplicación en la alineación

 

Aligned Word Networks as a Resource for Extraction of Lexical Translation Equivalents, and their Application to the Text Alignment Task

 

Eduardo Cendejas, Grettel Barceló, Grigori Sidorov, Alexander Gelbukh, and Liliana Chanona-Hernandez

 

Center for Computing Research, National Polytechnic Institute, Mexico City, Mexico (web: cic.ipn.mx/sidorov, www.gelbukh.com).

 

Manuscript received December 9, 2011.
Manuscript accepted for publication March 8, 2012.

 


Abstract

A lexical translation equivalent is defined via correspondences established between two languages, conventionally called the source language and the target language. We propose a method for extracting such equivalents for non-functional (content) words. Our algorithm is based on two main resources: (1) MultiWordNet, used as a specialized lexicon for each of the two languages in question, and (2) a parallel text corpus, used as a source of additional information that provides various lexicalizations of the words being aligned. The method relies on the fact that the word networks that make up MultiWordNet are aligned with one another. In addition, we discuss how the resulting list of word pairs can be reused in a word-level text alignment system. In our experiments we used bilingual parallel texts aligned at the sentence level, without any morphosyntactic annotation, for the language pairs Spanish/English and Spanish/Italian.

Key words: Translation equivalents, alignment, word networks, parallel texts.
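The abstract summarizes the method without giving its details. As an illustration only, the Python sketch below shows the core idea under stated assumptions; it is not the authors' algorithm. Assumptions: small hand-written dictionaries (`SRC_LEX`, `TGT_LEX`) stand in for MultiWordNet, with the aligned networks modeled as shared synset IDs; tokens are lower-cased and split on whitespace, with no lemmatization; and a candidate pair is scored by the number of sentence pairs in which it co-occurs. The helper names (`synset_members`, `extract_equivalents`, `align_words`) and the greedy linking strategy are hypothetical, chosen only to show how an extracted pair list might be reused for word-level alignment.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

# HYPOTHETICAL toy lexicons standing in for MultiWordNet. Each maps a lemma to
# the IDs of the synsets it belongs to; because the networks are aligned, the
# same synset ID denotes the same concept in both languages.
Lexicon = Dict[str, Set[str]]

SRC_LEX: Lexicon = {"casa": {"n1"}, "perro": {"n2"}, "grande": {"a1"}}  # Spanish
TGT_LEX: Lexicon = {"house": {"n1"}, "home": {"n1"}, "dog": {"n2"},
                    "big": {"a1"}, "large": {"a1"}}                     # English

# Toy sentence-aligned parallel corpus of (source, target) pairs.
PAIRS = [("el perro grande", "the big dog"),
         ("la casa grande", "the large house"),
         ("mi casa", "my house")]


def synset_members(lexicon: Lexicon) -> Dict[str, Set[str]]:
    """Invert lemma -> synsets into synset -> lemmas."""
    members: Dict[str, Set[str]] = {}
    for lemma, synsets in lexicon.items():
        for sid in synsets:
            members.setdefault(sid, set()).add(lemma)
    return members


def extract_equivalents(pairs: Iterable[Tuple[str, str]],
                        src_lex: Lexicon, tgt_lex: Lexicon) -> Counter:
    """Count, once per sentence pair, the (source, target) words that share
    an aligned synset and both occur in that pair."""
    tgt_members = synset_members(tgt_lex)
    counts: Counter = Counter()
    for src_sent, tgt_sent in pairs:
        # Naive whitespace tokenization, no lemmatization (an assumption).
        tgt_words = set(tgt_sent.lower().split())
        for word in set(src_sent.lower().split()):
            # Words missing from the lexicon (e.g. function words) yield nothing.
            candidates = {lemma for sid in src_lex.get(word, set())
                          for lemma in tgt_members.get(sid, set())}
            for cand in candidates & tgt_words:
                counts[(word, cand)] += 1
    return counts


def align_words(src_sent: str, tgt_sent: str,
                equivalents: Counter) -> List[Tuple[int, int]]:
    """Greedy one-to-one linking: each source position is linked to the
    unused target position with the highest positive equivalence count."""
    src, tgt = src_sent.lower().split(), tgt_sent.lower().split()
    used: Set[int] = set()
    links: List[Tuple[int, int]] = []
    for i, s in enumerate(src):
        best, best_score = None, 0
        for j, t in enumerate(tgt):
            score = equivalents[(s, t)]  # Counter returns 0 for unseen pairs
            if j not in used and score > best_score:
                best, best_score = j, score
        if best is not None:
            used.add(best)
            links.append((i, best))
    return links


EQUIV = extract_equivalents(PAIRS, SRC_LEX, TGT_LEX)
print(EQUIV.most_common())
print(align_words("la casa grande", "the large house", EQUIV))  # → [(1, 2), (2, 1)]
```

Counting co-occurrences over many sentence pairs is what lets the parallel texts choose among the several target lemmas that one synset offers (here, *big* vs. *large* for *grande*, and *house* rather than *home* for *casa*). A real system would also need lemmatization and a better-founded scoring function than raw counts.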

 



All content of this journal, except where otherwise noted, is licensed under a Creative Commons License.