



On-line version ISSN 1870-9044

Polibits no. 43, México, Jan./Jun. 2011


A Cross-Lingual Pattern Retrieval Framework


Mei-Hua Chen1, Chung-Chi Huang1, Shih-Ting Huang1, Hsien-Chin Liou2, and Jason S. Chang1


1 ISA, NTHU, HsinChu, Taiwan, R.O.C. 300

2 FL, NTHU, HsinChu, Taiwan, R.O.C. 300


Manuscript received November 28, 2010.
Manuscript accepted for publication January 5, 2011.



We introduce a method for learning to grammatically categorize and organize the contexts of a given query. In our approach, grammatical descriptions, ranging from general word groups to specific lexical phrases, are imposed on the query's contexts to help lexicographers and language learners navigate and GRASP word usages more quickly. For monolingual queries, the method involves lemmatizing, part-of-speech tagging, and shallow parsing a general corpus and building inverted files over it. For cross-lingual queries, it involves word-aligning parallel texts and then extracting and pruning translation equivalents. At run-time, grammar-like patterns are generated, organized into a thesaurus-like index structure over the query words' contexts, and presented to users along with their instantiations. Experimental results show that the extracted predominant patterns resemble phrases found in grammar books and that the abstract-to-concrete context hierarchy of query words effectively assists language learning, especially sentence translation and composition.
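The monolingual part of this pipeline can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus is lemmatized and tagged by hand, part-of-speech tags stand in for the shallow-parse word groups, and the names `CORPUS`, `build_inverted_file`, `pattern_hierarchy`, and `ranked` are hypothetical. It shows an inverted file over a tagged corpus and a two-level hierarchy in which each abstract pattern lists its concrete instantiations. The cross-lingual step (word alignment and pruning of translation equivalents) is not covered.

```python
from collections import Counter, defaultdict

# Toy corpus, lemmatized and POS-tagged by hand as (lemma, tag) pairs.
# Sentences and tags are illustrative assumptions, not data from the paper.
CORPUS = [
    [("he", "PRON"), ("play", "VERB"), ("an", "DET"), ("important", "ADJ"),
     ("role", "NOUN"), ("in", "ADP"), ("it", "PRON")],
    [("she", "PRON"), ("play", "VERB"), ("a", "DET"), ("key", "ADJ"),
     ("role", "NOUN"), ("in", "ADP"), ("it", "PRON")],
    [("they", "PRON"), ("take", "VERB"), ("a", "DET"),
     ("role", "NOUN"), ("in", "ADP"), ("it", "PRON")],
]


def build_inverted_file(corpus):
    """Map each lemma to the (sentence id, token position) pairs where it occurs."""
    index = defaultdict(list)
    for sent_id, sentence in enumerate(corpus):
        for pos, (lemma, _tag) in enumerate(sentence):
            index[lemma].append((sent_id, pos))
    return index


def pattern_hierarchy(corpus, index, query, left=3, right=1):
    """Group the query's contexts under abstract patterns (tags around the
    query word), each holding counts of its concrete lexical instantiations."""
    hierarchy = defaultdict(Counter)
    for sent_id, pos in index.get(query, []):
        sentence = corpus[sent_id]
        window = sentence[max(0, pos - left): pos + right + 1]
        abstract = " ".join(query if lemma == query else tag
                            for lemma, tag in window)
        concrete = " ".join(lemma for lemma, _tag in window)
        hierarchy[abstract][concrete] += 1
    return hierarchy


def ranked(hierarchy):
    """Order abstract patterns by how many contexts they cover, most first."""
    return sorted(hierarchy.items(), key=lambda item: -sum(item[1].values()))


index = build_inverted_file(CORPUS)
for pattern, instances in ranked(pattern_hierarchy(CORPUS, index, "role")):
    print(pattern, dict(instances))
```

Looking up the query in the inverted file avoids scanning the whole corpus at run-time, and ranking the abstract patterns by coverage puts the predominant ones first, with their lexical phrases one level below.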

Key words: Grammatical constructions, lexical phrases, context, language learning, inverted files, phrase pairs, cross-lingual pattern retrieval.





