On-line version ISSN 1870-9044
Polibits no. 43, México, Jan./Jun. 2011
A Cross-Lingual Pattern Retrieval Framework
Mei-Hua Chen1*, Chung-Chi Huang1**, Shih-Ting Huang1***, Hsien-Chin Liou2, and Jason S. Chang1****
2 FL, NTHU, HsinChu, Taiwan, R.O.C. 300 (email: email@example.com).
Manuscript received November 28, 2010.
Manuscript accepted for publication January 5, 2011.
We introduce a method for learning to grammatically categorize and organize the contexts of a given query. In our approach, grammatical descriptions, ranging from general word groups to specific lexical phrases, are imposed on the query's contexts to speed up lexicographers' and language learners' navigation through, and GRASP of, word usages. The method involves lemmatizing, part-of-speech tagging, and shallow parsing a general corpus and constructing its inverted files for monolingual queries, and word-aligning parallel texts and extracting and pruning translation equivalents for cross-lingual ones. At run time, grammar-like patterns are generated, organized into a thesaurus index structure over the query words' contexts, and presented to users along with their instantiations. Experimental results show that the extracted predominant patterns resemble phrases in grammar books and that the abstract-to-concrete context hierarchy of query words effectively assists language learning, especially in sentence translation or composition.
Key words: Grammatical constructions, lexical phrases, context, language learning, inverted files, phrase pairs, cross-lingual pattern retrieval.
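The monolingual part of the pipeline summarized in the abstract (an inverted file over a lemmatized, part-of-speech-tagged corpus, then grammar-like patterns grouped with their instantiations at run time) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpus, the tag set, the fixed context window, and the function names `build_inverted_index` and `retrieve_patterns` are all assumptions made for illustration, and the cross-lingual step (word alignment and pruning of translation equivalents) is omitted.

```python
from collections import defaultdict

# Hypothetical toy corpus, assumed to be already lemmatized and
# POS-tagged. Each token is (surface word, lemma, POS tag).
CORPUS = [
    [("She", "she", "PRON"), ("plays", "play", "VERB"), ("a", "a", "DET"),
     ("role", "role", "NOUN"), ("in", "in", "ADP"),
     ("politics", "politics", "NOUN")],
    [("He", "he", "PRON"), ("played", "play", "VERB"), ("the", "the", "DET"),
     ("piano", "piano", "NOUN"), ("at", "at", "ADP"), ("home", "home", "NOUN")],
    [("They", "they", "PRON"), ("play", "play", "VERB"),
     ("with", "with", "ADP"), ("fire", "fire", "NOUN")],
]


def build_inverted_index(corpus):
    """Map each lemma to the (sentence id, token position) pairs where it occurs."""
    index = defaultdict(list)
    for sent_id, sentence in enumerate(corpus):
        for position, (_, lemma, _) in enumerate(sentence):
            index[lemma].append((sent_id, position))
    return index


def retrieve_patterns(query, corpus, index, window=2):
    """Group the right-hand contexts of `query` under POS-level patterns.

    Returns (pattern, instances) pairs, most frequent pattern first,
    so the abstract pattern sits above its concrete instantiations.
    """
    patterns = defaultdict(list)
    for sent_id, position in index.get(query, []):
        sentence = corpus[sent_id]
        context = sentence[position + 1 : position + 1 + window]
        if len(context) < window:
            continue  # skip contexts cut short by the sentence end
        pattern = " ".join([query] + [tag for _, _, tag in context])
        instance = " ".join(
            word for word, _, _ in sentence[position : position + 1 + window]
        )
        patterns[pattern].append(instance)
    # Sort by descending frequency, then alphabetically for stable output.
    return sorted(patterns.items(), key=lambda item: (-len(item[1]), item[0]))


if __name__ == "__main__":
    index = build_inverted_index(CORPUS)
    for pattern, instances in retrieve_patterns("play", CORPUS, index):
        print(pattern, "->", instances)
```

On this toy corpus the query `play` yields the pattern `play DET NOUN` with two instantiations ahead of `play ADP NOUN` with one. A fuller system would presumably use chunk-level (shallow-parse) categories and a statistical ranking of patterns rather than raw counts over a fixed window.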