On-line version ISSN 1870-9044
Polibits no. 41, Mexico, Jan./Jun. 2010
Special section: processing of semantic information
Spoken to Spoken vs. Spoken to Written: Corpus Approach to Exploring Interpreting and Subtitling
Mikhail Mikhailov, Hannu Tommola, and Nina Isolahti
Manuscript received March 7, 2010.
Manuscript accepted for publication May 31, 2010.
The need for corpora of interpreting discourse in translation studies is gradually increasing. Research on audiovisual (AV) translation is another rapidly developing field, so corpora of subtitling and dubbing would also be useful. The main reasons for the lack of such resources are the difficulty of obtaining data and the need for manual data input. An interpreting corpus would be a collection of transcripts of speech in two or more languages, with part of the transcripts aligned. Subtitling and dubbing corpora can be designed using the same principles. The structure of the corpus should reflect the polyphonic nature of the data; markup therefore becomes extremely important in these types of corpora. The research presented in this paper deals with corpora of Finnish-Russian interpreting discourse and subtitling. The software package developed for processing the corpora includes routines written specifically for studying speech transcripts rather than written text. For example, the speaker statistics function calculates the number of words, the number of pauses and their duration, and the average speech tempo of a given speaker.
Key words: Interpreting, subtitling, corpora, Russian language, Finnish language.
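As an illustration of the kind of speaker statistics mentioned in the abstract, the following is a minimal sketch in Python. It is not the authors' software: the `Utterance` record, its field names, and the choice to measure tempo in words per minute over total utterance time (pauses included) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Utterance:
    """One transcribed stretch of speech (hypothetical record layout)."""
    speaker: str
    text: str
    duration_s: float                      # total time, pauses included
    pauses_s: List[float] = field(default_factory=list)  # pause lengths in seconds


def speaker_stats(utterances: List[Utterance], speaker: str) -> Dict[str, float]:
    """Word count, pause count, total pause time and tempo for one speaker."""
    own = [u for u in utterances if u.speaker == speaker]
    words = sum(len(u.text.split()) for u in own)   # whitespace tokenization
    pauses = [p for u in own for p in u.pauses_s]
    total_time = sum(u.duration_s for u in own)
    # Tempo as words per minute; 0.0 if the speaker has no timed speech.
    tempo = words * 60 / total_time if total_time > 0 else 0.0
    return {
        "words": words,
        "pauses": len(pauses),
        "pause_time_s": sum(pauses),
        "words_per_minute": tempo,
    }
```

Whitespace tokenization is a simplification; a real transcript would first need its markup (speaker tags, pause annotations) parsed into such records.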