On-line version ISSN 1870-9044

Polibits  n.41 México Jan./Jun. 2010


Special section: processing of semantic information


Spoken to Spoken vs. Spoken to Written: Corpus Approach to Exploring Interpreting and Subtitling


Mikhail Mikhailov, Hannu Tommola, and Nina Isolahti


School of Modern Languages and Translation Studies, University of Tampere, Finland.


Manuscript received March 7, 2010.
Manuscript accepted for publication May 31, 2010.



The need for corpora of interpreting discourse in translation studies is gradually increasing. Research on audiovisual (AV) translation is another rapidly developing field, so corpora of subtitling and dubbing would also be useful. The main reason for the lack of such resources is the difficulty of obtaining data and the unavoidable need for manual data input. An interpreting corpus would be a collection of transcripts of speech in two or more languages, with part of the transcripts aligned. Subtitling and dubbing corpora can be designed on the same principles. The structure of the corpus should reflect the polyphonic nature of the data, which makes markup extremely important in these types of corpora. The research presented in this paper deals with corpora of Finnish–Russian interpreting discourse and subtitling. The software package developed for processing the corpora includes routines written specifically for studying speech transcripts rather than written text. For example, the speaker statistics function calculates the number of words, the number of pauses and their duration, and the average speech tempo of a given speaker.

Key words: Interpreting, subtitling, corpora, Russian language, Finnish language.
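
The speaker statistics mentioned in the abstract could be sketched roughly as follows. This is only an illustration and not the authors' software: the transcript format (tuples of speaker, start time, end time, text), the pause notation "(1.5)" for a pause of 1.5 seconds, and the function name speaker_stats are all assumptions made for the example. Tempo is taken here as words per minute over the speaker's total turn time, pauses included, which is one of several possible definitions.

```python
import re
from collections import defaultdict

# Hypothetical markup: "(1.5)" inside an utterance marks a 1.5-second pause.
PAUSE = re.compile(r"\((\d+(?:\.\d+)?)\)")
WORD = re.compile(r"\w+")


def speaker_stats(utterances):
    """Aggregate per-speaker counts from (speaker, start_sec, end_sec, text) tuples."""
    totals = defaultdict(
        lambda: {"words": 0, "pauses": 0, "pause_time": 0.0, "speech_time": 0.0}
    )
    for speaker, start, end, text in utterances:
        pauses = [float(p) for p in PAUSE.findall(text)]
        # Remove pause markers before counting so their digits are not counted as words.
        words = WORD.findall(PAUSE.sub(" ", text))
        entry = totals[speaker]
        entry["words"] += len(words)
        entry["pauses"] += len(pauses)
        entry["pause_time"] += sum(pauses)
        entry["speech_time"] += end - start

    result = {}
    for speaker, entry in totals.items():
        minutes = entry["speech_time"] / 60
        tempo = entry["words"] / minutes if minutes > 0 else 0.0
        result[speaker] = {**entry, "words_per_minute": tempo}
    return result


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    sample = [
        ("judge", 0.0, 30.0, "one two (1.5) three"),
        ("interpreter", 30.0, 60.0, "yksi kaksi"),
        ("judge", 60.0, 90.0, "four (0.5) five six"),
    ]
    for name, stats in speaker_stats(sample).items():
        print(name, stats)
```

Keeping the pause notation in a single regular expression means the same routine can be pointed at a different transcription convention by changing one pattern.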






Creative Commons License: All the contents of this journal, except where otherwise noted, are licensed under a Creative Commons Attribution License.