

Polibits

On-line version ISSN 1870-9044

Polibits  n.45 México Jun. 2012

 

Comparing Sanskrit Texts for Critical Editions: The Sequences Move Problem

 

Nicolas Béchet1 and Marc Csernel2

 

1 Nicolas Béchet is with GREYC Université de Caen Basse-Normandie, France (e-mail: nicolas.bechet@unicaen.fr)

2 Marc Csernel is with INRIA Rocquencourt, Université Paris Dauphine, France (e-mail: Marc.Csernel@inria.fr)

 

Manuscript received on October 20, 2011; accepted for publication on December 9, 2011.

 

Abstract

A critical edition takes into account various versions of the same text in order to show the differences between any two distinct versions, in terms of words that are missing, changed, omitted or displaced. Traditionally, Sanskrit is written without spaces between words, and the word order can be changed without altering the meaning of a sentence. This paper describes the characteristics that make Sanskrit text comparison a specific problem. It presents two methods for comparing Sanskrit texts, which can be used to develop a computer-assisted critical edition. The first method uses the longest common subsequence (L.C.S.), while the second uses a global alignment algorithm. Comparing them, we see that the second method gives better results, but that neither method can detect when a word or a sentence fragment has been moved. We then present a method based on N-grams that can detect such a move when the fragment has not been moved too far from its original location. We show how the method behaves on several examples.

Key words: Sanskrit, text alignment.
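
The limitation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the texts have already been split into words (which is itself non-trivial for Sanskrit), uses placeholder tokens rather than real Sanskrit, and substitutes a simple character-trigram Dice similarity with a position window for the N-gram method of the paper. The names `lcs_diff`, `find_moves`, `window` and `threshold` are hypothetical.

```python
def lcs_table(a, b):
    """Table t where t[i][j] is the LCS length of a[i:] and b[j:]."""
    m, n = len(a), len(b)
    t = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                t[i][j] = 1 + t[i + 1][j + 1]
            else:
                t[i][j] = max(t[i + 1][j], t[i][j + 1])
    return t


def lcs_diff(a, b):
    """Return (op, token) pairs: '=' kept, '-' only in a, '+' only in b."""
    t = lcs_table(a, b)
    i = j = 0
    ops = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            ops.append(("=", a[i]))
            i += 1
            j += 1
        elif t[i + 1][j] >= t[i][j + 1]:
            ops.append(("-", a[i]))
            i += 1
        else:
            ops.append(("+", b[j]))
            j += 1
    ops.extend(("-", tok) for tok in a[i:])
    ops.extend(("+", tok) for tok in b[j:])
    return ops


def char_ngrams(s, n=3):
    """Set of character n-grams of s (empty if s is shorter than n)."""
    return {s[k:k + n] for k in range(len(s) - n + 1)}


def dice(x, y, n=3):
    """Dice similarity of the character n-gram sets of x and y."""
    gx, gy = char_ngrams(x, n), char_ngrams(y, n)
    if not gx or not gy:
        return 1.0 if x == y else 0.0
    return 2 * len(gx & gy) / (len(gx) + len(gy))


def find_moves(ops, window=4, threshold=0.6):
    """Pair each deleted token with a similar inserted token nearby.

    'window' bounds the distance between the two positions in the diff,
    mirroring the idea that a move is only detected when it is not too far.
    """
    deleted = [(k, tok) for k, (op, tok) in enumerate(ops) if op == "-"]
    inserted = [(k, tok) for k, (op, tok) in enumerate(ops) if op == "+"]
    moves, used = [], set()
    for kd, td in deleted:
        for ki, ti in inserted:
            if ki not in used and abs(ki - kd) <= window and dice(td, ti) >= threshold:
                moves.append((td, ti))
                used.add(ki)
                break
    return moves


# Placeholder tokens (not Sanskrit): "beta" is displaced by two positions.
old = "alpha beta gamma delta epsilon".split()
new = "alpha gamma delta beta epsilon".split()
ops = lcs_diff(old, new)      # the L.C.S. alone reports "- beta" and "+ beta"
moves = find_moves(ops)       # the n-gram pass pairs them as one move
print(ops)
print(moves)
```

On this input the L.C.S. pass reports the displaced word as an unrelated deletion and insertion, and only the second pass pairs the two. Lowering `window` to 2 leaves the pair undetected, which is the "not too far" restriction in miniature.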

 


 


All the contents of this journal, except where otherwise noted, are licensed under a Creative Commons Attribution License.