
Computación y Sistemas

Print version ISSN 1405-5546

Comp. y Sist. vol.13 n.1 México Jul./Sep. 2009




Incompressibility and Lossless Data Compression: An Approach by Pattern Discovery


Incompresibilidad y compresión de datos sin pérdidas: Un acercamiento con descubrimiento de patrones


Oscar Herrera Alcántara and Francisco Javier Zaragoza Martínez


Universidad Autónoma Metropolitana, Unidad Azcapotzalco, Departamento de Sistemas, Av. San Pablo No. 180, Col. Reynosa Tamaulipas, Del. Azcapotzalco, 02200, Mexico City, Mexico. Tel. 53 18 95 32, Fax 53 94 45 34


Article received on July 14, 2008
Accepted on April 03, 2009



We present a novel method for lossless data compression that aims for a performance different from that of the methods proposed in recent decades to cope with the volume of data of the Information and Multimedia Ages. Those methods are called entropic or classic because they are based on the classical information theory of Claude E. Shannon; they include Huffman coding [8], arithmetic coding [14], Lempel–Ziv [15], the Burrows–Wheeler transform (BWT) [4], Move To Front (MTF) [3] and Prediction by Partial Matching (PPM) [5]. We review the Incompressibility Theorem and its relation to the classic methods and to our method, which is based on discovering symbol patterns called metasymbols. Experimental results allow us to propose metasymbolic compression as a tool for multimedia compression, sequence analysis and unsupervised clustering.

Keywords: Incompressibility, Data Compression, Information Theory, Pattern Discovery, Clustering.
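
The abstract does not state the Incompressibility Theorem itself. As a rough illustration only, the sketch below works through the standard counting argument usually associated with it, assuming binary strings and an injective (lossless) code; the function names are hypothetical and this is not the authors' formulation. There are 2^n strings of length n but only 2^n - 1 strictly shorter strings, so no lossless compressor can shorten every input.

```python
def shorter_strings(n: int) -> int:
    """Count binary strings of length 0..n-1 (the empty string included)."""
    return 2**n - 1


def max_fraction_compressible(n: int, k: int) -> float:
    """Upper bound on the fraction of length-n binary strings that any
    injective code can map to outputs at least k bits shorter.

    Assumption: outputs are plain binary strings of length 0..n-k.
    """
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    available_outputs = 2 ** (n - k + 1) - 1  # strings of length 0..n-k
    return available_outputs / 2**n


if __name__ == "__main__":
    n = 16
    for k in (1, 4, 8):
        bound = max_fraction_compressible(n, k)
        print(f"n={n}, save >= {k} bits: at most {bound:.6f} of all inputs")
```

Under these assumptions the bound stays below 2^(1-k) for every n, which is why pattern-based or entropic methods can only pay off on inputs that carry exploitable structure.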



Presentamos un método novedoso para compresión de datos sin pérdidas que tiene por objetivo principal lograr un desempeño distinto a los propuestos en las últimas décadas para tratar con los volúmenes de datos propios de la Era de la Información y la Era Multimedia. Esos métodos, llamados entrópicos o clásicos, están basados en la Teoría de la Información Clásica de Claude E. Shannon e incluyen los métodos de codificación de Huffman [8], Aritmético [14], Lempel–Ziv [15], Burrows–Wheeler (BWT) [4], Move To Front (MTF) [3] y Prediction by Partial Matching (PPM) [5]. Revisamos el Teorema de Incompresibilidad y su relación con los métodos clásicos y con nuestro compresor basado en el descubrimiento de patrones llamados metasímbolos. Los resultados experimentales nos permiten proponer la compresión metasimbólica como una herramienta de compresión de archivos multimedios, útil en el análisis y el agrupamiento no supervisado de secuencias.

Palabras clave: Incompresibilidad, Compresión de Datos, Teoría de la Información, Descubrimiento de Patrones, Agrupamiento.





1. Barnsley, M. (1993) "Fractals Everywhere", 2nd Edition, Morgan Kaufmann.

2. Barnsley, M. and Hurd, L. (1992) "Fractal Image Compression", A K Peters, Ltd., Wellesley, MA.

3. Bentley, J., Sleator, D., et al. (1986) "A locally adaptive data compression algorithm", Communications of the ACM, Vol. 29, No. 4, pp 320–330.

4. Burrows, M. and Wheeler, D. (1994) "A block–sorting lossless data compression algorithm", Digital Syst. Res. Ctr., Palo Alto, CA, Tech. Rep. SRC 124.

5. Cleary, J. and Witten, I. (1984) "Data compression using adaptive coding and partial string matching", IEEE Transactions on Communications, Vol. 32, No. 4, pp 396–402.

6. Feller, W. (1968) "An Introduction to Probability Theory and Its Applications", John Wiley, 2nd Edition, pp 233–234.

7. Hacker, S. (2000) "MP3: The Definitive Guide", Sebastopol, Calif., O'Reilly.

8. Huffman, D. (1952) "A method for the construction of minimum–redundancy codes", Proc. Inst. Radio Eng., Vol. 40, No. 9, pp 1098–1101.

9. Kuri, A. and Galaviz, J. (2004) "Pattern–based data compression", Lecture Notes in Artificial Intelligence LNAI 2972, pp 1–10.

10. Nelson, M. and Gailly, J. (1995) "The Data Compression Book", Second Edition, M&T Books, Redwood City, CA.

11. Nevill–Manning, C. G. and Witten, I. (1999) "Protein is incompressible", Data Compression Conference (DCC '99), p. 257.

12. Pennebaker, W.B. and Mitchell, J. (1993) "JPEG: Still Image Data Compression Standard", ITP Inc.

13. Arnold, R. and Bell, T. (1997) "A Corpus for the Evaluation of Lossless Compression Algorithms", Proceedings of the Conference on Data Compression, IEEE Computer Society, Washington, DC, USA.

14. Witten, I., Neal, R. and Cleary, J. (1987) "Arithmetic coding for data compression", Communications of the ACM, Vol. 30, No. 6, pp 520–540.

15. Ziv, J. and Lempel, A. (1977) "A Universal Algorithm for Sequential Data Compression", IEEE Trans. on Inf. Theory, Vol. IT–23, No. 3, pp 337–343.

16. Microsoft Encarta, fractal image compression (2008) (accessed July 14, 2008).

17. Calgary Corpus (2008) (accessed July 14, 2008).

All the contents of this journal, except where otherwise noted, are licensed under a Creative Commons Attribution License.