Computación y Sistemas
On-line version ISSN 2007-9737; print version ISSN 1405-5546
Comp. y Sist. vol. 13 no. 1, Mexico City, Jul./Sep. 2009
Articles
Incompressibility and Lossless Data Compression: An Approach by Pattern Discovery
Incompresibilidad y compresión de datos sin pérdidas: Un acercamiento con descubrimiento de patrones
Oscar Herrera Alcántara and Francisco Javier Zaragoza Martínez
Universidad Autónoma Metropolitana Unidad Azcapotzalco Departamento de Sistemas Av. San Pablo No. 180, Col. Reynosa Tamaulipas Del. Azcapotzalco, 02200, Mexico City, Mexico Tel. 53 18 95 32, Fax 53 94 45 34 oha@correo.azc.uam.mx , franz@correo.azc.uam.mx
Article received on July 14, 2008
Accepted on April 03, 2009
Abstract
We present a novel method for lossless data compression that aims to perform differently from the methods proposed in recent decades to handle the data volumes of the Information and Multimedia Ages. Those earlier methods are called entropic or classic because they are based on Claude E. Shannon's classical Information Theory; they include Huffman coding [8], arithmetic coding [14], Lempel-Ziv [15], the Burrows-Wheeler Transform (BWT) [4], Move-To-Front (MTF) [3] and Prediction by Partial Matching (PPM) [5]. We review the Incompressibility Theorem and its relation to the classic methods and to our method, which is based on discovering symbol patterns called metasymbols. Experimental results allow us to propose metasymbolic compression as a tool for multimedia compression, sequence analysis and unsupervised clustering.
Keywords: Incompressibility, Data Compression, Information Theory, Pattern Discovery, Clustering.
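As a rough illustration of the two ideas the abstract contrasts, the following Python sketch computes the order-0 Shannon entropy that bounds symbol-by-symbol (entropic) coders, and the counting fact behind the Incompressibility Theorem: there are 2^n binary strings of length n but only 2^n - 1 strictly shorter ones, so no lossless scheme can shorten every input. The function names are hypothetical and chosen here for illustration; this is not the authors' metasymbolic method.

```python
import math
from collections import Counter


def order0_entropy_bits(data: bytes) -> float:
    """Shannon entropy in bits per symbol, treating each byte as an
    independent symbol (an assumption; real sources have context)."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def count_shorter_strings(n: int) -> int:
    """Number of binary strings strictly shorter than n bits:
    2^0 + 2^1 + ... + 2^(n-1) = 2^n - 1."""
    return 2 ** n - 1


if __name__ == "__main__":
    sample = b"aabb"
    print(order0_entropy_bits(sample))   # bits per symbol for this sample
    n = 8
    # Fewer short descriptions than n-bit inputs: some input cannot shrink.
    print(count_shorter_strings(n), "<", 2 ** n)
```

A pattern-based compressor can beat the order-0 figure on inputs with long repeated structures, because that figure ignores symbol order; the counting bound, by contrast, applies to every lossless method.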
References
1. Barnsley, M. (1993) "Fractals Everywhere", Morgan Kaufmann Pub., 2nd edition.
2. Barnsley, M. and Hurd, L. (1992) "Fractal Image Compression", AK Peters, Ltd., Wellesley, MA.
3. Bentley, J., Sleator, D., et al. (1986) "A locally adaptive data compression algorithm", Communications of the ACM, Vol. 29, No. 4, pp. 320-330.
4. Burrows, M. and Wheeler, D. (1994) "A block-sorting lossless data compression algorithm", Digital Syst. Res. Ctr., Palo Alto, CA, Tech. Rep. SRC 124.
5. Cleary, J. and Witten, I. (1984) "Data compression using adaptive coding and partial string matching", IEEE Transactions on Communications, Vol. 32, No. 4, pp. 396-402.
6. Feller, W. (1968) "An Introduction to Probability Theory and Its Applications", John Wiley, 2nd edition, pp. 233-234.
7. Hacker, S. (2000) "MP3: The Definitive Guide", O'Reilly, Sebastopol, CA.
8. Huffman, D. (1952) "A method for the construction of minimum-redundancy codes", Proc. Inst. Radio Eng., Vol. 40, No. 9, pp. 1098-1101.
9. Kuri, A. and Galaviz, J. (2004) "Pattern-based data compression", Lecture Notes in Artificial Intelligence LNAI 2972, pp. 1-10.
10. Nelson, M. and Gailly, J. (1995) "The Data Compression Book", 2nd edition, M&T Books, Redwood City, CA.
11. Nevill-Manning, C. G. and Witten, I. (1999) "Protein is incompressible", Data Compression Conference (DCC '99), p. 257.
12. Pennebaker, W. B. and Mitchell, J. (1993) "JPEG: Still Image Data Compression Standard", ITP Inc.
13. Arnold, R. and Bell, T. (1997) "A Corpus for the Evaluation of Lossless Compression Algorithms", Proceedings of the Conference on Data Compression, IEEE Computer Society, Washington, DC, USA.
14. Witten, I., Neal, R. and Cleary, J. (1987) "Arithmetic coding for data compression", Communications of the ACM, Vol. 30, No. 6, pp. 520-540.
15. Ziv, J. and Lempel, A. (1977) "A Universal Algorithm for Sequential Data Compression", IEEE Trans. on Inf. Theory, Vol. IT-23, No. 3, pp. 337-343.
16. Microsoft Encarta, fractal image compression (2008) http://encarta.msn.com/encyclopedia_761568021/fractal.html (accessed July 14, 2008).
17. Calgary Corpus (2008) http://links.uwaterloo.ca/calgary.corpus.html (accessed July 14, 2008).