|
Data compression is a way of re-encoding data so that it holds the same
information in less space than when uncompressed.
Data often contains repeated structures that can be represented by simpler
or smaller units.
The total saving is the reduction in size for each unit multiplied by
the number of times the unit is used.
Consider the following:
the quick brown fox jumps over the lazy dog
The word 'the' appears more than once so if we used #1 to represent the
word 'the', we would have
#1 quick brown fox jumps over #1 lazy dog
This is a saving of only 2 characters out of 43, or about 4.7%.
So far, so good, but if we add more rules (#2 for 'ells' and #3 for 'sea'),
she sells sea shells on the sea shore
might become
she s#2 #3 sh#2 on #1 #3 shore
This gives a saving of 7 characters out of 37, or about 19%.
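The two examples above can be reproduced with a few lines of Python. This is only a minimal sketch: the function names `substitute` and `saving` are made up for illustration, and the rules #2 = 'ells' and #3 = 'sea' are inferred from the encoded line rather than stated in the text. Like the examples, it ignores the space needed to store the rules themselves, and its plain search-and-replace would also hit 'the' inside longer words such as 'other'.

```python
def substitute(text, rules):
    """Replace each phrase with its short token, in the order the rules are given."""
    for token, phrase in rules.items():
        text = text.replace(phrase, token)
    return text


def saving(original, encoded):
    """Return (characters saved, percentage saved) for an encoded string."""
    saved = len(original) - len(encoded)
    return saved, 100 * saved / len(original)


# Rules assumed for illustration: #1 comes from the text, #2 and #3 are inferred.
rules = {"#1": "the", "#2": "ells", "#3": "sea"}

first = "the quick brown fox jumps over the lazy dog"
second = "she sells sea shells on the sea shore"

for line in (first, second):
    encoded = substitute(line, rules)
    saved, percent = saving(line, encoded)
    print(f"{encoded}  (saved {saved} of {len(line)}, {percent:.1f}%)")
```

Running it prints the same encoded lines as above, with savings of 2 and 7 characters.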
Looking at the above, it's fairly obvious that the amount of compression
depends greatly on the data used.
Most compression algorithms are far more complex than the one shown here and
can sometimes achieve phenomenal compression ratios, particularly if the
data is designed to show off the compression. Consider how well a file
containing mostly the letter "L" would be compressed by a utility
optimised to compress Ls. The figures can look very good indeed.
Since a file of just Ls rarely occurs in the real world, the stated
compression ratio would be 'just for show'.
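To see how flattering such a figure can be, here is a rough sketch of a run-length encoder, a simple scheme that happens to suit a file of Ls. It is not the utility the text imagines, only a stand-in: `rle_encode` is a hypothetical name, and the output is only reversible if the input contains no digits.

```python
from itertools import groupby


def rle_encode(text):
    """Run-length encode: each run of a repeated character becomes <count><char>."""
    return "".join(f"{len(list(group))}{char}" for char, group in groupby(text))


showcase = "L" * 1000                                     # data built to flatter the encoder
ordinary = "the quick brown fox jumps over the lazy dog"  # no repeated runs at all

for sample in (showcase, ordinary):
    encoded = rle_encode(sample)
    print(f"{len(sample)} characters in, {len(encoded)} characters out")
```

The thousand Ls shrink to the five characters `1000L`, while the ordinary sentence, which has no runs to exploit, doubles in size.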
Different types of data have their own distinct characteristics and many
of these characteristics are used when designing a compression routine.
This means that a text file may be compressed very well by one routine
but, when passed through something designed to compress some other file type,
may even grow a little.
Be sure to use real-world examples when comparing compression figures.
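A quick way to compare figures on your own data is to measure them directly. The sketch below uses Python's standard `zlib` module as one example of a general-purpose routine; the helper name `ratio` and the two sample inputs are assumptions chosen for illustration, with pseudo-random bytes standing in for data that lacks the repeated structure the routine looks for.

```python
import random
import zlib


def ratio(data):
    """Compressed size as a fraction of the original size (below 1.0 means it shrank)."""
    return len(zlib.compress(data)) / len(data)


# Repetitive English text: the kind of input this routine handles well.
text = b"she sells sea shells on the sea shore\n" * 100

# Pseudo-random bytes: nothing repeated for the routine to exploit.
rng = random.Random(0)  # fixed seed so the run is repeatable
noise = bytes(rng.getrandbits(8) for _ in range(len(text)))

print(f"text:  {ratio(text):.3f}")
print(f"noise: {ratio(noise):.3f}")
```

The repetitive text should come out at a small fraction of its original size, while the random bytes typically end up slightly larger than they started. Substitute samples of your own files for `text` and `noise` to get figures that mean something for your data.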
Why use compression?
How do I use it?
|