How can a 40-kilobyte compressed file expand to 4.2 petabytes when unzipped?

All compression systems work on one basic principle:
Don't store what the data is, but rather how to reconstruct it.

Most of the data we generate contains repeating patterns, and compression algorithms exploit this fact to make the compressed data as small as they can.

Let's say you have the following pattern in your file:

"1111111111111111111111111000000000000000000000000000000"

The compression software, while compressing, would store something that conceptually is "25 ones, then 30 zeros" in the compressed file. While unzipping, the software will read this instruction, and write twenty-five ones and thirty zeros in the unzipped file.
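
The idea above is essentially run-length encoding. Here is a minimal Python sketch of it; the function names `rle_encode` and `rle_decode` are made up for this illustration, and real formats such as ZIP use more elaborate schemes (DEFLATE) rather than plain run-length encoding.

```python
from itertools import groupby

def rle_encode(text):
    """Collapse each run of identical characters into a (character, count) pair."""
    return [(char, len(list(run))) for char, run in groupby(text)]

def rle_decode(pairs):
    """Rebuild the original text by repeating each character `count` times."""
    return "".join(char * count for char, count in pairs)

# The pattern from the example: 25 ones followed by 30 zeros.
pattern = "1" * 25 + "0" * 30

encoded = rle_encode(pattern)
print(encoded)  # [('1', 25), ('0', 30)]

# Decoding follows the stored instructions and reproduces the input exactly.
assert rle_decode(encoded) == pattern
```

The encoded form holds two short instructions no matter how long each run is, which is why the compressed size barely grows when the counts do.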

So, if someone crafts a compressed file that essentially says "8000000000000000000 ones", that instruction takes very little space in the compressed file. But when the unzipping software actually tries to write that many ones to the unzipped file, the size of that file grows to the order of exabytes (1 exabyte is about 1000 petabytes).
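
To get a feel for the numbers, here is a rough Python sketch. It assumes each "one" is stored as a single bit, which is my assumption and not something the example states. It then uses the standard `zlib` module on a much smaller, safe input to show the same effect with a real compressor; the 10 MB size is an arbitrary choice for the demo.

```python
import zlib

# The tiny "instruction" from the text: a symbol and how many times to repeat it.
instruction = ("1", 8_000_000_000_000_000_000)

# Assumption: each "one" occupies a single bit in the unzipped file.
output_bytes = instruction[1] // 8
output_petabytes = output_bytes / 10**15
print(output_petabytes)  # 1000.0

# Small-scale demo with a real compressor: 10 MB of identical bytes.
original = b"\x00" * 10_000_000
compressed = zlib.compress(original, 9)
ratio = len(original) / len(compressed)
print(len(compressed), "bytes compressed, ratio about", round(ratio))

# Decompressing writes all 10 MB back out.
assert zlib.decompress(compressed) == original
```

The instruction itself is only a few bytes, while the output it describes is about a thousand petabytes. Even the harmless 10 MB demo shrinks by a very large factor, because the input is a single repeated byte.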
