High-performance file compression software
NanoZip is an experimental file archiver that incorporates several original file compression algorithms. It is designed for high data compression efficiency and includes many experimental features, such as fine-grained parallel compression algorithms.
The latest version of NanoZip (2011) is available for 32-bit and 64-bit Windows and Linux systems. The Windows version includes both a graphical user interface and a command-line interface, while the Linux versions are command-line only.
NanoZip offers excellent compression performance across a wide range of file types. Below are some performance comparisons for different types of data:
Compressor | Compressed Size (MB) | Compression Time (s) | Decompression Time (s) |
---|---|---|---|
nz 0.09 -cc | 73 | 724 | 722 |
nz 0.09 -cO | 83 | 164 | 37 |
nz 0.09 -co | 91 | 67 | 10 |
uharc 0.6b -mx | 96 | 448 | 363 |
7-zip 9.12 -mx | 99 | 187 | 9 |
nz 0.09 -cD | 117 | 23 | 2 |
rar 4.2 -m5 | 117 | 41 | 4 |
nz 0.09 -cd | 131 | 6.5 | 1.9 |
gzip 1.3.3 -9 | 159 | 84 | 5 |
nz 0.09 -cf | 169 | 1.9 | 2 |
gzip 1.3.3 -3 | 170 | 15 | 5 |
The '-cO' algorithm in NanoZip is the strongest known asymmetric compression algorithm: no other algorithm decompresses faster at this compression ratio. See the thorough compression comparison with other file compressors.
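The trade-off between compressed size and decompression time in the table above can be checked mechanically. The following is a minimal sketch, not part of NanoZip: it copies the size and decompression-time columns from the table and lists the entries that no other entry beats on both measures at once (a Pareto front). The function name `pareto_front` is chosen here for illustration only.

```python
# (compressor, compressed size in MB, decompression time in s),
# copied from the first comparison table.
ROWS = [
    ("nz 0.09 -cc", 73, 722),
    ("nz 0.09 -cO", 83, 37),
    ("nz 0.09 -co", 91, 10),
    ("uharc 0.6b -mx", 96, 363),
    ("7-zip 9.12 -mx", 99, 9),
    ("nz 0.09 -cD", 117, 2),
    ("rar 4.2 -m5", 117, 4),
    ("nz 0.09 -cd", 131, 1.9),
    ("gzip 1.3.3 -9", 159, 5),
    ("nz 0.09 -cf", 169, 2),
    ("gzip 1.3.3 -3", 170, 5),
]


def pareto_front(rows):
    """Return the names of rows that are not dominated by any other row.

    A row is dominated when some other row is at least as good on both
    size and decompression time, and strictly better on one of them.
    """
    front = []
    for name, size, dtime in rows:
        dominated = any(
            s <= size and t <= dtime and (s < size or t < dtime)
            for _, s, t in rows
        )
        if not dominated:
            front.append(name)
    return front


if __name__ == "__main__":
    for name in pareto_front(ROWS):
        print(name)
```

On these numbers, '-cO' is on the front because the only entry with a smaller output ('-cc') takes far longer to decompress. The sketch considers only the compressors listed in this one table.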
NanoZip has special algorithms for compressing audio data. Below is a comparison of NanoZip with other popular audio compressors:
Compressor | Compressed Size (MB) | Compression Time (s) | Decompression Time (s) |
---|---|---|---|
nz 0.09 -cd | 123 | 9.3 | 7.2 |
flac 1.2.1 -8 | 124 | 28 | 3.4 |
nz 0.09 -cf | 128 | 3.1 | 3.1 |
flac 1.2.1 -1 | 134 | 4.1 | 3.2 |
The above results were obtained with NanoZip multithreading disabled.
Much of the original work in NanoZip is built around text compression. Below is a comparison of NanoZip with other text compressors:
Compressor | Compressed Size (MB) | Compression Time (s) | Decompression Time (s) |
---|---|---|---|
nz 0.09 -cc | 19.6 | 112 | 111 |
nz 0.09 -cO | 21.0 | 8.3 | 5.3 |
nz 0.09 -co | 21.7 | 4.5 | 2.5 |
7-zip 9.12 -mx | 27.5 | 117 | 2.2 |
bzip2 1.0.5 | 28.2 | 11.8 | 5.3 |
nz 0.09 -cD | 28.5 | 2.9 | 0.5 |
rar 4.2 -m5 | 31.1 | 29.7 | 0.9 |
gzip 1.3.3 -9 | 37.9 | 12.7 | 1.16 |
NanoZip understands chess game notation (1. e4 e5...) and outperforms other compressors on such data. Below is a comparison of NanoZip with other compressors for a 100 MB chess file:
Compressor | Compressed Size (MB) | Compression Time (s) | Decompression Time (s) |
---|---|---|---|
nz 0.09 -cO | 11.8 | 10.7 | 3.1 |
nz 0.09 -co | 12.7 | 5.7 | 1.9 |
7-zip 9.12 -mx | 19.5 | 94.2 | 1.8 |
bzip2 1.0.5 | 19.9 | 14.8 | 4.5 |
rar 4.2 -m5 | 23.9 | 30.8 | 0.7 |
nz 0.09 -cd | 24.0 | 1.1 | 0.5 |
gzip 1.3.3 -9 | 29.8 | 13.3 | 1.0 |
NanoZip neither checks for illegal moves nor integrates a chess engine, so these results could be improved further.
NanoZip algorithms handle linear sequences (e.g. 'abcdef...', 'x0u1a2p3...') embedded in heterogeneous data. Below is a comparison for a 700 MB example:
Compressor | Compressed Size (MB) | Compression Time (s) | Decompression Time (s) |
---|---|---|---|
nz 0.09 -cc | 126 | 1213 | 1174 |
nz 0.09 -cO | 128 | 507 | 120 |
uharc 0.6b -mx | 161 | 698 | 572 |
nz 0.09 -co | 162 | 117 | 18 |
7-zip 9.12 -mx | 162 | 154 | 16 |
nz 0.09 -cDP | 165 | 52 | 5 |
nz 0.09 -cD | 188 | 21 | 2.9 |
rar 4.2 -m5 | 198 | 51.5 | 6.7 |
nz 0.09 -cd | 213 | 10.5 | 2.6 |
bzip2 1.0.5 | 240 | 91 | 36 |
gzip 1.3.3 -9 | 247 | 162 | 7.3 |
nz 0.09 -cf | 255 | 2.88 | 2.77 |
lzop 1.03 -1 | 343 | 6.27 | 2.16 |
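The two example sequences given before the table ('abcdef...' and 'x0u1a2p3...') can be pictured as bytes that advance by a constant step, either at every position or at every second position. The following is a generic sketch of how such a run could be detected; it is an assumption made for illustration and does not describe NanoZip's actual modelling. The function name `linear_run_length` and the `stride` parameter are hypothetical.

```python
def linear_run_length(data: bytes, start: int, stride: int) -> int:
    """Count how many elements data[start], data[start + stride], ...
    advance by one constant delta (taken modulo 256).

    Any two elements trivially form a run of length 2, so only longer
    runs indicate a linear sequence.
    """
    if start >= len(data):
        return 0
    if start + stride >= len(data):
        return 1
    delta = (data[start + stride] - data[start]) % 256
    count = 2
    i = start + stride
    while i + stride < len(data) and (data[i + stride] - data[i]) % 256 == delta:
        count += 1
        i += stride
    return count


if __name__ == "__main__":
    print(linear_run_length(b"abcdef", 0, 1))    # every byte steps by +1
    print(linear_run_length(b"x0u1a2p3", 1, 2))  # digits at odd positions step by +1
    print(linear_run_length(b"x0u1a2p3", 0, 2))  # letters at even positions do not
```

A compressor that finds a long run like this could, in principle, describe it by its start value, delta, and length instead of coding each byte separately.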
NanoZip's algorithms are memory-efficient and recognize similarities between data blocks, even if they are far apart. This effect is amplified in the parallel compression algorithms. Below is an example of compressing 800 MB of compiler binaries using 500 MB of memory:
Compressor | Compressed Size (MB) | Compression Time (s) | Decompression Time (s) |
---|---|---|---|
nz 0.09 -cO | 43 | 351 | 32 |
nz 0.09 -cdP | 65 | 23.7 | 2.7 |
7-zip 9.12 -mx | 97 | 222 | 9.8 |
nz 0.09 -cF | 99 | 7.8 | 5.7 |
uharc 0.6b -mx | 101 | 539 | 401 |
rar 4.2 -m5 | 143 | 50.5 | 5.2 |
gzip 1.3.3 -9 | 228 | 113 | 8.1 |
NanoZip outperforms other file archivers even when using only a single thread. The file archiver architecture allows parallel processing on multiple levels.
NanoZipLTCB is a subset of the NanoZip compression library that highlights its performance when compressing plaintext with a large (multi-gigabyte) memory model. It compresses at a rate of 16 MB/s and decompresses at 32 MB/s on modern hardware, with compression ratios over 6.2:1. No other file compressor compresses (and/or decompresses) faster at these compression ratios. It only accepts files from the large text compression benchmark.
nanozipltcb-0.09.linux64.zip (2010) 0a587667 2c9a497c 2b61338a 87ac98a0 80f484f7 eaa1317e 7fe6d995 3cd29e21
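The eight hexadecimal groups after the file name add up to 64 hex digits, which is the length of a SHA-256 digest. The page does not name the hash function, so treating it as SHA-256 is an assumption. Under that assumption, a downloaded copy could be checked with a short sketch like the following; the helper names `sha256_of_file` and `matches_published` are illustrative.

```python
import hashlib

# Digest as printed for nanozipltcb-0.09.linux64.zip, with spaces removed.
# ASSUMPTION: the digest is SHA-256 (the page does not say which hash it is).
PUBLISHED = (
    "0a587667 2c9a497c 2b61338a 87ac98a0 "
    "80f484f7 eaa1317e 7fe6d995 3cd29e21"
).replace(" ", "")


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, published: str = PUBLISHED) -> bool:
    """Compare a file's digest with the published value (case-insensitive)."""
    return sha256_of_file(path) == published.lower()
```

For example, `matches_published("nanozipltcb-0.09.linux64.zip")` returns `True` only if the file's SHA-256 digest equals the published string. A mismatch could mean either a corrupted download or that the published value uses a different hash.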
NanoZip has an original high-performance algorithm for computing the Burrows-Wheeler Transform (BWT). Below is a comparison of suffix array construction algorithms:
Algorithm | chr22.dna | etext99 | gcc-3.0.tar | howto | jdk13c | linux-2.4.5.tar | rctail96 | rfc | sprot34.dat | w3c2 | Total Seconds |
---|---|---|---|---|---|---|---|---|---|---|---|
Archon4r0 | 6.030 | 22.160 | 13.856 | 5.806 | 18.106 | 18.174 | 32.490 | 20.736 | 22.832 | 27.264 | 187.454 |
Deep-Shallow | 7.514 | 34.264 | 35.822 | 8.288 | 32.182 | 25.912 | 62.502 | 29.666 | 32.096 | 54.682 | 322.928 |
MSufSort3 | 7.132 | 24.106 | 14.952 | 5.672 | 11.314 | 19.890 | 21.060 | 17.936 | 23.352 | 17.090 | 162.504 |
divsufsort2 | 5.362 | 18.064 | 10.084 | 5.320 | 9.010 | 14.290 | 17.914 | 15.658 | 17.404 | 13.486 | 126.592 |
R08 | 5.985 | 13.823 | 14.533 | 4.034 | 8.268 | 18.121 | 15.225 | 16.728 | 15.735 | 12.750 | 125.202 |
The table shows approximate Manzini Corpus results based on Yuta Mori's timings [4] for the latest suffix array construction algorithms. With the exception of MSufSort3, all algorithms have similar space requirements. The R08 timings are adjusted by the ratio of MSufSort3 timings measured on the same hardware as R08.
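The comparison is between suffix array construction algorithms because the BWT can be read directly off a suffix array: each output symbol is the character that precedes the corresponding sorted suffix. The following is a deliberately naive sketch of that relationship. It is not the R08 algorithm or any of the benchmarked algorithms, and sorting suffixes by slicing as done here is far slower than they are. The sentinel byte is an assumption of this sketch.

```python
def suffix_array(s: bytes) -> list:
    """Naive suffix array: sort suffix start positions by the suffix itself."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def bwt(text: bytes, sentinel: bytes = b"\x00") -> bytes:
    """Burrows-Wheeler Transform of text via its suffix array.

    A sentinel byte that does not occur in the text is appended so that
    every suffix has a unique rank.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    sa = suffix_array(s)
    # For suffix start i, emit the preceding byte; i == 0 wraps to the sentinel.
    return bytes(s[i - 1] for i in sa)


if __name__ == "__main__":
    print(bwt(b"banana"))
```

For the input `banana` with a zero-byte sentinel, the sorted suffixes start at positions 6, 5, 3, 1, 0, 4, 2, which yields the classic `annb\x00aa` output.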
BWMonstr has the highest compression ratio among pure Burrows-Wheeler compression algorithms. It compresses book1 from the Calgary corpus to 203476 bytes (2.1174 bpb, bits per byte), which is better than most PPM and CM compression algorithms. The program has the lowest known space requirements (~0.6N) for computing both the BWT and the post-transform compression. It is an unoptimized demo compressor with a command-line interface only, and it is not intended for practical file compression.
bwmonstr.002.win32.zip (2009) 77895735 0d7f1cc7 367ce2c8 154389b4 87b0e7d6 2046bff9 3273b538 b0b9b891
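The bpb figure quoted for book1 follows from the compressed size and the original file size. As a small worked check: book1 in the Calgary corpus is 768,771 bytes (a value not stated in the text above), so 203476 compressed bytes correspond to 8 × 203476 / 768771 bits per input byte.

```python
BOOK1_SIZE = 768_771         # bytes; size of book1 in the Calgary corpus
BWMONSTR_BOOK1 = 203_476     # bytes; compressed size quoted for BWMonstr


def bits_per_byte(compressed_bytes: int, original_bytes: int) -> float:
    """Compressed bits spent per byte of original input."""
    return 8 * compressed_bytes / original_bytes


if __name__ == "__main__":
    bpb = bits_per_byte(BWMONSTR_BOOK1, BOOK1_SIZE)
    print(f"{bpb:.4f} bpb")  # 2.1174 bpb
```

Lower bpb means better compression; an uncompressed file sits at 8 bpb.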
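See the detailed compression comparison. The table below compares Burrows-Wheeler compression results on the Calgary corpus files, in bits per byte (bpb):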
File | BS99 | F07 | D05 | R08 |
---|---|---|---|---|
bib | 1.91 | 1.926 | 1.887 | 1.795 |
book1 | 2.27 | 2.356 | 2.264 | 2.147 |
book2 | 1.96 | 2.012 | 1.953 | 1.840 |
geo | 4.16 | 4.268 | 4.129 | 3.967 |
news | 2.42 | 2.464 | 2.397 | 2.268 |
obj1 | 3.73 | 3.765 | 3.692 | 3.584 |
obj2 | 2.45 | 2.433 | 2.411 | 2.226 |
paper1 | 2.41 | 2.439 | 2.390 | 2.274 |
paper2 | 2.36 | 2.387 | 2.329 | 2.230 |
pic | 0.72 | 0.753 | 0.714 | 0.688 |
progc | 2.45 | 2.476 | 2.422 | 2.307 |
progl | 1.68 | 1.697 | 1.660 | 1.576 |
progp | 1.68 | 1.702 | 1.666 | 1.579 |
trans | 1.46 | 1.488 | 1.451 | 1.354 |
Average bpb | 2.26 | 2.298 | 2.240 | 2.131 |
BS99: The best results of Balkenhol. [1]
F07: Fenwick's best results. [2]
D05: The best Deorowicz results. [3] Fenwick (2007) describes this as "the best Burrows-Wheeler result to date."
R08: This work (from 2008) is part of BWMonstr and NanoZip.
The current versions of both BWMonstr and NanoZip outperform R08.
[1] B. Balkenhol, Y. M. Shtarkov, "One attempt of a compression algorithm using the BWT", Faculty of Mathematics, University of Bielefeld, 1999.
[2] P. Fenwick, "Burrows-Wheeler Compression: Principles and Reflections." Theoretical Computer Science Vol 387 (2007) No. 3 pp 200-219.
[3] S. Deorowicz, "Context exhumation after the Burrows-Wheeler transform", Information Processing Letters, Vol 95, No 1, pp 313-320, 2005.
[4] Y. Mori, SACA Benchmarks.
For inquiries, you can reach out to Sami Runsas at sami.runsas@gmail.com.
Sami Runsas, the creator of NanoZip, passed away between 2013 and 2014 at the age of 30. His work on NanoZip and other compression algorithms has left a lasting impact on the field of data compression. We honor his contributions and remember him as a brilliant mind who pushed the boundaries of what was possible in software development.
"Sami, your work continues to inspire us. Rest in peace."