NanoZip

High-performance file compression software

About NanoZip

NanoZip is an experimental file archiver that incorporates several original file compression algorithms. It is designed for high compression efficiency and includes many experimental features, such as fine-grained parallel compression algorithms.

Download

The latest version of NanoZip (2011) is available for 32-bit and 64-bit Windows and Linux systems. The Windows version includes both a graphical user interface and a command-line interface, while the Linux versions are command-line only.

Performance

NanoZip offers excellent compression performance across a wide range of file types. Below are some performance comparisons for different types of data:

Linux Binary Distribution (500 MB)

Compressor      Compressed Size (MB)  Compression Time (s)  Decompression Time (s)
nz 0.09 -cc     73                    724                   722
nz 0.09 -cO     83                    164                   37
nz 0.09 -co     91                    67                    10
uharc 0.6b -mx  96                    448                   363
7-zip 9.12 -mx  99                    187                   9
nz 0.09 -cD     117                   23                    2
rar 4.2 -m5     117                   41                    4
nz 0.09 -cd     131                   6.5                   1.9
gzip 1.3.3 -9   159                   84                    5
nz 0.09 -cf     169                   1.9                   2
gzip 1.3.3 -3   170                   15                    5

The '-cO' algorithm in NanoZip is the strongest known asymmetric compression algorithm. No other algorithm decompresses faster at this compression ratio. See thorough compression comparison with other file compressors.
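To compare rows on equal terms, the table can be reduced to a compression ratio and throughput figures. The following is a minimal sketch that assumes the 500 MB in the heading is the uncompressed input size and that MB means the same unit in every column; the function name `summarize` is illustrative only.

```python
def summarize(original_mb, compressed_mb, comp_s, decomp_s):
    """Derive ratio and throughput (MB of original data per second)."""
    return {
        "ratio": original_mb / compressed_mb,
        "comp_mb_s": original_mb / comp_s,
        "decomp_mb_s": original_mb / decomp_s,
    }

# A few rows copied from the table above: (size MB, compress s, decompress s).
ROWS = {
    "nz 0.09 -cO": (83, 164, 37),
    "7-zip 9.12 -mx": (99, 187, 9),
    "rar 4.2 -m5": (117, 41, 4),
}

for name, (size, comp_s, decomp_s) in ROWS.items():
    s = summarize(500, size, comp_s, decomp_s)
    print(f"{name:15s} ratio {s['ratio']:.2f}:1  "
          f"compress {s['comp_mb_s']:.1f} MB/s  "
          f"decompress {s['decomp_mb_s']:.1f} MB/s")
```

Under these assumptions '-cO' works out to roughly 6:1, with decompression several times faster than compression, which is what "asymmetric" refers to.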

Audio Compression

NanoZip has special algorithms for compressing audio data. Below is a comparison of NanoZip with other popular audio compressors:

Compressor      Compressed Size (MB)  Compression Time (s)  Decompression Time (s)
nz 0.09 -cd     123                   9.3                   7.2
flac 1.2.1 -8   124                   28                    3.4
nz 0.09 -cf     128                   3.1                   3.1
flac 1.2.1 -1   134                   4.1                   3.2

The results above were obtained with NanoZip multithreading disabled.

Text Compression

Much of the original work in NanoZip is built around text compression. Below is a comparison of NanoZip with other text compressors:

Compressor      Compressed Size (MB)  Compression Time (s)  Decompression Time (s)
nz 0.09 -cc     19.6                  112                   111
nz 0.09 -cO     21.0                  8.3                   5.3
nz 0.09 -co     21.7                  4.5                   2.5
7-zip 9.12 -mx  27.5                  117                   2.2
bzip2 1.0.5     28.2                  11.8                  5.3
nz 0.09 -cD     28.5                  2.9                   0.5
rar 4.2 -m5     31.1                  29.7                  0.9
gzip 1.3.3 -9   37.9                  12.7                  1.16

Chess Compression

NanoZip understands chess game notation (1. e4 e5...) and outperforms other compressors on such data. Below is a comparison for a 100 MB chess file:

Compressor      Compressed Size (MB)  Compression Time (s)  Decompression Time (s)
nz 0.09 -cO     11.8                  10.7                  3.1
nz 0.09 -co     12.7                  5.7                   1.9
7-zip 9.12 -mx  19.5                  94.2                  1.8
bzip2 1.0.5     19.9                  14.8                  4.5
rar 4.2 -m5     23.9                  30.8                  0.7
nz 0.09 -cd     24.0                  1.1                   0.5
gzip 1.3.3 -9   29.8                  13.3                  1.0

NanoZip does not check for illegal moves and has no integrated chess engine, so these results could be improved further.

Multimedia Compression

NanoZip algorithms handle linear sequences (e.g. 'abcdef...', 'x0u1a2p3...') embedded in heterogeneous data. Below is a comparison for a 700 MB example:

Compressor      Compressed Size (MB)  Compression Time (s)  Decompression Time (s)
nz 0.09 -cc     126                   1213                  1174
nz 0.09 -cO     128                   507                   120
uharc 0.6b -mx  161                   698                   572
nz 0.09 -co     162                   117                   18
7-zip 9.12 -mx  162                   154                   16
nz 0.09 -cDP    165                   52                    5
nz 0.09 -cD     188                   21                    2.9
rar 4.2 -m5     198                   51.5                  6.7
nz 0.09 -cd     213                   10.5                  2.6
bzip2 1.0.5     240                   91                    36
gzip 1.3.3 -9   247                   162                   7.3
nz 0.09 -cf     255                   2.88                  2.77
lzop 1.03 -1    343                   6.27                  2.16

Parallel Compression Algorithms

NanoZip's algorithms are memory-efficient and recognize similarities between data blocks, even if they are far apart. This effect is amplified in the parallel compression algorithms. Below is an example of compressing 800 MB of compiler binaries using 500 MB of memory:

Compressor      Compressed Size (MB)  Compression Time (s)  Decompression Time (s)
nz 0.09 -cO     43                    351                   32
nz 0.09 -cdP    65                    23.7                  2.7
7-zip 9.12 -mx  97                    222                   9.8
nz 0.09 -cF     99                    7.8                   5.7
uharc 0.6b -mx  101                   539                   401
rar 4.2 -m5     143                   50.5                  5.2
gzip 1.3.3 -9   228                   113                   8.1

Archiver Architecture

NanoZip outperforms other file archivers even when it uses only a single thread. In addition, the file archiver architecture allows parallel processing on multiple levels:

  1. Independent threads for file reading and writing.
  2. The compression algorithms are designed so that parts of the compression can be done ahead of time, while other threads finish or complement the compression that was begun earlier.
  3. Some algorithms (depending on the data content) run with full CPU utilization regardless of the number of processors available.
  4. High-level archiver architecture allows the entire process to be run in multiple branches (controlled by the '-p' switch) by splitting the input task into any arbitrary number of blocks.
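Levels 1 and 4 of the list above (separate I/O threads, and splitting the input into independently processed blocks) can be sketched generically as follows. This is an illustration under stated assumptions, not NanoZip's implementation: zlib stands in for the compressor, and the block size, the 4-byte length framing, and the function names are all hypothetical.

```python
import queue
import threading
import zlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 1 << 20  # hypothetical block size, not a NanoZip parameter


def compress_stream(src, dst, workers=4, block_size=BLOCK_SIZE):
    """Read in one thread, compress blocks in a pool, write in input order."""
    pending = queue.Queue(maxsize=workers * 2)  # bounds the data held in memory

    def reader():
        with ThreadPoolExecutor(max_workers=workers) as pool:
            while True:
                block = src.read(block_size)
                if not block:
                    break
                # Queue the future, not the result, so output order is kept.
                pending.put(pool.submit(zlib.compress, block, 6))
        pending.put(None)  # sentinel: no more blocks

    thread = threading.Thread(target=reader)
    thread.start()
    while True:
        future = pending.get()
        if future is None:
            break
        payload = future.result()
        dst.write(len(payload).to_bytes(4, "big"))  # simple length prefix
        dst.write(payload)
    thread.join()


def decompress_stream(src, dst):
    """Reverse compress_stream: read length-prefixed blocks sequentially."""
    while True:
        header = src.read(4)
        if not header:
            break
        size = int.from_bytes(header, "big")
        dst.write(zlib.decompress(src.read(size)))
```

Independent blocks like these cannot share redundancy with each other, which normally costs compression ratio. NanoZip is described as recognizing similarities between distant blocks; this sketch makes no attempt at that.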

NanoZipLTCB

NanoZipLTCB is a subset of the NanoZip compression library that highlights its performance when compressing plain text with a large (multi-gigabyte) memory model. It compresses at 16 MB/s and decompresses at 32 MB/s on modern hardware, with compression ratios over 6.2:1. No other file compressor compresses or decompresses faster at these compression ratios. It only accepts files from the Large Text Compression Benchmark (LTCB).

nanozipltcb-0.09.linux64.zip (2010) 0a587667 2c9a497c 2b61338a 87ac98a0 80f484f7 eaa1317e 7fe6d995 3cd29e21
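The eight hexadecimal groups listed next to a download add up to 64 hex digits, which is the length of a SHA-256 digest. The page does not name the hash function, so treating the value as SHA-256 is an assumption. Under that assumption, a downloaded file could be checked roughly like this (the function names are illustrative):

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published):
    """Compare a file against a digest written as space-separated hex groups."""
    expected = "".join(published.split()).lower()
    return sha256_of(path) == expected


# Digest as printed on this page for nanozipltcb-0.09.linux64.zip.
# ASSUMPTION: the groups are a SHA-256 digest; the page does not say so.
PUBLISHED = ("0a587667 2c9a497c 2b61338a 87ac98a0 "
             "80f484f7 eaa1317e 7fe6d995 3cd29e21")
```

Calling `matches_published("nanozipltcb-0.09.linux64.zip", PUBLISHED)` on a local copy would return True only if the assumption holds and the file is intact; a False result cannot distinguish a corrupted download from a different hash function.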

Suffix Sorting

NanoZip has an original high-performance algorithm for computing the Burrows-Wheeler Transform (BWT). Below is a comparison of suffix array construction algorithms:

Algorithm     chr22.dna  etext99  gcc-3.0.tar  howto  jdk13c  linux-2.4.5.tar  rctail96  rfc     sprot34.dat  w3c2    Total Seconds
Archon4r0     6.030      22.160   13.856       5.806  18.106  18.174           32.490    20.736  22.832       27.264  187.454
Deep-Shallow  7.514      34.264   35.822       8.288  32.182  25.912           62.502    29.666  32.096       54.682  322.928
MSufSort3     7.132      24.106   14.952       5.672  11.314  19.890           21.060    17.936  23.352       17.090  162.504
divsufsort2   5.362      18.064   10.084       5.320  9.010   14.290           17.914    15.658  17.404       13.486  126.592
R08           5.985      13.823   14.533       4.034  8.268   18.121           15.225    16.728  15.735       12.750  125.202

The table shows approximate Manzini Corpus results, in seconds, based on Yuta Mori's timings for the latest suffix array construction algorithms [4]. With the exception of MSufSort3, all algorithms have similar space requirements. Because R08 was timed on different hardware, its timings are scaled by the ratio between Mori's MSufSort3 timings and MSufSort3 timings measured on the same hardware as R08.
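For readers unfamiliar with how a suffix array yields the Burrows-Wheeler Transform, here is a deliberately naive sketch. It sorts suffixes by direct comparison, which is far slower than any algorithm in the table, and it is not R08 or any other benchmarked method. The zero-byte sentinel is an assumption of this sketch.

```python
def suffix_array(data: bytes):
    """Naive suffix array: sort suffix start positions by suffix content."""
    return sorted(range(len(data)), key=lambda i: data[i:])


def bwt(data: bytes, sentinel: bytes = b"\x00") -> bytes:
    """BWT via the suffix array of data + sentinel.

    The sentinel must not occur in the input and must sort below every
    other byte, which holds for b"\\x00" on input without zero bytes.
    """
    if sentinel in data:
        raise ValueError("sentinel byte occurs in input")
    s = data + sentinel
    # Each output byte is the one preceding a sorted suffix; for the suffix
    # starting at 0, index -1 wraps around to the sentinel.
    return bytes(s[i - 1] for i in suffix_array(s))


print(bwt(b"banana"))  # b'annb\x00aa'
```

The output groups equal characters together ('nn', 'aa'), which is what makes the transformed data easier to compress. Practical suffix sorters differ from this sketch mainly in how they avoid comparing long suffixes byte by byte.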

BWMonstr

BWMonstr has the highest compression ratio among pure Burrows-Wheeler compression algorithms. It compresses book1 from the Calgary corpus to 203476 bytes (2.1174 bpb), which is better than most PPM and CM compression algorithms. The program has the lowest known space requirements (~0.6N) for computing both the BWT and the post-transform compression. This is an unoptimized demo compressor with a command-line interface only; it is not intended for practical file compression.
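The bpb (bits per byte) figure follows directly from the compressed size and the original file size. A short check, using the 768,771-byte size of book1 in the Calgary corpus:

```python
BOOK1_BYTES = 768_771  # uncompressed size of book1 (Calgary corpus)


def bits_per_byte(compressed_bytes: int, original_bytes: int) -> float:
    """Compressed bits spent per byte of original input."""
    return compressed_bytes * 8 / original_bytes


print(round(bits_per_byte(203_476, BOOK1_BYTES), 4))  # 2.1174
```

The same formula converts any row of a size table into bpb, which is the unit used in the Calgary corpus comparison in this section.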

bwmonstr.002.win32.zip (2009) 77895735 0d7f1cc7 367ce2c8 154389b4 87b0e7d6 2046bff9 3273b538 b0b9b891

See detailed compression comparison.

File         BS99  F07    D05    R08
bib          1.91  1.926  1.887  1.795
book1        2.27  2.356  2.264  2.147
book2        1.96  2.012  1.953  1.840
geo          4.16  4.268  4.129  3.967
news         2.42  2.464  2.397  2.268
obj1         3.73  3.765  3.692  3.584
obj2         2.45  2.433  2.411  2.226
paper1       2.41  2.439  2.390  2.274
paper2       2.36  2.387  2.329  2.230
pic          0.72  0.753  0.714  0.688
progc        2.45  2.476  2.422  2.307
progl        1.68  1.697  1.660  1.576
progp        1.68  1.702  1.666  1.579
trans        1.46  1.488  1.451  1.354
Average bpb  2.26  2.298  2.240  2.131

BS99: The best results of Balkenhol. [1]
F07: Fenwick's best results. [2]
D05: The best Deorowicz results. [3] Fenwick (2007) describes this as "the best Burrows-Wheeler result to date."
R08: This work (from 2008) is part of BWMonstr and NanoZip.

The current versions of both BWMonstr and NanoZip outperform R08.

[1] B. Balkenhol, Y. M. Shtarkov, "One attempt of a compression algorithm using the BWT", Faculty of Mathematics, University of Bielefeld, 1999.
[2] P. Fenwick, "Burrows-Wheeler Compression: Principles and Reflections", Theoretical Computer Science, Vol 387, No 3, pp 200-219, 2007.
[3] S. Deorowicz, "Context exhumation after the Burrows-Wheeler transform", Information Processing Letters, Vol 95, No 1, pp 313-320, 2005.
[4] Y. Mori, SACA Benchmarks.

Contact

For inquiries, you can reach out to Sami Runsas at sami.runsas@gmail.com.

In Memory of Sami Runsas

Sami Runsas, the creator of NanoZip, passed away between 2013 and 2014 at the age of 30. His work on NanoZip and other compression algorithms has left a lasting impact on the field of data compression. We honor his contributions and remember him as a brilliant mind who pushed the boundaries of what was possible in software development.

"Sami, your work continues to inspire us. Rest in peace."