The first comparison is of a Linux kernel tarball (2.6.31). In all cases the
default options were used. Three other common compression apps were used for
comparison: 7z, which is an excellent all-round lzma based compression app;
gzip, which is the benchmark fast standard that has good compression; and
bzip2, which is the most commonly used compression on Linux.

In the following tables, lrzip means lrzip with default options, lrzip -l
means lrzip using the lzo backend, lrzip -g means using the gzip backend,
lrzip -b means using the bzip2 backend and lrzip -z means using the zpaq
backend.


linux-2.6.31.tar

These are benchmarks performed on a 3GHz quad core Intel Core2 with 8GB ram
using lrzip v0.42.

Compression  Size       Percentage  Compress    Decompress
None         365711360  100
7z           53315279   14.6        2m4.770s    0m5.360s
lrzip        52372722   14.3        2m48.477s   0m8.336s
lrzip -z     43455498   11.9        10m11.335s  10m14.296s
lrzip -l     112151676  30.7        0m14.913s   0m5.063s
lrzip -g     73476127   20.1        0m29.628s   0m5.591s
lrzip -b     60851152   16.6        0m43.539s   0m12.244s
bzip2        62416571   17.1        0m44.493s   0m9.819s
gzip         80563601   22.0        0m14.343s   0m2.781s


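The Percentage column is the compressed size as a share of the original size.
A minimal Python sketch that recomputes it from some of the sizes in the table
above; the names ORIGINAL_SIZE, compressed_sizes and percentage are only
illustrative and are not part of lrzip:

```python
# Recompute the Percentage column: compressed size as a share of the original.
ORIGINAL_SIZE = 365711360  # bytes, linux-2.6.31.tar (the "None" row)

# Compressed sizes in bytes, copied from the table.
compressed_sizes = {
    "7z": 53315279,
    "lrzip": 52372722,
    "lrzip -z": 43455498,
    "lrzip -l": 112151676,
    "gzip": 80563601,
}


def percentage(compressed, original=ORIGINAL_SIZE):
    """Return the compressed size as a percentage of the original (1 dp)."""
    return round(100.0 * compressed / original, 1)


for name, size in compressed_sizes.items():
    print(f"{name:10s} {size:>10d} {percentage(size):5.1f}")
```
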
These results are interesting. Note that the compression of lrzip by default
is only slightly better than lzma (7z), and at some cost in time for both
compression and decompression. Clearly zpaq compresses far better than any
other algorithm here, but the speed cost on both compression and decompression
is extreme. At this file size, lzo is interesting because it's faster than
simply copying the file, but it only offers modest compression. What lrzip
offers at this end of the spectrum is extreme compression if desired.


Let's take six kernel trees, each one version apart, as a single tarball:
linux-2.6.31 to linux-2.6.36. These contain lots of redundant information, but
it sits hundreds of megabytes apart, which is exactly what lrzip is very good
at compressing. For simplicity, only 7z will be compared since that's by far
the best general purpose compressor at the moment:

These are benchmarks performed on a 2.53GHz dual core Intel Core2 with 4GB ram
using lrzip v0.5.1. Note that it was running with a 32 bit userspace, so only
2GB addressing was possible. However, the benchmark was run with the -U option,
allowing the whole file to be treated as one large compression window.

Tarball of 6 consecutive kernel trees.

Compression  Size        Percentage  Compress  Decompress
None         2373713920  100
7z           344088002   14.5        17m26s    1m22s
lrzip -U     73356070    3.1         08m53s    43s
lrzip -Ul    158851141   6.7         04m31s    35s

Things get very interesting now, as lrzip really starts to shine. Note how the
archive is not that much larger for 6 kernel trees than it was for one. That's
because all the similar data across the kernel trees is being compressed as
one copy, and only the differences really make up the extra size. All
compression software does this, but not over such large distances. If you copy
the same data over multiple times, the resulting lrzip archive doesn't get
much larger at all.

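The effect of match distance can be seen in miniature without lrzip. The
Python sketch below is only an illustration under stated assumptions: it uses
the standard zlib module (deflate, which can only look back 32 KiB) and the
standard lzma module with its default preset (a dictionary of several MiB) as
stand-ins for a short-window and a longer-window compressor. It does not run
lrzip or its rzip stage.

```python
import lzma
import os
import zlib

# 1 MiB of random bytes is incompressible on its own, so any saving on the
# doubled data must come from finding the second copy 1 MiB further back.
block = os.urandom(1 << 20)
one_copy = block
two_copies = block + block


def sizes(data):
    """Compressed sizes in bytes under zlib and lzma, in that order."""
    return len(zlib.compress(data, 9)), len(lzma.compress(data))


zlib_1, lzma_1 = sizes(one_copy)
zlib_2, lzma_2 = sizes(two_copies)

# zlib cannot see 1 MiB back, so its output roughly doubles.
print("zlib:", zlib_1, "->", zlib_2)
# lzma can, so the second copy costs comparatively little.
print("lzma:", lzma_1, "->", lzma_2)
```

The same thing happens to lzma once the repeats are further apart than its
dictionary, which is the situation in the six kernel tree tarball above.
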
Using the first example (linux-2.6.31.tar) and simply copying the data multiple
times over gives these results with lrzip -l (the lzo backend):

Copies  Size        Compressed  Compress   Decompress
1       365711360   112151676   0m14.913s  0m5.063s
2       731422720   112151829   0m16.174s  0m6.543s
3       1097134080  112151832   0m17.466s  0m8.115s


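To put a number on "doesn't get much larger", here is a small Python sketch of
the arithmetic on the table above. The rows are copied from the table; the
function name growth is only illustrative.

```python
# (copies, original bytes, compressed bytes), copied from the table above.
rows = [
    (1, 365711360, 112151676),
    (2, 731422720, 112151829),
    (3, 1097134080, 112151832),
]


def growth(rows):
    """Extra compressed bytes added by each additional copy."""
    return [later[2] - earlier[2] for earlier, later in zip(rows, rows[1:])]


# Bytes added to the archive by the second and third copies.
print(growth(rows))

# Compressed size as a percentage of the input for each row.
for copies, size, compressed in rows:
    print(copies, f"{100.0 * compressed / size:.1f}%")
```
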
I had the amusing thought that this compression software could be used as a
bullshit detector if you were to compress people's speeches: if their talks
were full of catchphrases and not much actual content, it would all be
compressed down. So the larger the final archive, the less bullshit =)

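A toy version of that idea, as a hedged Python sketch. It uses the standard
zlib module rather than lrzip, and the function name content_ratio and both
sample texts are made up for illustration:

```python
import zlib


def content_ratio(text):
    """Compressed size divided by original size; lower means more repetition."""
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("cannot score empty text")
    return len(zlib.compress(raw, 9)) / len(raw)


# A speech that repeats one catchphrase, and one where every clause differs.
catchphrases = "Going forward, we will leverage our synergies. " * 40
varied = " ".join(f"item {i} costs {i * 7919 % 1000} units" for i in range(80))

print(f"catchphrases: {content_ratio(catchphrases):.2f}")
print(f"varied:       {content_ratio(varied):.2f}")
```

The ratio only measures repetition, so it says nothing about whether the
non-repeated parts are true.
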
Now let's move on to the other special feature of lrzip: the ability to
compress massive amounts of data on machines with huge amounts of ram by using
massive compression windows. The test file is a 10GB virtual image of an
installed operating system with some basic working software on it. The default
options on the 8GB machine meant that it was using a 5GB window.


10GB Virtual image:

These benchmarks were done on the quad core with version 0.5.1.

Compression  Size         Percentage  Compress Time  Decompress Time
None         10737418240  100.0
gzip         2772899756   25.8        05m47.35s      2m46.77s
bzip2        2704781700   25.2        20m34.269s     7m51.362s
xz           2272322208   21.2        58m26.829s     4m46.154s
7z           2242897134   20.9        29m28.152s     6m35.952s
lrzip        1354237684   12.6        29m13.402s     6m55.441s
lrzip -M     1079528708   10.1        23m44.226s     4m05.461s
lrzip -l     1793312108   16.7        05m13.246s     3m12.886s
lrzip -lM    1413268368   13.2        04m18.338s     2m54.650s
lrzip -z     1299844906   12.1        04h32m14s      04h33m
lrzip -zM    1066902006   9.9         04h07m14s      04h08m

At this end of the spectrum things really start to heat up. The compression
advantage is massive, with the lzo backend even giving much better results than
7z, and in a ridiculously short time. The big disappointment here is actually
zpaq, which takes more than 8 times longer than lzma for a measly 0.2%
improvement. The reason is that most of the advantage here is achieved by the
rzip first stage, since there's a lot of redundant space over huge distances
on a virtual image. The -M option, which works the memory subsystem rather
hard and has a noticeable impact on the rest of the machine, also does further
wonders for the compression and times.

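The timing comparison can be checked from the table above. The Python sketch
below is a small helper written for this note (the name to_seconds is not part
of any tool); it converts the time strings used in these tables to seconds and
compares the zpaq and lzma runs, both with and without -M.

```python
import re


def to_seconds(stamp):
    """Convert a time such as '29m13.402s', '43s' or '04h33m' to seconds."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:([\d.]+)s)?", stamp)
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognised time: {stamp!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)


# Compress times copied from the 10GB table: (lzma backend, zpaq backend).
pairs = {
    "default": ("29m13.402s", "04h32m14s"),  # lrzip vs lrzip -z
    "with -M": ("23m44.226s", "04h07m14s"),  # lrzip -M vs lrzip -zM
}

ratios = {
    name: to_seconds(zpaq) / to_seconds(lzma)
    for name, (lzma, zpaq) in pairs.items()
}

for name, ratio in ratios.items():
    print(f"{name}: zpaq took {ratio:.1f}x as long as lzma")
```
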
This should help govern what compression you choose. Small files are nicely
compressed with zpaq. Intermediate files are nicely compressed with lzma.
Large files get excellent results even with lzo, provided you have enough ram.
(Small being <100MB, intermediate <1GB, large >1GB.)
Or, to make things easier, just use the default settings all the time and be
happy, as lzma gives good results. :D

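The rule of thumb above can be written down as a short Python sketch. The
function name suggest_lrzip_flag is hypothetical, and treating MB and GB as
decimal units is an assumption, since the thresholds above do not say which
is meant. The flags -z and -l are the ones described for the tables earlier.

```python
MB = 1000 ** 2  # assumption: decimal megabytes
GB = 1000 ** 3  # assumption: decimal gigabytes


def suggest_lrzip_flag(size_bytes):
    """Map a file size to the backend flag suggested by the guidance above."""
    if size_bytes < 100 * MB:
        return "-z"  # small: zpaq
    if size_bytes < 1 * GB:
        return ""    # intermediate: default lzma backend, no flag needed
    return "-l"      # large: lzo, provided there is enough ram


# The three kinds of file benchmarked in this document, plus a small one.
for size in (50 * MB, 365711360, 2373713920, 10737418240):
    print(size, repr(suggest_lrzip_flag(size)))
```
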
Con Kolivas
Tue, 5th Nov 2010