These are benchmarks performed on a 3GHz quad core Intel Core2 with 8GB of
ram, using lrzip v0.42.

The first comparison is that of a linux kernel tarball (2.6.31). In all cases
the default options were used. Three other common compression apps were used
for comparison: 7z, which is an excellent all-round lzma based compression
app; gzip, which is the benchmark fast standard with good compression; and
bzip2, which is the most commonly used compression on linux.

In the following tables, lrzip means lrzip default options, lrzip(lzo) means
lrzip using the lzo backend, lrzip(gzip) means using the gzip backend,
lrzip(bzip2) means using the bzip2 backend, and lrzip(zpaq) means using the
zpaq backend.
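
These can be reproduced with invocations along the following lines (a
sketch with illustrative file names; lrzip writes its output to <file>.lrz
by default, and option letters may vary between versions):

    lrzip linux-2.6.31.tar          # default backend (lzma)
    lrzip -l linux-2.6.31.tar       # lzo backend
    lrzip -g linux-2.6.31.tar       # gzip backend
    lrzip -b linux-2.6.31.tar       # bzip2 backend
    lrzip -z linux-2.6.31.tar       # zpaq backend
    lrzip -d linux-2.6.31.tar.lrz   # decompress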

linux-2.6.31.tar

Compression     Size         Percentage    Compress      Decompress
None            365711360    100
7z              53315279     14.6          2m4.770s      0m5.360s
lrzip           52372722     14.3          2m48.477s     0m8.336s
lrzip(zpaq)     43455498     11.9          10m11.335s    10m14.296s
lrzip(lzo)      112151676    30.7          0m14.913s     0m5.063s
lrzip(gzip)     73476127     20.1          0m29.628s     0m5.591s
lrzip(bzip2)    60851152     16.6          0m43.539s     0m12.244s
bzip2           62416571     17.1          0m44.493s     0m9.819s
gzip            80563601     22.0          0m14.343s     0m2.781s

These results are interesting. The compression of lrzip by default is only
slightly better than that of 7z's lzma, at some cost in time at both the
compress and decompress ends of the spectrum. Zpaq compression is clearly
much better than any other algorithm here, but the speed cost on both
compression and decompression is extreme. At this file size, lzo is
interesting because compressing is faster than simply copying the file, yet
it still offers modest compression. What lrzip offers at this end of the
spectrum is extreme compression if desired.

Let's take two kernel trees one version apart as a tarball, linux-2.6.31 and
linux-2.6.32-rc8. These will contain lots of redundant information, but
spread hundreds of megabytes apart, which lrzip will be very good at
compressing. For simplicity, only 7z will be compared, since that's by far
the best general purpose compressor at the moment:
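
The combined tarball can be made with something like:

    tar cf two_trees.tar linux-2.6.31/ linux-2.6.32-rc8/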

Tarball of two kernel trees, one version apart.

Compression     Size         Percentage    Compress      Decompress
None            749066240    100
7z              108710624    14.5          4m4.260s      0m11.133s
lrzip           57943094     7.7           3m08.788s     0m10.747s
lrzip(lzo)      124029899    16.6          0m18.997s     0m7.107s

Things start getting very interesting now, as lrzip really starts to shine.
Note how the archive is not much larger for two kernel trees than it was for
one. That's because all the similar data in both kernel trees is compressed
as one copy, and only the differences really make up the extra size. All
compression software does this, but not over such large distances. If you
copy the same data over multiple times, the resulting lrzip archive doesn't
get much larger at all.

Using the first example (linux-2.6.31.tar) and simply copying the data
multiple times over gives these results with lrzip(lzo):

Copies    Size          Compressed    Compress     Decompress
1         365711360     112151676     0m14.913s    0m5.063s
2         731422720     112151829     0m16.174s    0m8.115s
3         1097134080    112151832     0m17.466s    0m8.115s
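
A run like this is easy to reproduce with something along these lines
(file names are illustrative):

    cat linux-2.6.31.tar linux-2.6.31.tar > two.tar
    lrzip -l two.tar      # two copies compress to barely more than one
    cat two.tar linux-2.6.31.tar > three.tar
    lrzip -l three.tar    # a third copy adds almost nothing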

I had the amusing thought that this compression software could be used as a
bullshit detector if you were to compress people's speeches, because if
their talks were full of catchphrases and not much actual content, it would
all be compressed down. So the larger the final archive, the less bullshit. =)

Now let's move on to the other special feature of lrzip: the ability to
compress massive amounts of data on huge ram machines by using massive
compression windows. This is a 10GB virtual image of an installed operating
system with some basic working software on it. The default options on the
8GB machine meant that it was using a 5GB window.

10GB Virtual image:

Compression      Size           Percentage    Compress Time    Decompress Time
None             10737418240    100.0
gzip             2772899756     25.8          7m52.667s        4m8.661s
bzip2            2704781700     25.2          20m34.269s       7m51.362s
xz               2272322208     21.2          58m26.829s       4m46.154s
7z               2242897134     20.9          29m28.152s       6m35.952s
lrzip*           1354237684     12.6          29m13.402s       6m55.441s
lrzip(lzo)*      1828073980     17.0          3m34.816s        5m06.266s
lrzip(zpaq)      1341008779     12.5          4h11m14s
lrzip(zpaq)M     1270134391     11.8          4h30m14s
lrzip(zpaq)MW    1066902006     9.9

(The benchmarks with * were done with version 0.5)

At this end of the spectrum things really start to heat up. The compression
advantage is massive, with even the lzo backend giving much better results
than 7z, and in a ridiculously short time; note that it takes not much longer
than it does to simply *read* a 10GB file. Unfortunately, at these large
compression windows the decompression time is significantly longer, but I
believe that's a fair tradeoff. :) What appears to be a big disappointment
here is actually zpaq, which takes more than 8 times longer than lzma for a
measly 0.2% improvement. The reason is that most of the advantage here is
achieved by the rzip first stage. The -M option was included here for
completeness, to see what the maximum possible compression was for this file
on this machine, while the MW run additionally used -W 200 (to make the
window larger than both the file and the ram the machine has); it still
completed, but induced a lot of swap in the interim.
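
The window behaviour above is controlled by lrzip's window options; roughly
(as I understand the option semantics of releases from around this time;
check your version's manpage):

    lrzip image.img          # default: window sized from available ram
    lrzip -M image.img       # -M: try to buffer the entire file
    lrzip -z -M image.img    # approximately the lrzip(zpaq)M run above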

This should help govern what compression you choose. Small files are nicely
compressed with zpaq. Intermediate files are nicely compressed with lzma.
Large files get excellent results even with lzo, provided you have enough
ram. (Small being <100MB, intermediate <1GB, large >1GB.) Or, to make things
easier, just use the default settings all the time and be happy, as lzma
gives good results. :D
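
As a rule of thumb, that boils down to something like:

    lrzip -z small.tar       # small (<100MB): zpaq
    lrzip medium.tar         # intermediate (<1GB): default lzma
    lrzip -l large.tar       # large (>1GB): even lzo, given enough ram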

Con Kolivas
Tue, 2nd Nov 2010