.TH "lrzip" "1" "February 2011" "" ""
.SH "NAME"
lrzip \- a large-file compression program
.SH "SYNOPSIS"
.PP
lrzip [OPTIONS] <file>
.br
lrzip \-d [OPTIONS] <file>
.br
lrunzip [OPTIONS] <file>
.br
lrztar [lrzip options] <directory>
.br
lrztar \-d [lrzip options] <directory>
.br
lrzuntar [lrzip options] <directory>
.br
LRZIP=NOCONFIG [lrzip|lrunzip] [OPTIONS] <file>
.PP
.SH "DESCRIPTION"
.PP
LRZIP is a file compression program designed to do particularly
well on very large files containing long distance redundancy\&.
lrztar is a wrapper for LRZIP to simplify compression and decompression
of directories.
.PP
.SH "OPTIONS SUMMARY"
.PP
Here is a summary of the options to lrzip\&.
.nf
\-w size compression window in hundreds of MB
default chosen by heuristic dependent on ram and chosen compression
\-d decompress
\-o filename specify the output file name and/or path
\-O directory specify the output directory when \-o is not used
\-S suffix specify compressed suffix (default '.lrz')
\-f force overwrite of any existing files
\-D delete source files after successful compression or decompression
\-q don't show compression progress
\-L level set rzip/lzma/bzip2/gzip compression level (1\-9, default 7)
\-n no backend compression. Prepare for other compressor
\-l lzo compression (ultra fast)
\-b bzip2 compression
\-g gzip compression using zlib
\-z zpaq compression (best, extreme compression, extremely slow)
\-M Maximum window (all available ram)
\-U Use unlimited window size beyond ramsize (potentially much slower)
\-T value Compression threshold with LZO test. (0 (nil) - 10 (high), default 1)
\-N value Set nice value to value (default 19)
\-p value Set processor count to override number of threads
\-v[v] Verbose. Multiple invocations increase verbosity
\-V show version
\-t test compressed file integrity
\-i show compressed file information
\-H display md5 hash integrity information
\-c check integrity of file written on decompression
\-k keep broken or damaged files
If no filename or "-" is specified, stdin/stdout will be used (stdin/stdout is
inefficient with lrzip and not recommended).
.fi
.PP
.SH "OPTIONS"
.PP
.IP "\fB-h\fP"
Print an options summary page
.IP
.IP "\fB-V\fP"
Print the lrzip version number
.IP
.IP "\fB-v[v]\fP"
Increases verbosity. \-vv will print more messages than \-v.
.IP
.IP "\fB-w n\fP"
Set the maximum allowable compression window size to n in hundreds of megabytes.
This is the amount of memory lrzip will search during its first stage of
pre-compression and is the main thing that will determine how much benefit lrzip
will provide over ordinary compression with the 2nd stage algorithm. If not set
(recommended), the value chosen will be determined by an internal heuristic in
lrzip which uses the most memory that is reasonable, without any hard upper
limit. It is limited to 2GB on 32-bit machines. lrzip will always reduce the
window size to the biggest it can be without running out of memory.
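.IP
As an illustrative sketch (the file name \fIbigfile\fP is hypothetical), since
n is given in hundreds of megabytes, the following would cap the compression
window at roughly 500MB:
.nf
lrzip \-w 5 bigfile
.fi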
.IP
.IP "\fB-L 1\&.\&.9\fP"
Set the compression level from 1 to 9. The default is to use level 7, which
gives good all round compression. The compression level is also strongly related
to how much memory lrzip uses. See the \-w option for details.
.IP
.IP "\fB-M \fP"
Maximum window size\&. If this option is set, then lrzip tries to load the
entire file into ram as one big compression window, and will reduce the size of
the window until it does fit. This may induce a hefty swap load on your machine
but can also give dramatic size advantages when your file is the size of your
ram or larger.
.IP
.IP "\fB-U \fP"
Unlimited window size\&. If this option is set, and the file being compressed
does not fit into the available ram, lrzip will use a moving second buffer as a
"sliding mmap" which emulates having infinite ram. This will provide the most
possible compression in the first rzip stage which can improve the compression
of ultra large files when they're bigger than the available ram. However it runs
progressively slower the larger the difference between ram and the file size so
it is worth trying the \-M option first to see if the whole file can be accessed
in one pass, and then if not, it should be used together with the \-M option (if
at all).
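.IP
A possible way to follow this advice, assuming a hypothetical file named
\fIbigfile\fP, is to try \-M alone first and to add \-U only if the whole
file could not be held in one window. The \-f option is used in the second
command so that the output of the first attempt is overwritten:
.nf
lrzip \-M bigfile
lrzip \-f \-M \-U bigfile
.fi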
.IP
.IP "\fB-T 0\&.\&.10\fP"
Sets the LZO compression threshold used when testing a data chunk before slower
compression is applied. The threshold level can be from 0 to 10.
This option is used to speed up compression by skipping the slow
compression pass on incompressible data. The reasoning is that if a chunk is
completely incompressible by LZO then it will also be incompressible by the
slower algorithms, so skipping it saves time.
The default is 1.
.IP
.IP "\fB-d\fP"
Decompress. If this option is not used then lrzip looks at
the name used to launch the program. If it contains the string
"lrunzip" then the \-d option is automatically set.
.IP
.IP "\fB-l\fP"
LZO Compression. If this option is set then lrzip will use the ultra
fast lzo compression algorithm for the 2nd stage. This mode of compression
gives bzip2\-like compression at the speed it would normally take to simply
copy the file, giving excellent compression/time value.
.IP
.IP "\fB-n\fP"
No 2nd stage compression. If this option is set then lrzip will only
perform the long distance redundancy 1st stage compression. While this does
not compress any faster than LZO compression, it produces a smaller file
that then responds better to further compression (e.g. by another application),
which also substantially reduces the time that further compression takes.
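.IP
For instance, assuming a hypothetical file named \fIbigfile\fP and the default
suffix of '.lrz', the first stage output could be handed to another compressor
such as bzip2(1):
.nf
lrzip \-n bigfile
bzip2 bigfile.lrz
.fi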
.IP
.IP "\fB-b\fP"
Bzip2 compression. Uses bzip2 compression for the 2nd stage, much like
the original rzip does.
.IP "\fB-g\fP"
Gzip compression. Uses gzip compression for the 2nd stage. Uses libz compress
and uncompress functions.
.IP
.IP "\fB-z\fP"
ZPAQ compression. Uses ZPAQ compression which is from the PAQ family of
compressors known for having some of the highest compression ratios possible
but at the cost of being extremely slow on both compress and decompress (4x
slower than lzma which is the default).
.IP
.IP "\fB-o\fP"
Set the output file name. If this option is not set then
the output file name is chosen based on the input name and the
suffix. The \-o option cannot be used if more than one file name is
specified on the command line.
.IP
.IP "\fB-O\fP"
Set the output directory for the default filename. This option
cannot be combined with \-o.
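.IP
As a sketch with hypothetical file and directory names, either of the
following would place the compressed file under /backup. The resulting name
bigfile.lrz in the second case is an assumption based on the default suffix:
.nf
lrzip \-o /backup/bigfile.lrz bigfile
lrzip \-O /backup bigfile
.fi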
.IP
.IP "\fB-S\fP"
Set the compression suffix. The default is '.lrz'.
.IP
.IP "\fB-f\fP"
If this option is not specified (Default) then lrzip will not
overwrite any existing files. If you set this option then lrzip will
silently overwrite any files as needed.
.IP
.IP "\fB-D\fP"
If this option is specified then lrzip will delete the
source file after successful compression or decompression. When this
option is not specified then the source files are not deleted.
.IP
.IP "\fB-q\fP"
If this option is specified then lrzip will not show the
percentage progress while compressing. Note that compression happens in
bursts with lzma compression which is the default compression. This means
that it will progress very rapidly for short periods and then stop for
long periods.
.IP "\fB-N value\fP"
The default nice value is 19. This option can be used to set the scheduling
priority of lrzip compression or decompression. Valid nice values are
from \-20 to 19. Note this does NOT speed up or slow down compression.
.IP
.IP "\fB-p value\fP"
Set the processor count used to determine the number of threads to run.
Normally lrzip will scale according to the number of CPUs it detects. Using
this option overrides that value in case you wish to use fewer CPUs, either to
decrease the load on your machine or to improve compression. Setting it to
1 will maximise compression but will not attempt to use more than one CPU.
.IP
.IP "\fB-t\fP"
This tests the compressed file integrity. It does this by decompressing it
to a temporary file and then deleting it.
.IP
.IP "\fB-i\fP"
This shows information about a compressed file. It shows the compressed size,
the decompressed size, the compression ratio, what compression was used and
what hash checking will be used for internal integrity checking.
Note that the compression mode is detected from the first block only and
it will show no compression used if the first block was incompressible, even
if later blocks were compressible.
.IP
.IP "\fB-H\fP"
This shows the md5 hash value calculated on compressing or decompressing an
lrzip archive. By default all compression has the md5 value calculated and
stored in all archives since version 0.560. On decompression, when an md5
value has been found, it will be calculated and used for integrity checking.
If the md5 value is not stored in the archive, it will not be calculated unless
explicitly specified with this option, or check integrity (see below) has been
requested.
.IP
.IP "\fB-c\fP"
This option enables integrity checking of the file written to disk on
decompression. All decompression is tested internally in lrzip with either
crc32 or md5 hash checking depending on the version of the archive already.
However the file written to disk may be corrupted for other reasons to do with
other userspace problems such as faulty library versions, drivers, hardware
failure and so on. Enabling this option will make lrzip perform an md5 hash
check on the file that's written to disk. When the archive has the md5 value
stored in it, it is compared to this. Otherwise it is compared to the value
calculated during decompression. This offers an extra guarantee that the file
written is the same as the original archived.
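.IP
For example, assuming a hypothetical archive named \fIbigfile.lrz\fP,
decompression with the extra check on the written file, also displaying the
md5 hash, might look like:
.nf
lrzip \-d \-c \-H bigfile.lrz
.fi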
.IP
.IP "\fB-k\fP"
This option will keep broken or damaged files instead of deleting them.
When compression or decompression is interrupted either by user or error, or
a file decompressed fails an integrity check, it is normally deleted by LRZIP.
.IP
.PP
.SH "INSTALLATION"
.PP
"make install" or just install lrzip somewhere in your search path.
.PP
.SH "COMPRESSION ALGORITHM"
.PP
LRZIP operates in two stages. The first stage finds and encodes large chunks of
duplicated data over potentially very long distances in the input file. The
second stage is to use a compression algorithm to compress the output of the
first stage. The compression algorithm can be chosen to be optimised for extreme
size (zpaq), size (lzma - default), speed (lzo), legacy (bzip2 or gzip) or can
be omitted entirely, doing only the first stage. The output of a first stage
only run almost always compresses both smaller and faster with a
subsequent compression program than the original input would.
.PP
The key difference between lrzip and other well known compression
algorithms is its ability to take advantage of very long distance
redundancy. The well known deflate algorithm used in gzip uses a
maximum history buffer of 32k. The block sorting algorithm used in
bzip2 is limited to 900k of history. The history buffer in lrzip can be
any size, and is not even limited by available ram.
.
.PP
It is quite common these days to need to compress files that contain
long distance redundancies. For example, when compressing a set of
home directories several users might have copies of the same file, or
of quite similar files. It is also common to have a single file that
contains large duplicated chunks over long distances, such as pdf
files containing repeated copies of the same image. Most compression
programs won't be able to take advantage of this redundancy, and thus
might achieve a much lower compression ratio than lrzip can achieve.
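.PP
A rough way to observe this, sketched with hypothetical file names and assuming
that /dev/urandom, head(1) and gzip(1) are available, is to build a file from
two copies of the same 100MB of random data and compress it both ways:
.nf
head \-c 100000000 /dev/urandom > chunk
cat chunk chunk > twice
gzip \-c twice > twice.gz
lrzip twice
ls \-l twice twice.gz twice.lrz
.fi
The second copy lies far outside the 32k history of gzip, so twice.gz should be
about as large as the input, whereas lrzip should be able to encode the second
copy as a match against the first, provided the compression window covers both
copies.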
.IP
.PP
.SH "FILES"
.PP
LRZIP recognises a configuration file that contains default settings.
This configuration is searched for in the current directory, /etc/lrzip,
and $HOME/.lrzip. The configuration filename must be \fBlrzip.conf\fP.
.PP
.SH "ENVIRONMENT"
By default, lrzip will search for and use a configuration file, lrzip.conf.
If the user wishes to bypass the file, a startup ENV variable may be set.
.br
.B LRZIP =
.I "NOCONFIG "
.B "[lrzip|lrunzip]"
[OPTIONS] <file>
.br
which will force lrzip to ignore the configuration file.
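.PP
In a Bourne\-style shell, a minimal sketch (the file name is hypothetical) sets
the variable for a single invocation, with no spaces around the equals sign:
.nf
LRZIP=NOCONFIG lrzip bigfile
.fi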
.PP
.SH "HISTORY - Notes on rzip by Andrew Tridgell"
.PP
The ideas behind rzip were first implemented in 1998 while I was
working on rsync. That version was too slow to be practical, and was
replaced by this version in 2003.
.PP
LRZIP was created by Con Kolivas out of the desire for better compression
and/or speed, blending the lzma and lzo compression algorithms with
the rzip first stage, and extending the compression windows to scale
with increasing ram sizes.
.PP
.SH "BUGS"
.PP
Nil known.
.PP
.SH "SEE ALSO"
lrzip.conf(5),
bzip2(1),
gzip(1),
lzop(1),
lrzip(1),
rzip(1),
zip(1),
lrztar(1),
lrzuntar(1)
.PP
.SH "AUTHOR and CREDITS"
.br
rzip was written by Andrew Tridgell.
.br
lzma was written by Igor Pavlov.
.br
lzo was written by Markus Oberhumer.
.br
zpaq was written by Matt Mahoney.
.br
lrzip was bastardised from rzip by Con Kolivas.
.br
Peter Hyman added informational output, updated LZMA SDK,
and added multi-threading capabilities.
.PP
If you wish to report a problem, or make a suggestion, then please email Con at
kernel@kolivas.org
.PP
lrzip is released under the GNU General Public License version 2.
Please see the file COPYING for license details.