Change the way stream blocks are computed to maximize compression.

* Add Dictionary Size computation and setting capability.
* Remove 7/9 scaling for lzma levels. Default still 5.
* LZMA levels now 1-9.
* Improve info displays.
* Assembler will now select either nasm or yasm.
* ETA will be displayed after each chunk is processed.
* Decompression status updated every 5 seconds or so.
* Documentation updates.
* Code cleanups (whitespace removal, comment alignment).
Peter Hyman 2019-12-17 07:22:36 -06:00
parent 4f1adeaec4
commit 2979c8ec26
19 changed files with 264 additions and 121 deletions


@ -1,5 +1,23 @@
lrzip ChangeLog
DECEMBER 2019, version bump 0.640-beta, Peter Hyman
* Add Dictionary Size capability.
* Pass absolute dictionary to lzma compressor since lrzip determines size.
* Pass -1 for threading to lzma encoder so it can choose multi-thread or not.
* Update lrzip.conf to support dictionary size.
* Update setup_overhead() in util.c to handle variable dictionary sizes.
* Update open_stream_out() in stream.c to maximize use of memory and threads.
* Update info display to show actual lzma settings, lc, lp, pb, and
dictionary size.
* Show ETA for each pass of compression (fixed logic in rzip.c).
* Display decompression status every 5 seconds instead of multiples of 10%.
* lrzip can now use nasm or yasm assemblers. AC_CHECK_PROGS in configure.ac.
* LZMA can now use all levels 1-9. Removed 7/9 scaling from early SDK. LZMA
default still 5 and will display on summary screen.
* Update man pages.
* Some code cleanups (align comments, whitespace removal).
NOVEMBER 2019, updates to version 0.631, Peter Hyman
* Fixups to Assembler code in configure.ac lzma/C/Makefile.am by using

TODO

@ -1,5 +1,9 @@
MAYBE TODO for lrzip program
Update LZMA SDK to 19.00+ (in progress). Includes improvements,
especially to memory use and decompression speed, plus a
multi-threaded decompressor and better compression speed and ratios.
Upgrade to newer version of zpaq supporting 3 compression levels without
relying on open_memstream so it works without temporary files on apple.


@ -1,3 +1,13 @@
lrzip-0.640-beta
Dictionary Size can be set from command line or conf file.
Memory and block size computations now favor larger blocks
and larger dictionary sizes for better pre-processing and to give
compression backends more to work with.
Removal of scaling for lzma levels.
Larger blocks in lrzip files may result in faster compression and
decompression times and smaller output files.
lrzip-0.631
Assembler code is back and works with x86_64


@ -2,7 +2,7 @@
##--##--##--##--##--##--##--##--##--##--##--##--##--##--##--##--##
m4_define([v_maj], [0])
m4_define([v_min], [6])
m4_define([v_mic], [31])
m4_define([v_mic], [40beta])
##--##--##--##--##--##--##--##--##--##--##--##--##--##--##--##--##
m4_define([v_v], m4_join([], v_min, v_mic))
m4_define([v_ver], [v_maj.v_v])


@ -7,7 +7,7 @@ Assembler is enabled by
and disabled by
./configure --disable-asm
not
ASM=no ./configure
ASM=no ./configure
New files replace 32 and 64 bit assembler code.
fixes to lzma/C/Makefile.am permit libtool linking.
@ -38,7 +38,7 @@ which will automatically not include the asm module or
change the line
ASM_OBJ=7zCrcT8.o 7zCrcT8U.o
to
to
ASM_OBJ=7zCrc.o
in Makefile. This will change the dependency tree.


@ -88,7 +88,7 @@ explain.
lzo testing for incompressible data...OK for chunk 43408.
Compressed size = 52.58% of chunk, 1 Passes
Progress percentage pausing during lzma compression...
lzo testing for incompressible data...FAILED - below threshold for chunk 523245383.
lzo testing for incompressible data...FAILED - below threshold for chunk 523245383.
Compressed size = 98.87% of chunk, 50 Passes
This was for a video .VOB file of 1GB. A compression threshold of 2 was used.


@ -2,30 +2,41 @@
# anything beginning with a # or whitespace will be ignored
# valid parameters are separated with an = and a value
# parameters and values are not case sensitive except where specified
#
#
# lrzip 0.24+, peter hyman, pete@peterhyman.com
# ignored by earlier versions.
# Compression Window size in 100MB. Normally selected by program. (-w)
# WINDOW = 20
# Compression Level 1-9 (7 Default). (-L)
# COMPRESSIONLEVEL = 7
# Use -U setting, Unlimited ram. Yes or No
# UNLIMITED = NO
# Compression Method, rzip, gzip, bzip2, lzo, or lzma (default), or zpaq. (-n -g -b -l --lzma -z)
# May be overridden by command line compression choice.
# COMPRESSIONMETHOD = lzma
# Perform LZO Test. Default = YES (-T )
# LZMA Dictionary Size. 0 = default value used. 12-30 = 2^ds (--dictsize)
# DICTIONARYSIZE = 0
# Perform LZO Test. Default = YES (-T)
# LZOTEST = NO
# Hash Check on decompression, (-c)
# HASHCHECK = YES
# Show HASH value on Compression even if Verbose is off, YES (-H)
# SHOWHASH = YES
# Default output directory (-O)
# OUTPUTDIRECTORY = location
# Verbosity, YES or MAX (v, vv)
# VERBOSITY = max
# Show Progress as file is parsed, YES or no (NO = -q option)
# SHOWPROGRESS = YES
@ -38,14 +49,12 @@
# Delete source file after compression (-D)
# this parameter and value are case sensitive
# value must be YES to activate
# DELETEFILES = NO
# Replace existing lrzip file when compressing (-f)
# this parameter and value are case sensitive
# value must be YES to activate
# REPLACEFILE = YES
# REPLACEFILE = YES
# Override for Temporary Directory. Only valid when stdin/out or Test is used
# TMPDIR = /tmp


@ -32,7 +32,7 @@ Rzip Chunk Data:
XX Stream 0 header data
XX Stream 1 header data
Stream Header Data:
Stream Header Data:
Byte:
0 Compressed data type
(RCD0 bytes) Compressed data length


@ -31,7 +31,7 @@
#endif
/* needed for CRC routines */
#include "lzma/C/7zCrc.h"
#include "7zCrc.h"
#include "util.h"
#include "lrzip_core.h"
#include "rzip.h"


@ -271,6 +271,16 @@ int main(int argc, char *argv[])
}
/* LZMA is the default */
if (!lrzip_mode_get(lr)) lrzip_mode_set(lr, LRZIP_MODE_COMPRESS_LZMA);
/* Because LZMA default compression level is 5, not 7, compression_level is
* initialized as 0, not 7. A check must be made to set it if not otherwise
* specified on command line */
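if (!lrzip_compression_level_get(lr)) {
/* compare the selected mode; testing the bare enum constant is always true */
if (lrzip_mode_get(lr) == LRZIP_MODE_COMPRESS_LZMA)
lrzip_compression_level_set(lr, 5); // default LZMA level is 5
else
lrzip_compression_level_set(lr, 7);
}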
argc -= optind, argv += optind;
if (lrzip_outfilename_get(lr) && (argc > 1))
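
The default-level rule in the hunk above can be sketched in isolation. This is a minimal illustration rather than lrzip code: `effective_level` and its parameters are hypothetical names, and it assumes a level of 0 means "not set by the user".

```c
#include <stdbool.h>

/* Hypothetical helper (not part of lrzip): resolve the compression level.
 * A requested level of 0 is assumed to mean "not specified". */
int effective_level(int requested, bool is_lzma)
{
	if (requested != 0)
		return requested;	/* an explicit user choice wins */
	return is_lzma ? 5 : 7;		/* LZMA defaults to 5, other back ends to 7 */
}
```

Keeping the rule in one function would let both the command-line tool and the library demo share it, which is a design suggestion only.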

lrzip.c

@ -55,6 +55,8 @@
#include "util.h"
#include "stream.h"
#include "LzmaDec.h" // decode LZMA header for get_info
#define MAGIC_LEN (24)
static void release_hashes(rzip_control *control);
@ -137,8 +139,8 @@ i64 nloops(i64 seconds, uchar *b1, uchar *b2)
bool write_magic(rzip_control *control)
{
char magic[MAGIC_LEN] = {
'L', 'R', 'Z', 'I', LRZIP_MAJOR_VERSION, LRZIP_MINOR_VERSION
char magic[MAGIC_LEN] = {
'L', 'R', 'Z', 'I', LRZIP_MAJOR_VERSION, LRZIP_MINOR_VERSION
};
/* File size is stored as zero for streaming STDOUT blocks when the
@ -807,17 +809,21 @@ bool decompress_file(rzip_control *control)
fatal_return(("Invalid expected size %lld\n", expected_size), false);
}
if (!STDOUT && !TEST_ONLY) {
if (!STDOUT) {
/* Check if there's enough free space on the device chosen to fit the
* decompressed file. */
* decompressed or test file. */
if (unlikely(fstatvfs(fd_out, &fbuf)))
fatal_return(("Failed to fstatvfs in decompress_file\n"), false);
free_space = (i64)fbuf.f_bsize * (i64)fbuf.f_bavail;
if (free_space < expected_size) {
if (FORCE_REPLACE)
print_err("Warning, inadequate free space detected, but attempting to decompress due to -f option being used.\n");
if (FORCE_REPLACE && !TEST_ONLY)
print_err("Warning, inadequate free space detected, but attempting to decompress file due to -f option being used.\n");
else
failure_return(("Inadequate free space to decompress file, use -f to override.\n"), false);
failure_return(("Inadequate free space to %s. Space needed: %lld. Space available: %lld.\nTry %s and \
select a larger volume.\n",
TEST_ONLY ? "test file" : "decompress file. Use -f to override", expected_size, free_space,
TEST_ONLY ? "setting `TMP=dirname`" : "using `-O dirname` or `-o [dirname/]filename` options"),
false);
}
}
control->fd_out = fd_out;
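
The free-space test in this hunk can be tried on its own with a small POSIX sketch. `has_free_space` is a hypothetical name, and the sketch differs from the hunk in two labeled ways: it takes a path (using `statvfs`) instead of an open descriptor, and it multiplies by `f_frsize`, the unit POSIX defines for `f_bavail`, where the hunk uses `f_bsize`.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/statvfs.h>

/* Hypothetical helper (not part of lrzip): report whether the filesystem
 * holding `path` has at least `needed` bytes available to unprivileged
 * users. The available byte count is stored in *available on success.
 * Returns false both when statvfs() fails and when space is short. */
bool has_free_space(const char *path, int64_t needed, int64_t *available)
{
	struct statvfs fbuf;

	if (statvfs(path, &fbuf) != 0)
		return false;
	/* f_bavail is counted in units of f_frsize */
	*available = (int64_t)fbuf.f_frsize * (int64_t)fbuf.f_bavail;
	return *available >= needed;
}
```

A caller that needs to tell "statvfs failed" from "not enough space" would want a three-state return value instead of a bool.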
@ -953,6 +959,8 @@ bool get_fileinfo(rzip_control *control)
uchar save_ctype = 255;
struct stat st;
int fd_in;
CLzmaProps p; // decode lzma header
int lzma_ret;
if (!STDIN) {
struct stat fdin_stat;
@ -1145,8 +1153,13 @@ done:
print_output("rzip + bzip2\n");
else if (save_ctype == CTYPE_LZO)
print_output("rzip + lzo\n");
else if (save_ctype == CTYPE_LZMA)
print_output("rzip + lzma\n");
else if (save_ctype == CTYPE_LZMA) {
print_output("rzip + lzma -- ");
if ((lzma_ret = LzmaProps_Decode(&p, control->lzma_properties, sizeof(control->lzma_properties))) == SZ_OK)
print_output("lc = %d, lp = %d, pb = %d, Dictionary Size = %u\n", p.lc, p.lp, p.pb, p.dicSize);
else
print_err("Corrupt LZMA Properties\n");
}
else if (save_ctype == CTYPE_GZIP)
print_output("rzip + gzip\n");
else if (save_ctype == CTYPE_ZPAQ)
@ -1214,7 +1227,7 @@ bool compress_file(rzip_control *control)
fd_in = open(control->infile, O_RDONLY);
if (unlikely(fd_in == -1))
fatal_return(("Failed to open %s\n", control->infile), false);
}
}
else
fd_in = 0;
@ -1325,7 +1338,7 @@ error:
bool initialise_control(rzip_control *control)
{
time_t now_t, tdiff;
char localeptr[] = "./", *eptr; /* for environment */
char localeptr[] = "./", *eptr; /* for environment */
size_t len;
memset(control, 0, sizeof(rzip_control));
@ -1334,12 +1347,13 @@ bool initialise_control(rzip_control *control)
register_outputfile(control, control->msgout);
control->flags = FLAG_SHOW_PROGRESS | FLAG_KEEP_FILES | FLAG_THRESHOLD;
control->suffix = ".lrz";
control->compression_level = 7;
control->compression_level = 0; /* 0 = not set; resolved later to 5 for lzma, 7 for others */
control->dictSize = 0; /* Dictionary Size for lzma. 0 means program decides */
control->ramsize = get_ram(control);
if (unlikely(control->ramsize == -1))
return false;
/* for testing single CPU */
control->threads = PROCESSORS; /* get CPUs for LZMA */
control->threads = PROCESSORS; /* get CPUs for LZMA */
control->page_size = PAGE_SIZE;
control->nice_val = 19;
@ -1378,7 +1392,7 @@ bool initialise_control(rzip_control *control)
fatal_return(("Failed to allocate for tmpdir\n"), false);
strcpy(control->tmpdir, eptr);
if (control->tmpdir[len - 1] != '/') {
control->tmpdir[len] = '/'; /* need a trailing slash */
control->tmpdir[len] = '/'; /* need a trailing slash */
control->tmpdir[len + 1] = '\0';
}
return true;
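
The info display added above relies on the 5-byte LZMA properties header. The sketch below decodes that layout by hand, assuming the standard encoding: byte 0 holds `(pb * 5 + lp) * 9 + lc` and bytes 1 to 4 hold the dictionary size in little-endian order. `struct lzma_props` and `decode_lzma_props` are hypothetical names for illustration; the commit itself calls the SDK's `LzmaProps_Decode`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the decoded values (not an SDK type). */
struct lzma_props {
	unsigned lc, lp, pb;
	uint32_t dict_size;
};

/* Decode a 5-byte LZMA properties header.
 * Returns false if byte 0 is outside the valid range. */
bool decode_lzma_props(const unsigned char props[5], struct lzma_props *out)
{
	unsigned d = props[0];

	if (d >= 9 * 5 * 5)
		return false;		/* lc < 9, lp < 5, pb < 5 */
	out->lc = d % 9;
	d /= 9;
	out->lp = d % 5;
	out->pb = d / 5;
	/* dictionary size is stored little-endian in bytes 1..4 */
	out->dict_size = (uint32_t)props[1] | ((uint32_t)props[2] << 8) |
			 ((uint32_t)props[3] << 16) | ((uint32_t)props[4] << 24);
	return true;
}
```

For the common header byte 0x5D this yields lc = 3, lp = 0, pb = 2.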


@ -401,10 +401,11 @@ struct rzip_control {
FILE *msgerr; //stream for output errors
char *suffix;
uchar compression_level;
i64 overhead; // compressor overhead
i64 usable_ram; // the most ram we'll try to use on one activity
i64 maxram; // the largest chunk of ram to allocate
i64 overhead; // compressor overhead
i64 usable_ram; // the most ram we'll try to use on one activity
i64 maxram; // the largest chunk of ram to allocate
unsigned char lzma_properties[5]; // lzma properties, encoded
unsigned dictSize; // lzma Dictionary size - set in overhead computation
i64 window;
unsigned long flags;
i64 ramsize;
@ -438,7 +439,7 @@ struct rzip_control {
cksem_t cksumsem;
md5_ctx ctx;
uchar md5_resblock[MD5_DIGEST_SIZE];
i64 md5_read; // How far into the file the md5 has done so far
i64 md5_read; // How far into the file the md5 has done so far
struct checksum checksum;
const char *util_infile;

main.c

@ -59,7 +59,7 @@
#include "stream.h"
/* needed for CRC routines */
#include "lzma/C/7zCrc.h"
#include "7zCrc.h"
#define MAX_PATH_LEN 4096
@ -117,6 +117,7 @@ static void usage(bool compat)
}
if (!compat)
print_output(" -L, --level level set lzma/bzip2/gzip compression level (1-9, default 7)\n");
print_output(" --dictsize Set Dictionary Size for LZMA ds=12 to 30 expressed as 2^ds\n");
print_output(" -N, --nice-level value Set nice value to value (default %d)\n", compat ? 0 : 19);
print_output(" -p, --threads value Set processor count to override number of threads\n");
print_output(" -m, --maxram size Set maximum available ram in hundreds of MB\n");
@ -160,13 +161,15 @@ static void show_summary(void)
{
/* OK, if verbosity set, print summary of options selected */
if (!INFO) {
if (!TEST_ONLY)
print_verbose("The following options are in effect for this %s.\n",
DECOMPRESS ? "DECOMPRESSION" : "COMPRESSION");
print_verbose("The following options are in effect for this %s.\n",
DECOMPRESS ? "DECOMPRESSION" : TEST_ONLY ? "INTEGRITY TEST" : "COMPRESSION");
print_verbose("Threading is %s. Number of CPUs detected: %d\n", control->threads > 1? "ENABLED" : "DISABLED",
control->threads);
print_verbose("Detected %lld bytes ram\n", control->ramsize);
print_verbose("Compression level %d\n", control->compression_level);
print_verbose("Compression level %d %s\n", control->compression_level,
LZMA_COMPRESS ? (control->compression_level == 5 ? "- Default LZMA level": "" ) : "" );
if (LZMA_COMPRESS)
print_verbose("Dictionary Size: %u\n", control->dictSize );
print_verbose("Nice Value: %d\n", control->nice_val);
print_verbose("Show Progress\n");
print_maxverbose("Max ");
@ -259,6 +262,7 @@ static struct option long_options[] = {
{"zpaq", no_argument, 0, 'z'},
{"fast", no_argument, 0, '1'},
{"best", no_argument, 0, '9'},
{"dictsize", required_argument, 0, '\\'},
{0, 0, 0, 0},
};
@ -312,7 +316,7 @@ int main(int argc, char *argv[])
struct timeval start_time, end_time;
struct sigaction handler;
double seconds,total_time; // for timers
int c, i;
int c, i, ds;
int hours,minutes;
extern int optind;
char *eptr, *av; /* for environment */
@ -369,7 +373,7 @@ int main(int argc, char *argv[])
if ((control->flags & FLAG_NOT_LZMA) && conf_file_compression_set == false)
failure("Can only use one of -l, -b, -g, -z or -n\n");
/* Select Compression Mode */
control->flags &= ~FLAG_NOT_LZMA; /* must clear all compressions first */
control->flags &= ~FLAG_NOT_LZMA; /* must clear all compressions first */
if (c == 'b')
control->flags |= FLAG_BZIP2_COMPRESS;
else if (c == 'g')
@ -383,8 +387,8 @@ int main(int argc, char *argv[])
/* now FLAG_NOT_LZMA will evaluate as true */
conf_file_compression_set = false;
break;
case '/': /* LZMA Compress selected */
control->flags &= ~FLAG_NOT_LZMA; /* clear alternate compression flags */
case '/': /* LZMA Compress selected */
control->flags &= ~FLAG_NOT_LZMA; /* clear alternate compression flags */
break;
case 'c':
if (compat) {
@ -403,6 +407,13 @@ int main(int argc, char *argv[])
case 'D':
control->flags &= ~FLAG_KEEP_FILES;
break;
case '\\':
/* Dictionary Size, 2^12-30 */
ds = atoi(optarg);
if (ds < 12 || ds > 30)
failure("Dictionary Size must be between 12 and 30 for 2^12 (4KB) to 2^30 (1GB)\n");
control->dictSize = (1U << ds);
break;
case 'e':
control->flags |= FLAG_ENCRYPT;
control->passphrase = optarg;
@ -537,6 +548,15 @@ int main(int argc, char *argv[])
}
}
/* Because LZMA default compression level is 5, not 7, compression_level is
* initialized as 0, not 7. A check must be made to set it if not otherwise
* specified on command line */
if (!control->compression_level)
if (LZMA_COMPRESS)
control->compression_level = 5; // default LZMA level is 5
else
control->compression_level = 7;
argc -= optind;
argv += optind;
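
The `--dictsize` handling above maps an exponent in the range 12 to 30 to a size of 2^ds bytes. A stricter standalone sketch follows; `parse_dictsize` is a hypothetical helper, and it uses `strtol` rather than `atoi` so that non-numeric input is rejected instead of being read as 0.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper (not part of lrzip): parse a dictionary size
 * exponent and store 2^ds bytes in *out. Returns false on bad input. */
bool parse_dictsize(const char *arg, unsigned *out)
{
	char *end;
	long ds;

	errno = 0;
	ds = strtol(arg, &end, 10);
	if (errno || end == arg || *end != '\0')
		return false;		/* not a clean decimal integer */
	if (ds < 12 || ds > 30)
		return false;		/* outside 2^12 (4KB) .. 2^30 (1GB) */
	*out = 1U << ds;
	return true;
}
```

With this shape, `parse_dictsize("12", &v)` stores 4096 and `parse_dictsize("31", &v)` fails without touching `v`.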


@ -1,4 +1,4 @@
.TH "lrzip" "1" "June 2016" "" ""
.TH "lrzip" "1" "December 2019" "" ""
.SH "NAME"
lrzip \- a large-file compression program
.SH "SYNOPSIS"
@ -55,13 +55,15 @@ Options affecting output:
\-O, \-\-outdir directory specify the output directory when -o is not used
\-S, \-\-suffix suffix specify compressed suffix (default '.lrz')
Options affecting compression:
\-\-lzma lzma compression (default)
\-b, \-\-bzip2 bzip2 compression
\-g, \-\-gzip gzip compression using zlib
\-l, \-\-lzo lzo compression (ultra fast)
\-n, \-\-no-compress no backend compression - prepare for other compressor
\-z, \-\-zpaq zpaq compression (best, extreme compression, extremely slow)
Low level options:
\-L, \-\-level level set lzma/bzip2/gzip compression level (1-9, default 7)
\-L, \-\-level level Set lzma/bzip2/gzip compression level (1-9, default 7)
\-\-dictsize = ds Set Dictionary Size for LZMA ds=12 to 30 expressed as 2^ds
\-N, \-\-nice-level value Set nice value to value (default 19)
\-p, \-\-threads value Set processor count to override number of threads
\-m, \-\-maxram size Set maximum available ram in hundreds of MB
@ -227,6 +229,11 @@ Set the compression level from 1 to 9. The default is to use level 7, which
gives good all round compression. The compression level is also strongly related
to how much memory lrzip uses. See the \-w option for details.
.IP
.IP "\fB--dictsize=12\&.\&.30\fP"
Set Dictionary Size for LZMA from 2^12 (4KB) to 2^30 (1GB). Normally this
option is not useful since lrzip will set and sometimes change the dictionary
size depending on the compression level selected and usable ram available.
.IP
.IP "\fB-N value\fP"
The default nice value is 19. This option can be used to set the priority
scheduling for the lrzip backup or decompression. Valid nice values are


@ -21,55 +21,62 @@ Parameter values are not case sensitive except where specified\&.
.PP
.SH "CONFIG FILE EXAMPLE"
.nf
# This is a comment.
# lrzip.conf example file
# Compression Window size in 100MB. Normally selected by program. (-w)
# WINDOW = 20
# \fBWINDOW = 20\fP
# Compression Level 1-9 (7 Default). (-L)
# COMPRESSIONLEVEL = 7
# \fBCOMPRESSIONLEVEL = 7\fP
# Use -U setting, Unlimited ram. Yes or No
# UNLIMITED = NO
# \fBUNLIMITED = NO\fP
# Compression Method, rzip, gzip, bzip2, lzo, or lzma (default), or zpaq. (-n -g -b -l --lzma -z)
# If specified here, command line options not usable.
# COMPRESSIONMETHOD = lzma
# May be overridden by command line compression choice.
# \fBCOMPRESSIONMETHOD = lzma\fP
# LZMA Dictionary Size. 0 = default value used. 12-30 = 2^ds will be used.
# \fBDICTIONARYSIZE = 0\fP
# Perform LZO Test. Default = YES (-T )
# LZOTEST = NO
# \fBLZOTEST = NO\fP
# Hash Check on decompression, (-c)
# HASHCHECK = YES
# \fBHASHCHECK = YES\fP
# Show HASH value on Compression even if Verbose is off, YES (-H)
# SHOWHASH = YES
# \fBSHOWHASH = YES\fP
# Default output directory (-O)
# OUTPUTDIRECTORY = location
# \fBOUTPUTDIRECTORY = location\fP
# Verbosity, YES or MAX (v, vv)
# VERBOSITY = max
# \fBVERBOSITY = max\fP
# Show Progress as file is parsed, YES or no (NO = -q option)
# SHOWPROGRESS = YES
# \fBSHOWPROGRESS = YES\fP
# Set Niceness. 19 is default. -20 to 19 is the allowable range (-N)
# NICE = 19
# \fBNICE = 19\fP
# Keep broken or damaged output files, YES (-K)
# KEEPBROKEN = YES
# \fBKEEPBROKEN = YES\fP
# Delete source file after compression (-D)
# this parameter and value are case sensitive
# value must be YES to activate
# DELETEFILES = NO
# \fBDELETEFILES = NO\fP
# Replace existing lrzip file when compressing (-f)
# this parameter and value are case sensitive
# value must be YES to activate
# REPLACEFILE = YES
# \fBREPLACEFILE = YES\fP
# Override for Temporary Directory. Only valid when stdin/out or Test is used
# TMPDIR = /tmp
# \fBTMPDIR = /tmp\fP
# Whether to use encryption on compression YES, NO (-e)
# ENCRYPT = NO
# \fBENCRYPT = NO\fP
.fi
.PP
.SH "NOTES"


@ -46,7 +46,7 @@
#include "util.h"
#include "lrzip_core.h"
/* needed for CRC routines */
#include "lzma/C/7zCrc.h"
#include "7zCrc.h"
static inline uchar read_u8(rzip_control *control, void *ss, int stream, bool *err)
{
@ -253,12 +253,14 @@ static i64 runzip_chunk(rzip_control *control, int fd_in, i64 expected_size, i64
{
uint32 good_cksum, cksum = 0;
i64 len, ofs, total = 0;
int l = -1, p = 0;
int p;
char chunk_bytes;
struct stat st;
uchar head;
void *ss;
bool err = false;
struct timeval curtime, lasttime;
lasttime.tv_sec = 0;
/* for display of progress */
unsigned long divisor[] = {1,1024,1048576,1073741824U};
@ -338,11 +340,12 @@ static i64 runzip_chunk(rzip_control *control, int fd_in, i64 expected_size, i64
}
if (expected_size) {
p = 100 * ((double)(tally + total) / (double)expected_size);
if (p / 10 != l / 10) {
gettimeofday(&curtime,NULL);
if (curtime.tv_sec - lasttime.tv_sec > 5 || p == 100 ) { /* update every 5 seconds or when done */
prog_done = (double)(tally + total) / (double)divisor[divisor_index];
print_progress("%3d%% %9.2f / %9.2f %s\r",
p, prog_done, prog_tsize, suffix[divisor_index] );
l = p;
lasttime.tv_sec = curtime.tv_sec;
}
}
}
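
The time-based status update in this hunk reduces to a small throttle: report when more than 5 seconds have passed since the last report, or when the work is done. A minimal sketch, with `should_report` as a hypothetical name:

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper (not part of lrzip): decide whether to print a
 * progress line. *last holds the time of the previous report and is
 * updated whenever this returns true. Start with *last = 0 so the
 * first call always reports. */
bool should_report(time_t now, time_t *last, int percent)
{
	if (now - *last > 5 || percent == 100) {
		*last = now;
		return true;
	}
	return false;
}
```

Because the comparison is strictly greater than 5, a call exactly 5 seconds after the previous report stays silent, matching the condition used in the hunk.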

rzip.c

@ -57,7 +57,7 @@
#include "util.h"
#include "lrzip_core.h"
/* needed for CRC routines */
#include "lzma/C/7zCrc.h"
#include "7zCrc.h"
#ifndef MAP_ANONYMOUS
# define MAP_ANONYMOUS MAP_ANON
@ -1124,7 +1124,7 @@ retry:
gettimeofday(&current, NULL);
/* this will count only when size > window */
if (last.tv_sec > 0 && pct_base > 100) {
if (last.tv_sec > 0 && pct_base > 0) {
unsigned int eta_hours, eta_minutes, eta_seconds, elapsed_time, finish_time,
elapsed_hours, elapsed_minutes, elapsed_seconds, diff_seconds;
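
The hunk does not show how the ETA itself is derived. The sketch below assumes simple linear extrapolation from elapsed time and percent done, which may differ from what rzip.c does; `eta_seconds` and `split_hms` are hypothetical names.

```c
/* Hypothetical helpers (not part of lrzip). */

/* Estimate remaining seconds by linear extrapolation.
 * Returns 0 when no estimate is possible (0%) or nothing remains. */
unsigned eta_seconds(unsigned elapsed, unsigned pct_done)
{
	if (pct_done == 0 || pct_done >= 100)
		return 0;
	return (unsigned)((unsigned long long)elapsed * (100 - pct_done) / pct_done);
}

/* Split a duration in seconds into hours, minutes and seconds. */
void split_hms(unsigned total, unsigned *h, unsigned *m, unsigned *s)
{
	*h = total / 3600;
	*m = (total / 60) % 60;
	*s = total % 60;
}
```

For example, 60 seconds elapsed at 25% done extrapolates to 180 seconds remaining.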


@ -55,7 +55,7 @@
#endif
/* LZMA C Wrapper */
#include "lzma/C/LzmaLib.h"
#include "LzmaLib.h"
#include "util.h"
#include "lrzip_core.h"
@ -303,30 +303,22 @@ static int lzma_compress_buf(rzip_control *control, struct compress_thread *cthr
if (!lzo_compresses(control, cthread->s_buf, cthread->s_len))
return 0;
/* only 7 levels with lzma, scale them */
lzma_level = control->compression_level * 7 / 9;
if (!lzma_level)
lzma_level = 1;
print_maxverbose("Starting lzma back end compression thread...\n");
retry:
dlen = round_up_page(control, cthread->s_len);
dlen = round_up_page(control, cthread->s_len * 1.02); // add 2% for lzma overhead to prevent memory overrun
c_buf = malloc(dlen);
if (!c_buf) {
print_err("Unable to allocate c_buf in lzma_compress_buf\n");
return -1;
}
/* with LZMA SDK 4.63, we pass compression level and threads only
* and receive properties in lzma_properties */
/* pass absolute dictionary size and compression level */
lzma_ret = LzmaCompress(c_buf, &dlen, cthread->s_buf,
(size_t)cthread->s_len, lzma_properties, &prop_size,
lzma_level,
0, /* dict size. set default, choose by level */
control->compression_level,
control->dictSize, // absolute dictionary size
-1, -1, -1, -1, /* lc, lp, pb, fb */
control->threads > 1 ? 2: 1);
/* LZMA spec has threads = 1 or 2 only. */
-1); // threads. let lzma encoder decide if multi threading is used, not lrzip
if (lzma_ret != SZ_OK) {
switch (lzma_ret) {
case SZ_ERROR_MEM:
@ -966,20 +958,42 @@ void *open_stream_out(rzip_control *control, int f, unsigned int n, i64 chunk_li
else
testbufs = 2;
testsize = (limit * testbufs) + (control->overhead * control->threads);
if (testsize > control->usable_ram)
limit = (control->usable_ram - (control->overhead * control->threads)) / testbufs;
/* If we don't have enough ram for the number of threads, decrease the
* number of threads till we do, or only have one thread. */
while (limit < STREAM_BUFSIZE && limit < chunk_limit) {
if (control->threads > 1)
--control->threads;
else
break;
limit = (control->usable_ram - (control->overhead * control->threads)) / testbufs;
limit = MIN(limit, chunk_limit);
/* Reduce threads one by one and then reduce dictionary size
* by 10% until testsize is within range of estimated allocated
* memory for backend compression. Leave `limit` alone for now.
* First reduce threads, then dictionary size. */
int dict_or_threads = 0;
unsigned DICTSIZEDEFAULT = (1<<24); // a reasonable minimum dictionary size
unsigned save_dictSize = control->dictSize; // save dictionary
while(1) {
if ((testsize = (limit * testbufs) + (control->overhead * control->threads)) > control->usable_ram) {
if (dict_or_threads < 2) { // reduce thread count
control->threads--;
if (control->threads == 0) {
control->threads = 1; // threads are 1, break
break;
}
dict_or_threads++;
}
else { // reduce computed dictionary size
control->dictSize *= 0.90; // by 10% each iteration
round_to_page(&control->dictSize); // round to a page size
if (control->dictSize % 2) // if dictionary size is an odd number
control->dictSize -= 1; // round down to even number
setup_overhead(control); // recompute overhead
dict_or_threads = 0;
}
print_maxverbose("Reducing Dictionary Size to %u and/or Threads to %d to maximize threads and compression block size.\n",
control->dictSize, control->threads);
if (control->dictSize <= DICTSIZEDEFAULT) // break on minimum
break;
continue; // test again
}
break;
}
if (control->dictSize != save_dictSize)
print_verbose("Dictionary Size reduced to %u\n", control->dictSize);
if (BITS32) {
limit = MIN(limit, one_g);
if (limit + (control->overhead * control->threads) > one_g)
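
The memory-fitting loop added to open_stream_out() can be modelled outside lrzip. The sketch below is a simplified variant, not the committed code: `struct mem_budget`, `overhead_for`, `mem_needed` and `fit_to_ram` are hypothetical names, a 4KB page size is assumed, and the loop keeps shrinking until the request fits or both threads and dictionary reach their minimum.

```c
#include <stdint.h>

/* Hypothetical stand-in for the rzip_control fields used by the loop. */
struct mem_budget {
	int64_t usable_ram;	/* bytes available to this activity */
	int64_t limit;		/* stream buffer size in bytes */
	int testbufs;		/* buffers needed per stream */
	int threads;
	unsigned dict_size;	/* LZMA dictionary size in bytes */
};

/* Per-thread overhead, same formula as setup_overhead(), in 64 bits. */
int64_t overhead_for(unsigned dict_size)
{
	return ((int64_t)dict_size * 23 / 2) + (6 * 1024 * 1024) + 16384;
}

int64_t mem_needed(const struct mem_budget *b)
{
	return b->limit * b->testbufs + overhead_for(b->dict_size) * b->threads;
}

/* Drop up to two threads, then shrink the dictionary by 10%, and repeat
 * until the estimate fits in usable_ram or nothing more can be reduced. */
void fit_to_ram(struct mem_budget *b, unsigned min_dict)
{
	int thread_steps = 0;

	while (mem_needed(b) > b->usable_ram) {
		if (b->threads > 1 && thread_steps < 2) {
			b->threads--;
			thread_steps++;
		} else if (b->dict_size > min_dict) {
			b->dict_size = (unsigned)(b->dict_size * 0.90);
			b->dict_size &= ~4095U;	/* round down to an assumed 4KB page */
			if (b->dict_size < min_dict)
				b->dict_size = min_dict;
			thread_steps = 0;
		} else if (b->threads > 1) {
			b->threads--;
		} else {
			break;			/* one thread, minimum dictionary */
		}
	}
}
```

Every pass through the loop either lowers the thread count, lowers the dictionary size, or exits, so the loop always terminates.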

util.c

@ -108,18 +108,33 @@ void fatal_exit(rzip_control *control)
void setup_overhead(rzip_control *control)
{
/* Work out the compression overhead per compression thread for the
* compression back-ends that need a lot of ram */
* compression back-ends that need a lot of ram
* and set Dictionary size */
if (LZMA_COMPRESS) {
int level = control->compression_level * 7 / 9;
if (!level)
level = 1;
i64 dictsize = (level <= 5 ? (1 << (level * 2 + 14)) :
(level == 6 ? (1 << 25) : (1 << 26)));
control->overhead = (dictsize * 23 / 2) + (6 * 1024 * 1024) + 16384;
if (control->dictSize == 0)
switch (control->compression_level) {
case 1:
case 2:
case 3:
case 4:
case 5: control->dictSize = (1 << (control->compression_level * 2 + 14));
break; // 64KB to 16MB
case 6:
case 7: control->dictSize = (1 << 25);
break; // 32MB
case 8: control->dictSize = (1 << 26);
break; // 64MB
case 9: control->dictSize = (1 << 27);
break; // 128MB -- this is maximum for 32 bits
default: control->dictSize = (1 << 24);
break; // 16MB -- should never reach here
}
/* LZMA spec shows memory requirements as 6MB, not 4MB and state size
* where default is 16KB */
// FIXME, need to check filesize and make sure dictionary is not too large
// or larger than maxram also. May also need test for 32 bit or 64 bit
control->overhead = ((i64)control->dictSize * 23 / 2) + (6 * 1024 * 1024) + 16384; /* 64-bit math: dictSize * 23 overflows 32 bits above 2^27 */
} else if (ZPAQ_COMPRESS)
control->overhead = 112 * 1024 * 1024;
}
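
The level-to-dictionary mapping and the overhead formula in this hunk can be checked with a standalone sketch. `default_dict_size` and `lzma_overhead` are hypothetical names; the values mirror the switch above, and the overhead is computed in 64 bits so that large dictionaries do not wrap.

```c
#include <stdint.h>

/* Hypothetical helper: default LZMA dictionary size in bytes for a level. */
unsigned default_dict_size(int level)
{
	if (level >= 1 && level <= 5)
		return 1U << (level * 2 + 14);	/* 64KB .. 16MB */
	if (level == 6 || level == 7)
		return 1U << 25;		/* 32MB */
	if (level == 8)
		return 1U << 26;		/* 64MB */
	if (level == 9)
		return 1U << 27;		/* 128MB */
	return 1U << 24;			/* 16MB fallback for invalid levels */
}

/* Hypothetical helper: per-thread overhead estimate in bytes,
 * dictionary * 11.5 plus 6MB plus 16KB of state. */
int64_t lzma_overhead(unsigned dict_size)
{
	return ((int64_t)dict_size * 23 / 2) + (6 * 1024 * 1024) + 16384;
}
```

At the default level 5 the dictionary is 16MB and the estimated overhead per thread is 199245824 bytes, roughly 190MB.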
@ -231,14 +246,15 @@ bool read_config(rzip_control *control)
if (isparameter(parameter, "window"))
control->window = atoi(parametervalue);
else if (isparameter(parameter, "unlimited")) {
else if (isparameter(parameter, "unlimited"))
if (isparameter(parametervalue, "yes"))
control->flags |= FLAG_UNLIMITED;
} else if (isparameter(parameter, "compressionlevel")) {
else if (isparameter(parameter, "compressionlevel")) {
control->compression_level = atoi(parametervalue);
if ( control->compression_level < 1 || control->compression_level > 9 )
failure_return(("CONF.FILE error. Compression Level must be between 1 and 9"), false);
} else if (isparameter(parameter, "compressionmethod")) {
}
else if (isparameter(parameter, "compressionmethod")) {
/* valid are rzip, gzip, bzip2, lzo, lzma (default), and zpaq */
if (control->flags & FLAG_NOT_LZMA)
failure_return(("CONF.FILE error. Can only specify one compression method"), false);
@ -254,26 +270,28 @@ bool read_config(rzip_control *control)
control->flags |= FLAG_ZPAQ_COMPRESS;
else if (!isparameter(parametervalue, "lzma")) /* oops, not lzma! */
failure_return(("CONF.FILE error. Invalid compression method %s specified\n",parametervalue), false);
} else if (isparameter(parameter, "lzotest")) {
}
else if (isparameter(parameter, "lzotest"))
/* default is yes */
if (isparameter(parametervalue, "no"))
control->flags &= ~FLAG_THRESHOLD;
} else if (isparameter(parameter, "hashcheck")) {
else if (isparameter(parameter, "hashcheck"))
if (isparameter(parametervalue, "yes")) {
control->flags |= FLAG_CHECK;
control->flags |= FLAG_HASH;
}
} else if (isparameter(parameter, "showhash")) {
else if (isparameter(parameter, "showhash"))
if (isparameter(parametervalue, "yes"))
control->flags |= FLAG_HASH;
} else if (isparameter(parameter, "outputdirectory")) {
else if (isparameter(parameter, "outputdirectory")) {
control->outdir = malloc(strlen(parametervalue) + 2);
if (!control->outdir)
fatal_return(("Fatal Memory Error in read_config"), false);
strcpy(control->outdir, parametervalue);
if (strcmp(parametervalue + strlen(parametervalue) - 1, "/"))
strcat(control->outdir, "/");
} else if (isparameter(parameter,"verbosity")) {
}
else if (isparameter(parameter,"verbosity")) {
if (control->flags & FLAG_VERBOSE)
failure_return(("CONF.FILE error. Verbosity already defined."), false);
if (isparameter(parametervalue, "yes"))
@ -282,39 +300,47 @@ bool read_config(rzip_control *control)
control->flags |= FLAG_VERBOSITY_MAX;
else /* oops, unrecognized value */
print_err("lrzip.conf: Unrecognized verbosity value %s. Ignored.\n", parametervalue);
} else if (isparameter(parameter, "showprogress")) {
}
else if (isparameter(parameter, "showprogress"))
/* Yes by default */
if (isparameter(parametervalue, "NO"))
control->flags &= ~FLAG_SHOW_PROGRESS;
} else if (isparameter(parameter,"nice")) {
else if (isparameter(parameter,"nice")) {
control->nice_val = atoi(parametervalue);
if (control->nice_val < -20 || control->nice_val > 19)
failure_return(("CONF.FILE error. Nice must be between -20 and 19"), false);
} else if (isparameter(parameter, "keepbroken")) {
}
else if (isparameter(parameter, "keepbroken"))
if (isparameter(parametervalue, "yes" ))
control->flags |= FLAG_KEEP_BROKEN;
} else if (iscaseparameter(parameter, "DELETEFILES")) {
else if (iscaseparameter(parameter, "DELETEFILES"))
/* delete files must be case sensitive */
if (iscaseparameter(parametervalue, "YES"))
control->flags &= ~FLAG_KEEP_FILES;
} else if (iscaseparameter(parameter, "REPLACEFILE")) {
else if (iscaseparameter(parameter, "REPLACEFILE"))
/* replace lrzip file must be case sensitive */
if (iscaseparameter(parametervalue, "YES"))
control->flags |= FLAG_FORCE_REPLACE;
} else if (isparameter(parameter, "tmpdir")) {
else if (isparameter(parameter, "tmpdir")) {
control->tmpdir = realloc(NULL, strlen(parametervalue) + 2);
if (!control->tmpdir)
fatal_return(("Fatal Memory Error in read_config"), false);
strcpy(control->tmpdir, parametervalue);
if (strcmp(parametervalue + strlen(parametervalue) - 1, "/"))
strcat(control->tmpdir, "/");
} else if (isparameter(parameter, "encrypt")) {
}
else if (isparameter(parameter, "encrypt"))
if (isparameter(parametervalue, "YES"))
control->flags |= FLAG_ENCRYPT;
} else
else if (isparameter(parameter, "dictionarysize")) {
int ds = atoi(parametervalue);
if (ds != 0 && (ds < 12 || ds > 30))
failure_return(("CONF.FILE error. Dictionary Size must be between 12 and 30."), false);
control->dictSize = ds ? (1U << ds) : 0; /* store bytes as --dictsize does; 0 lets lrzip choose */
}
else
/* oops, we have an invalid parameter, display */
print_err("lrzip.conf: Unrecognized parameter value, %s = %s. Continuing.\n",\
parameter, parametervalue);
parameter, parametervalue);
}
if (unlikely(fclose(fp)))