It only sped up compression a small amount, yet it adversely affected the compression ratio and would segfault because the size was not consistent on successive passes.
As we can work out what that compression overhead is, we can factor it into testing how much ram we can allocate.
There is no advantage to running multiple threads when there is no compression back end, so drop to a single thread.
Limit ram for the compression back end to one third of ram regardless, for when OSs lie about available memory due to heavy overcommit.
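A minimal sketch of the memory cap described above; the names are illustrative, not lrzip's actual functions:

    #include <unistd.h>

    /* Never hand the compression back end more than a third of physical
     * ram, since an overcommitting OS will happily "grant" far more than
     * it can actually back. */
    static long usable_backend_ram(long window_size, long backend_overhead)
    {
            long ramsize = sysconf(_SC_PHYS_PAGES) * sysconf(_SC_PAGE_SIZE);
            long cap = ramsize / 3;
            long wanted = window_size + backend_overhead;

            return wanted < cap ? wanted : cap;
    }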
Add one more thread on compression and decompression to account for the staggered nature of thread recruitment.
Make the initial buffer slightly smaller and each successive buffer progressively larger, thus recruiting threads sooner and more evenly.
This also speeds up decompression for the same reason.
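How the buffers might grow is sketched below; the linear ramp is an illustrative assumption, not lrzip's exact formula:

    #include <stddef.h>

    /* Buffer i (0-based) gets roughly two thirds of an even share at
     * first, growing toward a full share for the last thread, so early
     * threads are recruited sooner. */
    static size_t thread_buf_size(size_t total, int nthreads, int i)
    {
            size_t even = total / nthreads;

            return even * (size_t)(2 * nthreads + i) / (size_t)(3 * nthreads);
    }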
Check the amount of memory being used by each thread on decompression to ensure we don't try to allocate too much ram.
Update main help screen to include environment settings.
Update to respect the $TMP environment variable for temporary files.
Update the control structure to include a tmpdir pointer.
Update the lrzip.conf parser to respect the -U and -M options.
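A hedged sketch of the lookup, assuming only the tmpdir pointer mentioned above; the struct and function names are illustrative:

    #include <stdlib.h>
    #include <string.h>

    struct rzip_control {
            char *tmpdir;   /* the new pointer in the control structure */
    };

    static void set_tmpdir(struct rzip_control *control)
    {
            char *tmp = getenv("TMP");

            /* fall back to /tmp when $TMP is unset */
            control->tmpdir = strdup(tmp ? tmp : "/tmp");
    }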
Update lrzip.conf example to include new parameters.
Reorder the main switch loop in main.c for readability.
Make MAXRAM and control.window mutually exclusive; MAXRAM wins.
Make UNLIMITED and control.window mutually exclusive; UNLIMITED wins.
Make UNLIMITED and MAXRAM mutually exclusive; UNLIMITED wins.
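The precedence in the three entries above amounts to the following, sketched with illustrative flag names:

    struct opts {
            int unlimited;  /* -U */
            int maxram;     /* -M */
            long window;    /* control.window */
    };

    static void resolve_window_opts(struct opts *o)
    {
            if (o->unlimited)
                    o->maxram = 0;  /* UNLIMITED beats MAXRAM */
            if (o->unlimited || o->maxram)
                    o->window = 0;  /* either beats an explicit window */
    }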
Correct the heuristic computation in rzip.c, which would override MAXRAM or UNLIMITED when control.window was set.
Show the heuristically computed control.window when it is computed.
Remove the compression level display from control.window verbose output.
Update the print_verbose format for "Testing for incompressible data" in stream.c to omit an extra \n.
Changes by Peter Hyman <pete@peterhyman.com>
Add the md5 value to the end of each archive.
This can then be used for integrity testing instead of crc32.
Keep crc in new archives to maintain compatibility with version 0.5 files.
On decompression, prefer md5 integrity testing when it is available, and disable calculation of crc32.
Display the choice of integrity testing in verbose output and when -i is used.
Display the md5 and crc values when max verbosity, file info, or display hash is enabled.
Store a new flag in the magic header to show that the md5 value is stored at the end of the file.
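A sketch of the layout change, assuming a hypothetical flag value; this is not lrzip's actual magic format:

    #include <stdio.h>
    #include <stdint.h>

    #define MAGIC_MD5 (1 << 0)      /* hypothetical flag bit */

    static int write_md5_trailer(FILE *f, const uint8_t digest[16],
                                 uint8_t *magic_flags)
    {
            *magic_flags |= MAGIC_MD5;      /* header says a trailer follows */
            return fwrite(digest, 1, 16, f) == 16 ? 0 : -1;
    }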
Update the magic header information document.
Implement md5 hash checking on compression by updating the hash as each sb low buffer is allocated, avoiding another pass over the file where possible.
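The single-pass hashing reduces to updating a running digest as each buffer is filled; the sketch below uses OpenSSL's MD5 API purely as a stand-in for lrzip's own md5 code:

    #include <stddef.h>
    #include <openssl/md5.h>

    static MD5_CTX md5_ctx;         /* initialised once with MD5_Init() */

    /* called as each sb low buffer is filled, instead of re-reading the
     * whole file at the end */
    static void hash_low_buffer(const unsigned char *buf, size_t len)
    {
            MD5_Update(&md5_ctx, buf, len);
    }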
Make the threaded code work on osx, which lacks unnamed semaphores, by turning the semaphore wrappers into null functions on osx and closing the thread in the creation wrapper.
Move the wrappers to rzip.h to make this change clean.
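A sketch of what those wrappers might look like; the names are illustrative, not the actual rzip.h contents:

    #include <pthread.h>
    #include <semaphore.h>

    #ifdef __APPLE__
    /* osx lacks unnamed semaphores, so the wrappers become no-ops and the
     * "created" thread is run to completion immediately, serialising the
     * work instead of failing. */
    # define init_sem(s)    do {} while (0)
    # define post_sem(s)    do {} while (0)
    # define wait_sem(s)    do {} while (0)
    static inline void create_pthread(pthread_t *t, void *(*fn)(void *), void *arg)
    {
            pthread_create(t, NULL, fn, arg);
            pthread_join(*t, NULL);         /* close the thread straight away */
    }
    #else
    # define init_sem(s)    sem_init((s), 0, 0)
    # define post_sem(s)    sem_post(s)
    # define wait_sem(s)    sem_wait(s)
    static inline void create_pthread(pthread_t *t, void *(*fn)(void *), void *arg)
    {
            pthread_create(t, NULL, fn, arg);
    }
    #endif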
Overlap the compression of streams: when files are large enough to be split into multiple blocks, all CPUs are used more effectively throughout compression, affording a nice speedup.
Move the writing of the chunk byte size and initial headers into the compthread to prevent any races occurring.
Fix a few dodgy callocs that may have been overflowing!
The preceding reverts were made because the changes designed to speed compression up actually slowed it down.
Change suggested maximum compression in README to disable threading with -p 1.
Use bzip2 as fallback compression when lzma fails with internal memory errors, as may happen on 32-bit systems.
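The fallback amounts to retrying the chunk with the other back end; lzma_compress_buf() and bzip2_compress_buf() below are stand-ins for the real backend calls:

    #include <errno.h>

    int lzma_compress_buf(void *buf, long len);     /* hypothetical */
    int bzip2_compress_buf(void *buf, long len);    /* hypothetical */

    static int compress_chunk(void *buf, long len)
    {
            int ret = lzma_compress_buf(buf, len);

            /* lzma can fail internal allocations in a 32-bit address
             * space; fall back to bzip2 rather than aborting */
            if (ret == -ENOMEM)
                    ret = bzip2_compress_buf(buf, len);
            return ret;
    }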
Limit lzma windows to 300MB in the right place, on 32-bit systems only.
Make the main process less nice than the backend threads, since it tends to be the rate-limiting step.
Choose sane defaults for memory usage, since linux ludicrously overcommits.
Use sliding mmap for any compression windows greater than 2/3 ram.
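A minimal sketch of the sliding mmap idea: keep only a small buffer mapped and remap it at new offsets on demand, rather than mapping the whole window. The structure and names are illustrative, not lrzip's implementation:

    #include <sys/mman.h>
    #include <sys/types.h>
    #include <unistd.h>

    struct sliding_map {
            int fd;
            void *buf;      /* the small mapped high buffer */
            size_t len;     /* its length, a multiple of the page size */
            off_t offset;   /* current file offset of the mapping */
    };

    static int slide_to(struct sliding_map *sm, off_t offset)
    {
            off_t aligned = offset & ~((off_t)sysconf(_SC_PAGE_SIZE) - 1);

            if (sm->buf)
                    munmap(sm->buf, sm->len);
            sm->buf = mmap(NULL, sm->len, PROT_READ, MAP_SHARED, sm->fd, aligned);
            if (sm->buf == MAP_FAILED) {
                    sm->buf = NULL;
                    return -1;
            }
            sm->offset = aligned;
            return 0;
    }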
Consolidate and simplify testing of allocatable ram.
Minor tweaks to output.
Round up the size of the high buffer in sliding mmap to one page.
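The round-up is the usual power-of-two page arithmetic:

    #include <stddef.h>
    #include <unistd.h>

    static size_t round_up_to_page(size_t sz)
    {
            size_t page = (size_t)sysconf(_SC_PAGE_SIZE);

            return (sz + page - 1) & ~(page - 1);
    }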
Squeeze a little more out of 32-bit compression windows.
Make decompression multithreaded by reading each stream of data into a separate buffer, with up to as many threads as CPUs.
As each thread's data becomes available, feed it into runzip once it requests more of the stream.
Provided there are enough chunks in the originally compressed data, this provides a massive speedup, potentially proportional to the number of CPUs. The slower the backend compression, the greater the speedup (i.e. zpaq benefits the most).
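A hedged sketch of that scheme: one buffer per thread up to the CPU count, each filled by a worker and consumed in order. decompress_chunk() stands in for the real backend call; everything else is illustrative:

    #include <pthread.h>
    #include <semaphore.h>
    #include <stddef.h>

    #define MAX_THREADS 16

    struct dthread {
            pthread_t pt;
            sem_t ready;            /* posted once the buffer is filled */
            unsigned char *buf;     /* this thread's decompressed data */
            size_t len;
    };

    static struct dthread threads[MAX_THREADS];

    void decompress_chunk(struct dthread *dt);      /* hypothetical backend */

    static void *dworker(void *arg)
    {
            struct dthread *dt = arg;

            decompress_chunk(dt);
            sem_post(&dt->ready);
            return NULL;
    }

    static void spawn_workers(int n)
    {
            for (int i = 0; i < n; i++) {
                    sem_init(&threads[i].ready, 0, 0);
                    pthread_create(&threads[i].pt, NULL, dworker, &threads[i]);
            }
    }

    /* runzip's reader blocks only on the stream it needs next */
    static unsigned char *next_stream(int i)
    {
            sem_wait(&threads[i].ready);
            return threads[i].buf;
    }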
Stop the output of zpaq compression and decompression from trampling on itself, racing, and consuming a lot of CPU time printing to the console.
When limiting cwindow to 6 on 32 bits, ensure that control.window is also set.
When testing for the maximum size of testmalloc, the multiple used was out by one, so increase it.
Minor output tweaks.