This is done by making the semaphore wrappers null functions on OS X and joining the thread in the creation wrapper.
Move the wrappers to rzip.h to make this change clean.
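A minimal sketch of the idea, assuming hypothetical wrapper names (init_sem, wait_sem, post_sem, create_pthread) rather than the actual identifiers in the source:

    #include <pthread.h>
    #include <semaphore.h>
    #include <stdio.h>
    #include <stdlib.h>

    #ifdef __APPLE__
    /* Unnamed POSIX semaphores do not work on OS X, so the wrappers
     * become no-ops and the work is serialised instead. */
    static inline void init_sem(sem_t *s) { (void)s; }
    static inline void wait_sem(sem_t *s) { (void)s; }
    static inline void post_sem(sem_t *s) { (void)s; }
    #else
    static inline void init_sem(sem_t *s)
    {
        if (sem_init(s, 0, 0)) {
            perror("sem_init");
            exit(1);
        }
    }

    static inline void wait_sem(sem_t *s)
    {
        if (sem_wait(s)) {
            perror("sem_wait");
            exit(1);
        }
    }

    static inline void post_sem(sem_t *s)
    {
        if (sem_post(s)) {
            perror("sem_post");
            exit(1);
        }
    }
    #endif

    static inline void create_pthread(pthread_t *t, void *(*start)(void *), void *arg)
    {
        if (pthread_create(t, NULL, start, arg)) {
            perror("pthread_create");
            exit(1);
        }
    #ifdef __APPLE__
        /* With the semaphores stubbed out, wait for the thread here so
         * each block is still processed correctly, just serially. */
        pthread_join(*t, NULL);
    #endif
    }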
Instead of failing completely, detect when a failure has occurred and serialise work for that thread, decreasing the memory required to complete compression/decompression.
Do that by waiting for the preceding thread to complete before trying to work on that block again.
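A sketch of that retry logic, where compress_block() and struct cthread are hypothetical stand-ins for the real per-thread state:

    #include <pthread.h>
    #include <stdio.h>

    struct cthread {
        pthread_t tid;
        /* per-thread buffers, lengths, etc. */
    };

    /* Hypothetical backend call: returns 0 on success, non-zero when it
     * could not get enough memory for this block. */
    int compress_block(struct cthread *ct);

    /* If compressing block i fails, wait for the previous thread in the
     * ring to finish (releasing its memory), then try the block again. */
    void compress_or_serialise(struct cthread *threads, int i, int nthreads)
    {
        if (compress_block(&threads[i]) == 0)
            return;
        if (nthreads > 1) {
            int prev = (i + nthreads - 1) % nthreads;

            pthread_join(threads[prev].tid, NULL);
        }
        if (compress_block(&threads[i]))
            fprintf(stderr, "compression failed even after serialising\n");
    }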
Detect internally when lzma compression has failed and try a lower compression level before falling back to bzip2 compression.
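A sketch of the fallback order, with try_lzma() and try_bzip2() standing in for the real backend calls (both assumed to return 0 on success and non-zero when they run out of memory):

    /* Hypothetical wrappers around the real backends. */
    int try_lzma(int level);
    int try_bzip2(void);

    /* Retry lzma at progressively lower levels (smaller dictionaries,
     * less memory) and only fall back to bzip2 when even level 1 fails. */
    static int compress_with_fallback(int level)
    {
        for (; level >= 1; level--) {
            if (!try_lzma(level))
                return 0;
        }
        return try_bzip2();
    }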
This overlapping of compression streams means that when files are large enough to be split into multiple blocks, all CPUs are used more effectively throughout the compression, affording a nice speedup.
Move the writing of the chunk byte size and initial headers into the compthread to prevent any races occurring.
Fix a few dodgy callocs that may have been overflowing!
The previous commit reverts were done because the changes designed to speed it up actually slowed it down instead.
Scale the 9 levels down to the 7 that lzma has.
This makes the default lzma compression level 5, which is what lzma normally uses; it needs a lot less RAM and is significantly faster than before, at the cost of slightly less compression.
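One integer mapping consistent with that description, assuming the overall default level is 7 (the real formula in the source may differ):

    /* Scale the user-visible 1..9 range onto lzma's 1..7 levels:
     * 9 -> 7 and the default 7 -> 5. */
    static int lzma_level(int compression_level)
    {
        int level = compression_level * 7 / 9;

        return level ? level : 1;   /* never go below lzma level 1 */
    }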
Change suggested maximum compression in README to disable threading with -p 1.
Use bzip2 as a fallback compression when lzma fails due to internal memory errors as may happen on 32 bits.
Limit lzma windows to 300MB in the right place on 32 bit only.
Make the main process less nice than the backend threads since it tends to be the rate limiting step.
Choose sane defaults for memory usage since linux ludicrously overcommits.
Use sliding mmap for any compression windows greater than 2/3 ram.
Consolidate and simplify testing of allocatable ram.
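A sketch of that style of check, probing malloc() with progressively smaller requests (test_allocatable is an illustrative name, not the real function):

    #include <stdlib.h>

    /* Halve the request until malloc() succeeds.  Because of overcommit
     * the answer is optimistic, which is why the chosen defaults are
     * also capped against detected physical ram. */
    static size_t test_allocatable(size_t wanted)
    {
        while (wanted) {
            void *buf = malloc(wanted);

            if (buf) {
                free(buf);
                return wanted;
            }
            wanted /= 2;
        }
        return 0;
    }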
Minor tweaks to output.
Round up the size of the high buffer in sliding mmap to one page.
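The rounding is the usual page-alignment arithmetic; for illustration:

    #include <unistd.h>

    /* Round len up to a whole number of pages so the high buffer of the
     * sliding mmap always covers full pages. */
    static size_t round_up_page(size_t len)
    {
        size_t page_size = (size_t)sysconf(_SC_PAGESIZE);

        return (len + page_size - 1) & ~(page_size - 1);
    }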
Squeeze a little more out of 32 bit compression windows.
This is done by reading each stream of data into separate buffers, for up to as many threads as there are CPUs.
As each thread's data becomes available, feed it into runzip as it requests more of the stream.
Provided there are enough chunks in the originally compressed data, this provides a massive speedup, potentially proportional to the number of CPUs. The slower the backend compression, the greater the speedup (e.g. zpaq benefits the most).
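A rough sketch of how the two sides interact, with decompress_block() and struct stream_block standing in for the real per-thread state:

    #include <pthread.h>
    #include <stddef.h>

    struct stream_block {
        pthread_t tid;
        unsigned char *in, *out;   /* compressed input, decompressed output */
        size_t in_len, out_len;
    };

    /* Hypothetical backend entry point that fills block->out from block->in. */
    void *decompress_block(void *data);

    /* Reader side: each compressed block read from the file gets its own
     * thread, up to one per cpu, so decompression of later blocks overlaps
     * with runzip consuming earlier ones. */
    void launch_block(struct stream_block *block)
    {
        pthread_create(&block->tid, NULL, decompress_block, block);
    }

    /* runzip side: when more of the stream is requested, wait only for the
     * thread that owns the next block in order. */
    unsigned char *next_block_data(struct stream_block *block)
    {
        pthread_join(block->tid, NULL);
        return block->out;
    }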
Stop the output of zpaq compress and decompress from trampling on itself, racing, and consuming a lot of CPU time printing to the console.
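Serialising the console writes behind a single lock is enough to stop the trampling; a sketch (locked_print is an illustrative name):

    #include <pthread.h>
    #include <stdarg.h>
    #include <stdio.h>

    static pthread_mutex_t output_lock = PTHREAD_MUTEX_INITIALIZER;

    /* All threads print progress through this, so lines from the zpaq
     * worker threads cannot interleave mid-line. */
    static void locked_print(const char *fmt, ...)
    {
        va_list ap;

        va_start(ap, fmt);
        pthread_mutex_lock(&output_lock);
        vfprintf(stderr, fmt, ap);
        pthread_mutex_unlock(&output_lock);
        va_end(ap);
    }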
When limiting cwindow to 6 on 32 bits, ensure that control.window is also set.
When testing for the maximum size of testmalloc, the multiplier used was off by one, so increase it.
Minor output tweaks.
Place the data from each stream into a buffer that is then handed over to one thread, which begins the backend compression while the main rzip stream continues operating.
Fork up to as many threads as CPUs and feed data to them in a ring fashion, parallelising the workload as much as possible.
This causes a big speed up on the compression side on SMP machines.
Thread compression is limited to a minimum of 10MB compressed per thread to minimise the compression loss from smaller windows.
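A sketch of the ring dispatch, with compress_block() and struct cthread as hypothetical stand-ins for the real per-thread state:

    #include <pthread.h>
    #include <stddef.h>

    struct cthread {
        pthread_t tid;
        unsigned char *buf;
        size_t len;
        int busy;
    };

    /* Hypothetical backend entry point that compresses one buffer. */
    void *compress_block(void *data);

    /* Hand each buffered stream block to the next thread slot in the ring.
     * If that slot is still busy with an earlier block, wait for it first,
     * so at most nthreads blocks are in flight while rzip keeps producing
     * the next block. */
    void dispatch_block(struct cthread *ring, int nthreads, int *next,
                        unsigned char *buf, size_t len)
    {
        struct cthread *ct = &ring[*next];

        if (ct->busy)
            pthread_join(ct->tid, NULL);
        ct->buf = buf;
        ct->len = len;
        ct->busy = 1;
        pthread_create(&ct->tid, NULL, compress_block, ct);
        *next = (*next + 1) % nthreads;
    }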
Alter the progress output to match some of the changes in verbose modes.
This facilitates a massive speed up on SMP machines proportional to the number of CPUs during the back end compression phase, but does so at some cost to the final size.
Limit the number of threads to ensure that each thread at least works on a window of STREAM_BUFSIZE.
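A sketch of that clamp, with STREAM_BUFSIZE passed in for illustration:

    #include <stddef.h>

    /* Never spawn more threads than there are STREAM_BUFSIZE-sized pieces
     * of work, so every thread gets a worthwhile amount of data. */
    static int thread_count(int ncpus, size_t chunk_len, size_t stream_bufsize)
    {
        size_t pieces = chunk_len / stream_bufsize;

        if (pieces < 1)
            return 1;
        return pieces < (size_t)ncpus ? (int)pieces : ncpus;
    }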
Disable the lzma threading library as it does not contribute any more to the scalability of this new approach, yet compromises compression.
Increase the size of the windows passed to all the compression back ends, as they now need more data to be split up into multiple threads, and the increased number of blocks slightly increases the compressed size.
Prevent failure when offset is not a multiple of page size.
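mmap() only accepts page-aligned offsets, so the fix is the usual trick of mapping from the preceding page boundary and skipping the difference; a sketch (map_at_offset is an illustrative name):

    #include <stdint.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Map len bytes starting at an arbitrary file offset.  The caller
     * must munmap() the returned pointer minus skew, with length
     * len + skew, where skew is the offset modulo the page size. */
    static void *map_at_offset(int fd, off_t offset, size_t len)
    {
        size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
        off_t aligned = offset & ~((off_t)page_size - 1);
        size_t skew = (size_t)(offset - aligned);
        uint8_t *map;

        map = mmap(NULL, len + skew, PROT_READ, MAP_SHARED, fd, aligned);
        if (map == MAP_FAILED)
            return NULL;
        return map + skew;
    }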
Add chunk percentage complete to output.
Tweak output at various verbosities.
Update documentation to reflect improved performance of unlimited mode.
Update benchmark results.
More tidying.
Modify the sliding mmap window to have a smaller 64k buffer matching the search size, and change the larger lower buffer so it slides with the main hash search progress. This makes for a MUCH faster unlimited mode, making it actually usable.
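The resulting structure amounts to two windows into the file; a sketch of the layout (field names are illustrative, following the description above):

    #include <stdint.h>
    #include <sys/types.h>

    /* Two views of a file too large to mmap whole: a big low buffer that
     * slides along with the main hash search position, and a small 64k
     * (page rounded) high buffer that is remapped to wherever the current
     * match candidate lies.  Byte reads outside the low buffer are served
     * through the high one. */
    struct sliding_buffer {
        uint8_t *buf_low;     /* large mapping following the search */
        uint8_t *buf_high;    /* small mapping at the offset being compared */
        off_t offset_low;     /* file offset of buf_low */
        off_t offset_high;    /* file offset of buf_high */
        size_t size_low;
        size_t size_high;
        off_t orig_size;      /* total length of the underlying file */
        int fd;
    };

    /* Hypothetical accessor: return the byte at offset from whichever
     * buffer currently covers it, remapping buf_high when it does not. */
    uint8_t get_sb_byte(struct sliding_buffer *sb, off_t offset);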
Limit windows to 2GB again on 32 bit, but do it when determining the largest size possible in rzip.c.
Implement a Linux-kernel-style unlikely() wrapper around the builtin expect, mark most fatal warnings as unlikely, and use it in a few other places where it is also suitable.
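The wrapper follows the familiar kernel pattern; for illustration:

    #include <stdio.h>
    #include <stdlib.h>

    #if defined(__GNUC__)
    # define likely(x)   __builtin_expect(!!(x), 1)
    # define unlikely(x) __builtin_expect(!!(x), 0)
    #else
    # define likely(x)   (x)
    # define unlikely(x) (x)
    #endif

    int main(void)
    {
        void *buf = malloc(1024);

        /* Tell the compiler the error branch is the cold path. */
        if (unlikely(!buf)) {
            fprintf(stderr, "Failed to allocate buf\n");
            exit(1);
        }
        free(buf);
        return 0;
    }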
Minor cleanups.