Implement md5 hash checking on compression by performing the md5 hash as each sb low buffer is allocated, avoiding a second pass over the file where possible.
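In outline, the incremental hashing looks like the sketch below, assuming the OpenSSL MD5_* interface; the function names are illustrative rather than the actual lrzip code:

    #include <stddef.h>
    #include <openssl/md5.h>

    static MD5_CTX md5_ctx;

    void md5_start(void)
    {
        MD5_Init(&md5_ctx);
    }

    /* Called once per sb low buffer as soon as it has been filled, so
     * the md5 comes for free without a second pass over the file. */
    void md5_add_buffer(const unsigned char *buf, size_t len)
    {
        MD5_Update(&md5_ctx, buf, len);
    }

    void md5_finish(unsigned char digest[MD5_DIGEST_LENGTH])
    {
        MD5_Final(digest, &md5_ctx);
    }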
This overlapping of compression streams means that when files are large enough to be split into multiple blocks, all CPUs will be used more effectively throughout the compression, affording a nice speedup.
Move the writing of the chunk byte size and initial headers into the compthread to prevent any races occurring.
Fix a few dodgy callocs that may have been overflowing!
The previous commits were reverted because the changes designed to speed things up actually slowed them down.
Choose sane defaults for memory usage since Linux ludicrously overcommits.
Use sliding mmap for any compression window greater than 2/3 of ram.
Consolidate and simplify testing of allocatable ram.
Minor tweaks to output.
Round up the size of the high buffer in sliding mmap to one page.
Squeeze a little more out of 32 bit compression windows.
Remove -P option as failing to set permissions only issues a warning now, removing any requirement for -P.
Change default compression level back to 7 as 9 was not giving significantly better compression but was slowing things down.
Place the data from each stream into a buffer which is then handed to a single thread that begins the backend compression while the main rzip stream continues operating.
Fork up to as many threads as there are CPUs and feed data to them in a ring fashion, parallelising the workload as much as possible (see the sketch below).
This gives a big speedup on the compression side on SMP machines.
Limit thread compression to a minimum of 10MB of data per thread to minimise the compression penalty of smaller windows.
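A minimal sketch of the ring dispatch, assuming POSIX threads; all names (backend_compress, dispatch_to_ring and so on) are illustrative, not the actual lrzip code:

    #include <pthread.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Hypothetical entry point into the lzma/bzip2/gzip backend. */
    extern void backend_compress(unsigned char *buf, size_t len);

    struct slot {
        unsigned char *buf;
        size_t len;
        int busy;
        pthread_mutex_t lock;
        pthread_cond_t cond;
    };

    static long nthreads;
    static struct slot *slots;

    void threads_init(void)
    {
        long i;

        nthreads = sysconf(_SC_NPROCESSORS_ONLN);
        slots = calloc(nthreads, sizeof(*slots));
        for (i = 0; i < nthreads; i++) {
            pthread_mutex_init(&slots[i].lock, NULL);
            pthread_cond_init(&slots[i].cond, NULL);
        }
    }

    static void *compress_worker(void *arg)
    {
        struct slot *s = arg;

        pthread_detach(pthread_self());
        backend_compress(s->buf, s->len);
        pthread_mutex_lock(&s->lock);
        s->busy = 0;
        pthread_cond_signal(&s->cond);
        pthread_mutex_unlock(&s->lock);
        return NULL;
    }

    /* The main rzip stream hands each filled buffer to the next slot
     * in the ring, blocking only if that slot is still busy. */
    void dispatch_to_ring(unsigned char *buf, size_t len, int chunk_no)
    {
        struct slot *s = &slots[chunk_no % nthreads];
        pthread_t thr;

        pthread_mutex_lock(&s->lock);
        while (s->busy)
            pthread_cond_wait(&s->cond, &s->lock);
        s->busy = 1;
        s->buf = buf;
        s->len = len;
        pthread_mutex_unlock(&s->lock);
        pthread_create(&thr, NULL, compress_worker, s);
    }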
Alter the progress output to match some of the changes in verbose modes.
Prevent failure when offset is not a multiple of page size.
Add chunk percentage complete to output.
Tweak output at various verbosities.
Update documentation to reflect improved performance of unlimited mode.
Update benchmark results.
More tidying.
Modify the sliding mmap window to use a smaller 64k buffer, matching the hash search size, and make the larger low buffer slide along with the main hash search progress. This makes for a MUCH faster unlimited mode, making it actually usable.
Limit windows to 2GB again on 32 bit, but do it when determining the largest size possible in rzip.c.
Implement a Linux-kernel-like unlikely() wrapper around the gcc builtin expect, mark most fatal warnings as unlikely, and use it in a few other suitable places.
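A minimal sketch; fatal() here stands in for the project's error handler:

    #include <stdlib.h>

    /* Stand-in for the project's error handler. */
    extern void fatal(const char *fmt, ...);

    /* Like the Linux kernel: __builtin_expect tells gcc which way a
     * branch almost always goes, keeping error paths off the hot path. */
    #define likely(x)   __builtin_expect(!!(x), 1)
    #define unlikely(x) __builtin_expect(!!(x), 0)

    static void *checked_malloc(size_t len)
    {
        void *buf = malloc(len);

        if (unlikely(!buf))
            fatal("Failed to malloc %lld bytes\n", (long long)len);
        return buf;
    }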
Minor cleanups.
Abstract out buffer reads in the rzip hash search instead of accessing the
buffer directly, thus allowing us to have window sizes larger than
available ram. This is implemented with a "sliding mmap": two mmapped
buffers, one large one as previously, and one smaller, page sized one.
When an attempt is made to read beyond the end of the large buffer, the
small buffer is remapped to the file area that's being accessed. While
this implementation is 100x slower than direct mmapping, it allows us to
implement unlimited sized compression windows.
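In outline, the read path looks something like the sketch below; the
struct and function names are illustrative and the real code in rzip.c
may differ:

    #include <sys/types.h>
    #include <sys/mman.h>

    struct sliding_buffer {
        unsigned char *buf_low;   /* large map from the start of the file */
        unsigned char *buf_high;  /* small map, slid on demand */
        long long orig_offset;    /* file offset buf_high currently maps */
        size_t size_low;
        size_t size_high;         /* a multiple of the page size */
        int fd;
    };

    /* Fetch one byte of the window; error handling omitted. */
    static unsigned char get_sb(struct sliding_buffer *sb, long long off)
    {
        if (off < (long long)sb->size_low)
            return sb->buf_low[off];

        if (off < sb->orig_offset ||
            off >= sb->orig_offset + (long long)sb->size_high) {
            /* size_high is a multiple of the page size, so this offset
             * is always page aligned as mmap requires. */
            long long aligned = off - (off % (long long)sb->size_high);

            munmap(sb->buf_high, sb->size_high);
            sb->buf_high = mmap(NULL, sb->size_high, PROT_READ,
                                MAP_SHARED, sb->fd, (off_t)aligned);
            sb->orig_offset = aligned;
        }
        return sb->buf_high[off - sb->orig_offset];
    }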
Implement the -U option with unlimited sized windows.
Rework the selection of compression windows. Instead of trying to guess how
much ram the machine might be able to access, we try to safely buffer as much
ram as we can, and then use that to determine the file buffer size. Do not
choose an arbitrary upper window limit unless -w is specified.
Rework the -M option to try to buffer the entire file, reducing the buffer
size until we succeed.
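The back-off loop is conceptually simple; a minimal sketch, with the probe allocation, ratio and name being illustrative (with -M the starting size is the file size itself, otherwise the candidate window size):

    #include <stdlib.h>

    /* Shrink the request until an allocation succeeds; *size returns
     * the size actually obtained. */
    static void *alloc_largest(size_t *size)
    {
        void *buf;

        while (*size) {
            buf = malloc(*size);
            if (buf)
                return buf;
            *size = *size / 10 * 9;  /* back off ~10% and retry */
        }
        return NULL;
    }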
Align buffer sizes to page size.
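For instance, a typical rounding helper:

    #include <unistd.h>

    /* Round a buffer size up to the next multiple of the page size. */
    static size_t page_align(size_t size)
    {
        size_t mask = (size_t)sysconf(_SC_PAGESIZE) - 1;

        return (size + mask) & ~mask;
    }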
Clean up lots of unneeded variables.
Fix lots of minor logic issues to do with window sizes accepted/passed to rzip
and the compression backends.
More error handling.
Change -L to affect rzip compression level directly as well as backend
compression level and use 9 by default now.
More cleanups of information output.
Use 3 point release numbering in case one minor version has many subversions.
Numerous minor cleanups and tidying.
Updated docs and manpages.
fsync after flushing buffer.
Remove unnecessary memset after anonymous mmap.
Do a test malloc before invoking the compression backend to see how big a chunk can be passed to it.
Write directly to stdout on decompression without the need for temporary files, since there is no need to seek backwards.
Make file testing not actually write the file.
More tidying up.
Fix the longstanding 32 bit limit that allowed us to allocate only 2GB of ram by moving the big malloc calls to mmap equivalents, which let us map up to 2^44 bytes of anonymous space.
Use progressively smaller preallocations to try to defragment ram before the real mmap call, increasing the success rate when allocating a significant proportion of total ram.
Don't fail if preallocation is unsuccessful.
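A rough sketch of the combined approach, anonymous mmap in place of the big mallocs plus best-effort preallocation; this is one plausible reading of the entries above, not the actual code:

    #include <stddef.h>
    #include <sys/mman.h>

    static void *big_alloc(size_t size)
    {
        size_t attempt = size;
        void *buf;

        /* Best effort: map and release progressively smaller regions
         * first to coax contiguous address space free. Failures here
         * are ignored rather than treated as fatal. */
        while (attempt >= size / 4) {
            buf = mmap(NULL, attempt, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (buf != MAP_FAILED)
                munmap(buf, attempt);
            attempt -= attempt / 10;
        }

        /* The real allocation, free of malloc's 2GB limit on 32 bit. */
        buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        return buf == MAP_FAILED ? NULL : buf;
    }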
Add more detailed error reporting.
Minor cleanups.
This will increase compression speed and generate a smaller file, but will not be backward compatible.
Tweak the way memory is allocated to optimise chances of success and minimise slowdown for the machine.
fsync to flush dirty data before allocating large amounts of ram, to increase the chance the allocation succeeds and decrease disk thrash between writing and reading.
Add lots more information to verbose mode.
Lots of code tidying and minor tweaks.
Pre-malloc ram to detect early on when that much ram cannot be allocated.
Always make the chunk size a multiple of the page size so that mmap works.
Begin changes to make variable byte width offsets in rzip chunks.
Decrease header entries to only 2 byte wide as per original rzip.
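Illustratively: an offset within a chunk never needs more bytes than the chunk size itself, so the width can be chosen per chunk. A hedged sketch, assuming little-endian storage and hypothetical names:

    /* Bytes needed to represent any offset within a chunk of this size. */
    static int offset_width(long long chunk_size)
    {
        int width = 1;

        while (chunk_size >> (width * 8))
            width++;
        return width;
    }

    /* Store a value in exactly width bytes, least significant first. */
    static void put_offset(unsigned char *out, long long value, int width)
    {
        int i;

        for (i = 0; i < width; i++)
            out[i] = (value >> (i * 8)) & 0xff;
    }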
Random other tidying.