In some cases the return values did not match the documentation,
or the documentation did not document all of the return values.
gzprintf() now consistently returns negative values on error,
which matches the behavior of the stdio fprintf() function.
The previous code slid the window and the hash table and copied
every input byte three times in order to just write the data as
stored blocks with no compression. This commit minimizes sliding
and copying, especially for large input and output buffers.
Level 0 compression is now more than 20 times faster than before
the commit.
Most of the speedup is due to deferring hash table slides until
deflateParams() is called to change the compression level away
from 0. More speedup is due to copying directly from next_in to
next_out when the amounts of available input data and output space
permit it, avoiding the intermediate pending buffer. Additionally,
only the last 32K of the used input data is copied back to the
sliding window when large input buffers are provided.
This alters the specification in zlib.h, so that deflateParams()
will not change any parameters if there is not enough output space
in the event that a block is emitted in order to allow switching
the compression function.
When debugging the Huffman coding would warn about resulting codes
greater than 15 bits in length. This is handled properly, and is
not uncommon. This increases the verbosity of the warning by one,
so that it is not displayed by default.
This speeds up level 0 by about a factor of three, as compared to
the previous byte-at-a-time loop. We can do much better though. A
later commit avoids this copy for level 0 with large buffers,
instead copying directly from the input to the output. This commit
still speeds up storing incompressible data found when compressing
normally.
Compression level 0 requests no compression, using only stored
blocks. When Z_HUFFMAN or Z_RLE was used with level 0 (granted,
an odd choice, but permitted), the resulting blocks were mostly
fixed or dynamic. The reason is that deflate_stored() was not
being called in that case. The compressed data was valid, but it
was not what the application requested. This commit assures that
only stored blocks are emitted for compression level 0, regardless
of the strategy selected.
This updates the OS_CODE determination at compile time to match as
closely as possible the operating system mappings documented in
the PKWare APPNOTE.TXT version 6.3.4, section 4.4.2.2. That byte
in the gzip header is used by nobody for anything, as far as I can
tell. However we might as well try to set it appropriately.
This verifies that the state has been initialized, that it is the
expected type of state, deflate or inflate, and that at least the
first several bytes of the internal state have not been clobbered.
There is a bug in deflate for windowBits == 8 (256-byte window).
As a result, zlib silently changes a request for 8 to a request
for 9 (512-byte window), and sets the zlib header accordingly so
that the decompressor knows to use a 512-byte window. However if
deflateInit2() is used for raw deflate or gzip streams, then there
is no indication that the request was not honored, and the
application might assume that it can use a 256-byte window when
decompressing. This commit returns an error if the user requests
a 256-byte window when using raw deflate or gzip encoding.
See the comment for more details. This is in response to an issue
raised as a result of a security audit of the zlib code by Trail
of Bits and TrustInSoft, in support of the Mozilla Foundation.
There was a small optimization for PowerPCs to pre-increment a
pointer when accessing a word, instead of post-incrementing. This
required prefacing the loop with a decrement of the pointer,
possibly pointing before the object passed. This is not compliant
with the C standard, for which decrementing a pointer before its
allocated memory is undefined. When tested on a modern PowerPC
with a modern compiler, the optimization no longer has any effect.
Due to all that, and per the recommendation of a security audit of
the zlib code by Trail of Bits and TrustInSoft, in support of the
Mozilla Foundation, this "optimization" was removed, in order to
avoid the possibility of undefined behavior.
inftrees.c was subtracting an offset from a pointer to an array,
in order to provide a pointer that allowed indexing starting at
the offset. This is not compliant with the C standard, for which
the behavior of a pointer decremented before its allocated memory
is undefined. Per the recommendation of a security audit of the
zlib code by Trail of Bits and TrustInSoft, in support of the
Mozilla Foundation, this tiny optimization was removed, in order
to avoid the possibility of undefined behavior.
An old inffast.c optimization turns out to not be optimal anymore
with modern compilers, and furthermore was not compliant with the
C standard, for which decrementing a pointer before its allocated
memory is undefined. Per the recommendation of a security audit of
the zlib code by Trail of Bits and TrustInSoft, in support of the
Mozilla Foundation, this "optimization" was removed, in order to
avoid the possibility of undefined behavior.
While woolly mammoths still roamed the Earth and before Atlantis
sunk into the ocean, there were C compilers that could not handle
forward structure references, e.g. "struct name;". zlib dutifully
provided a work-around for such compilers. That work-around is no
longer needed, and, per the recommendation of a security audit of
the zlib code by Trail of Bits and TrustInSoft, in support of the
Mozilla Foundation, should be removed since what a compiler will
do with this is technically undefined. From the report: "there is
no telling what interactions the bug could have in the future with
link-time optimizations and type-based alias analyses, both
features that are present (but not default) in clang."
The undocumented (except in these commit comments) function
inflateValidate(strm, check) can be called after an inflateInit(),
inflateInit2(), or inflateReset2() with check equal to zero to
turn off the check value (CRC-32 or Adler-32) computation and
comparison. Calling with check not equal to zero turns checking
back on. This should only be called immediately after the init or
reset function. inflateReset() does not change the state, so a
previous inflateValidate() setting will remain in effect.
This also turns off validation of the gzip header CRC when
present.
This should only be used when a zlib or gzip stream has already
been checked, and repeated decompressions of the same stream no
longer need to be validated.
When windowBits is zero, the size of the sliding window comes from
the zlib header. The allowed values of the four-bit field are
0..7, but when windowBits is zero, values greater than 7 are
permitted and acted upon, resulting in large, mostly unused memory
allocations. This fix rejects such invalid zlib headers.
A remarkably creative and diverse set of approaches to letting the
compiler know that opaque was being used when it wasn't is changed
by this commit to the more standard (void)opaque.
To build, simply run configure from the source directory by
specifying its path. That path will be used to find the source
files. The source directory will not be touched. All new and
modified files will be made in the current directory. Discovered
in the process that not all makes understand % or $<, and not all
compilers understand -include or -I-. This required a larger
Makefile.in with explicit dependencies.
This updates the documentation to reflect the behavior of
deflateParams() when it is not able to compress all of the input
data provided so far due to insufficient output space. It also
assures that data provided is compressed before the parameter
changes, even if at the beginning of the stream.