
- Avoid warnings on snprintf() return value
- Fix bug in deflate_stored() for zero-length input
- Fix bug in gzwrite.c that produced corrupt gzip files
- Remove files to be installed before copying them in Makefile.in
- Add warnings when compiling with assembler code

Changes in 1.2.9 (31 Dec 2016)
- Fix contrib/minizip to permit unzipping with desktop API [Zouzou]
- Improve contrib/blast to return unused bytes
- Assure that gzoffset() is correct when appending
- Improve compress() and uncompress() to support large lengths
- Fix bug in test/example.c where error code not saved
- Remedy Coverity warning [Randers-Pehrson]
- Improve speed of gzprintf() in transparent mode
- Fix inflateInit2() bug when windowBits is 16 or 32
- Change DEBUG macro to ZLIB_DEBUG
- Avoid uninitialized access by gzclose_w()
- Allow building zlib outside of the source directory
- Fix bug that accepted invalid zlib header when windowBits is zero
- Fix gzseek() problem on MinGW due to buggy _lseeki64 there
- Loop on write() calls in gzwrite.c in case of non-blocking I/O
- Add --warn (-w) option to ./configure for more compiler warnings
- Reject a window size of 256 bytes if not using the zlib wrapper
- Fix bug when level 0 used with Z_HUFFMAN or Z_RLE
- Add --debug (-d) option to ./configure to define ZLIB_DEBUG
- Fix bugs in creating a very large gzip header
- Add uncompress2() function, which returns the input size used
- Assure that deflateParams() will not switch functions mid-block
- Dramatically speed up deflation for level 0 (storing)
- Add gzfread(), duplicating the interface of fread()
- Add gzfwrite(), duplicating the interface of fwrite()
- Add deflateGetDictionary() function
- Use snprintf() for later versions of Microsoft C
- Fix *Init macros to use z_ prefix when requested
- Replace as400 with os400 for OS/400 support [Monnerat]
- Add crc32_z() and adler32_z() functions with size_t lengths
- Update Visual Studio project files [AraHaan]

Changes in 1.2.8 (28 Apr 2013)
- Update contrib/minizip/iowin32.c for Windows RT [Vollant]
- Do not force Z_CONST for C++
- Clean up contrib/vstudio [Roß]
- Correct spelling error in zlib.h
- Fix mixed line endings in contrib/vstudio

Changes in 1.2.7.3 (13 Apr 2013)
- Fix version numbers and DLL names in contrib/vstudio/*/zlib.rc

Changes in 1.2.7.2 (13 Apr 2013)
- Change check for a four-byte type back to hexadecimal
- Fix typo in win32/Makefile.msc
- Add casts in gzwrite.c for pointer differences

Changes in 1.2.7.1 (24 Mar 2013)
- Replace use of unsafe string functions with snprintf if available
- Avoid including stddef.h on Windows for Z_SOLO compile [Niessink]
- Fix gzgetc undefine when Z_PREFIX set [Turk]
- Eliminate use of mktemp in Makefile (not always available)
- Fix bug in 'F' mode for gzopen()
- Add inflateGetDictionary() function
- Correct comment in deflate.h
- Use _snprintf for snprintf in Microsoft C
- On Darwin, only use /usr/bin/libtool if libtool is not Apple
- Delete "--version" file if created by "ar --version" [Richard G.]
- Fix configure check for veracity of compiler error return codes
- Fix CMake compilation of static lib for MSVC2010 x64
- Remove unused variable in infback9.c
- Fix argument checks in gzlog_compress() and gzlog_write()
- Clean up the usage of z_const and respect const usage within zlib
- Clean up examples/gzlog.[ch] comparisons of different types
- Avoid shift equal to bits in type (caused endless loop)
- Fix uninitialized value bug in gzputc() introduced by const patches
- Fix memory allocation error in examples/zran.c [Nor]
- Fix bug where gzopen(), gzclose() would write an empty file
- Fix bug in gzclose() when gzwrite() runs out of memory
- Check for input buffer malloc failure in examples/gzappend.c
- Add note to contrib/blast to use binary mode in stdio
- Fix comparisons of differently signed integers in contrib/blast
- Check for invalid code length codes in contrib/puff
- Fix serious but very rare decompression bug in inftrees.c
- Update inflateBack() comments, since inflate() can be faster
- Use underscored I/O function names for WINAPI_FAMILY
- Add _tr_flush_bits to the external symbols prefixed by --zprefix
- Add contrib/vstudio/vc10 pre-build step for static only
- Quote --version-script argument in CMakeLists.txt
- Don't specify --version-script on Apple platforms in CMakeLists.txt
- Fix casting error in contrib/testzlib/testzlib.c
- Fix types in contrib/minizip to match result of get_crc_table()
- Simplify contrib/vstudio/vc10 with 'd' suffix
- Add TOP support to win32/Makefile.msc
- Support i686 and amd64 assembler builds in CMakeLists.txt
- Fix typos in the use of _LARGEFILE64_SOURCE in zconf.h
- Add vc11 and vc12 build files to contrib/vstudio
- Add gzvprintf() as an undocumented function in zlib
- Fix configure for Sun shell
- Remove runtime check in configure for four-byte integer type
- Add casts and consts to ease user conversion to C++
- Add man pages for minizip and miniunzip
- In Makefile uninstall, don't rm if preceding cd fails
- Do not return Z_BUF_ERROR if deflateParams() has nothing to write

Changes in 1.2.7 (2 May 2012)
- Replace use of memmove() with a simple copy for portability
- Test for existence of strerror
- Restore gzgetc_ for backward compatibility with 1.2.6
- Fix build with non-GNU make on Solaris
- Require gcc 4.0 or later on Mac OS X to use the hidden attribute
- Include unistd.h for Watcom C
- Use __WATCOMC__ instead of __WATCOM__
- Do not use the visibility attribute if NO_VIZ defined
- Improve the detection of no hidden visibility attribute
- Avoid using __int64 for gcc or solo compilation
- Cast to char * in gzprintf to avoid warnings [Zinser]
- Fix make_vms.com for VAX [Zinser]
- Don't use library or built-in byte swaps
- Simplify test and use of gcc hidden attribute
- Fix bug in gzclose_w() when gzwrite() fails to allocate memory
- Add "x" (O_EXCL) and "e" (O_CLOEXEC) modes support to gzopen()
- Fix bug in test/minigzip.c for configure --solo
- Fix contrib/vstudio project link errors [Mohanathas]
- Add ability to choose the builder in make_vms.com [Schweda]
- Add DESTDIR support to mingw32 win32/Makefile.gcc
- Fix comments in win32/Makefile.gcc for proper usage
- Allow overriding the default install locations for cmake
- Generate and install the pkg-config file with cmake
- Build both a static and a shared version of zlib with cmake
- Include version symbols for cmake builds
- If using cmake with MSVC, add the source directory to the includes
- Remove unneeded EXTRA_CFLAGS from win32/Makefile.gcc [Truta]
- Move obsolete emx makefile to old [Truta]
- Allow the use of -Wundef when compiling or using zlib
- Avoid the use of the -u option with mktemp
- Improve inflate() documentation on the use of Z_FINISH
- Recognize clang as gcc
- Add gzopen_w() in Windows for wide character path names
- Rename zconf.h in CMakeLists.txt to move it out of the way
- Add source directory in CMakeLists.txt for building examples
- Look in build directory for zlib.pc in CMakeLists.txt
- Remove gzflags from zlibvc.def in vc9 and vc10
- Fix contrib/minizip compilation in the MinGW environment
- Update ./configure for Solaris, support --64 [Mooney]
- Remove -R. from Solaris shared build (possible security issue)
- Avoid race condition for parallel make (-j) running example
- Fix type mismatch between get_crc_table() and crc_table
- Fix parsing of version with "-" in CMakeLists.txt [Snider, Ziegler]
- Fix the path to zlib.map in CMakeLists.txt
- Force the native libtool in Mac OS X to avoid GNU libtool [Beebe]
- Add instructions to win32/Makefile.gcc for shared install [Torri]

Changes in 1.2.6.1 (12 Feb 2012)
- Avoid the use of the Objective-C reserved name "id"
- Include io.h in gzguts.h for Microsoft compilers
- Fix problem with ./configure --prefix and gzgetc macro
- Include gz_header definition when compiling zlib solo
- Put gzflags() functionality back in zutil.c
- Avoid library header include in crc32.c for Z_SOLO
- Use name in GCC_CLASSIC as C compiler for coverage testing, if set
- Minor cleanup in contrib/minizip/zip.c [Vollant]
- Update make_vms.com [Zinser]
- Remove unnecessary gzgetc_ function
- Use optimized byte swap operations for Microsoft and GNU [Snyder]
- Fix minor typo in zlib.h comments [Rzesniowiecki]

Changes in 1.2.6 (29 Jan 2012)
- Update the Pascal interface in contrib/pascal
- Fix function numbers for gzgetc_ in zlibvc.def files
- Fix configure.ac for contrib/minizip [Schiffer]
- Fix large-entry detection in minizip on 64-bit systems [Schiffer]
- Have ./configure use the compiler return code for error indication
- Fix CMakeLists.txt for cross compilation [McClure]
- Fix contrib/minizip/zip.c for 64-bit architectures [Dalsnes]
- Fix compilation of contrib/minizip on FreeBSD [Marquez]
- Correct suggested usages in win32/Makefile.msc [Shachar, Horvath]
- Include io.h for Turbo C / Borland C on all platforms [Truta]
- Make version explicit in contrib/minizip/configure.ac [Bosmans]
- Avoid warning for no encryption in contrib/minizip/zip.c [Vollant]
- Minor cleanup in contrib/minizip/unzip.c [Vollant]
- Fix bug when compiling minizip with C++ [Vollant]
- Protect for long name and extra fields in contrib/minizip [Vollant]
- Avoid some warnings in contrib/minizip [Vollant]
- Add -I../.. -L../.. to CFLAGS for minizip and miniunzip
- Add missing libs to minizip linker command
- Add support for VPATH builds in contrib/minizip
- Add an --enable-demos option to contrib/minizip/configure
- Add the generation of configure.log by ./configure
- Exit when required parameters not provided to win32/Makefile.gcc
- Have gzputc return the character written instead of the argument
- Use the -m option on ldconfig for BSD systems [Tobias]
- Correct in zlib.map when deflateResetKeep was added

Changes in 1.2.5.3 (15 Jan 2012)
- Restore gzgetc function for binary compatibility
- Do not use _lseeki64 under Borland C++ [Truta]
- Update win32/Makefile.msc to build test/*.c [Truta]
- Remove old/visualc6 given CMakefile and other alternatives
- Update AS400 build files and documentation [Monnerat]
- Update win32/Makefile.gcc to build test/*.c [Truta]
- Permit stronger flushes after Z_BLOCK flushes
- Avoid extraneous empty blocks when doing empty flushes
- Permit Z_NULL arguments to deflatePending
- Allow deflatePrime() to insert bits in the middle of a stream
- Remove second empty static block for Z_PARTIAL_FLUSH
- Write out all of the available bits when using Z_BLOCK
- Insert the first two strings in the hash table after a flush

Changes in 1.2.5.2 (17 Dec 2011)
- Fix ld error: unable to find version dependency 'ZLIB_1.2.5'
- Use relative symlinks for shared libs
- Avoid searching past window for Z_RLE strategy
- Assure that high-water mark initialization is always applied in deflate
- Add assertions to fill_window() in deflate.c to match comments
- Update python link in README
- Correct spelling error in gzread.c
- Fix bug in gzgets() for a concatenated empty gzip stream
- Correct error in comment for gz_make()
- Change gzread() and related to ignore junk after gzip streams
- Allow gzread() and related to continue after gzclearerr()
- Allow gzrewind() and gzseek() after a premature end-of-file
- Simplify gzseek() now that raw after gzip is ignored
- Change gzgetc() to a macro for speed (~40% speedup in testing)
- Fix gzclose() to return the actual error last encountered
- Always add large file support for windows
- Include zconf.h for windows large file support
- Include zconf.h.cmakein for windows large file support
- Update zconf.h.cmakein on make distclean
- Merge vestigial vsnprintf determination from zutil.h to gzguts.h
- Clarify how gzopen() appends in zlib.h comments
- Correct documentation of gzdirect() since junk at end now ignored
- Add a transparent write mode to gzopen() when 'T' is in the mode
- Update python link in zlib man page
- Get inffixed.h and MAKEFIXED result to match
- Add a ./configure --solo option to make zlib subset with no library use
- Add undocumented inflateResetKeep() function for CAB file decoding
- Add --cover option to ./configure for gcc coverage testing
- Add #define ZLIB_CONST option to use const in the z_stream interface
- Add comment to gzdopen() in zlib.h to use dup() when using fileno()
- Note behavior of uncompress() to provide as much data as it can
- Add files in contrib/minizip to aid in building libminizip
- Split off AR options in Makefile.in and configure
- Change ON macro to Z_ARG to avoid application conflicts
- Facilitate compilation with Borland C++ for pragmas and vsnprintf
- Include io.h for Turbo C / Borland C++
- Move example.c and minigzip.c to test/
- Simplify incomplete code table filling in inflate_table()
- Remove code from inflate.c and infback.c that is impossible to execute
- Test the inflate code with full coverage
- Allow deflateSetDictionary, inflateSetDictionary at any time (in raw)
- Add deflateResetKeep and fix inflateResetKeep to retain dictionary
- Fix gzwrite.c to accommodate reduced memory zlib compilation
- Have inflate() with Z_FINISH avoid the allocation of a window
- Do not set strm->adler when doing raw inflate
- Fix gzeof() to behave just like feof() when read is not past end of file
- Fix bug in gzread.c when end-of-file is reached
- Avoid use of Z_BUF_ERROR in gz* functions except for premature EOF
- Document gzread() capability to read concurrently written files
- Remove hard-coding of resource compiler in CMakeLists.txt [Blammo]

Changes in 1.2.5.1 (10 Sep 2011)
- Update FAQ entry on shared builds (#13)
- Avoid symbolic argument to chmod in Makefile.in
- Fix bug and add consts in contrib/puff [Oberhumer]
- Update contrib/puff/zeros.raw test file to have all block types
- Add full coverage test for puff in contrib/puff/Makefile
- Fix static-only-build install in Makefile.in
- Fix bug in unzGetCurrentFileInfo() in contrib/minizip [Kuno]
- Add libz.a dependency to shared in Makefile.in for parallel builds
- Spell out "number" (instead of "nb") in zlib.h for total_in, total_out
- Replace $(...) with `...` in configure for non-bash sh [Bowler]
- Add darwin* to Darwin* and solaris* to SunOS\ 5* in configure [Groffen]
- Add solaris* to Linux* in configure to allow gcc use [Groffen]
- Add *bsd* to Linux* case in configure [Bar-Lev]
- Add inffast.obj to dependencies in win32/Makefile.msc
- Correct spelling error in deflate.h [Kohler]
- Change libzdll.a again to libz.dll.a (!) in win32/Makefile.gcc
- Add test to configure for GNU C looking for gcc in output of $cc -v
- Add zlib.pc generation to win32/Makefile.gcc [Weigelt]
- Fix bug in zlib.h for _FILE_OFFSET_BITS set and _LARGEFILE64_SOURCE not
- Add comment in zlib.h that adler32_combine with len2 < 0 makes no sense
- Make NO_DIVIDE option in adler32.c much faster (thanks to John Reiser)
- Make stronger test in zconf.h to include unistd.h for LFS
- Apply Darwin patches for 64-bit file offsets to contrib/minizip [Slack]
- Fix zlib.h LFS support when Z_PREFIX used
- Add updated as400 support (removed from old) [Monnerat]
- Avoid deflate sensitivity to volatile input data
- Avoid division in adler32_combine for NO_DIVIDE
- Clarify the use of Z_FINISH with deflateBound() amount of space
- Set binary for output file in puff.c
- Use u4 type for crc_table to avoid conversion warnings
- Apply casts in zlib.h to avoid conversion warnings
- Add OF to prototypes for adler32_combine_ and crc32_combine_ [Miller]
- Improve inflateSync() documentation to note indeterminacy
- Add deflatePending() function to return the amount of pending output
- Correct the spelling of "specification" in FAQ [Randers-Pehrson]
- Add a check in configure for stdarg.h, use for gzprintf()
- Check that pointers fit in ints when gzprintf() compiled old style
- Add dummy name before $(SHAREDLIBV) in Makefile [Bar-Lev, Bowler]
- Delete line in configure that adds -L. libz.a to LDFLAGS [Weigelt]
- Add debug records in assembler code [Londer]
- Update RFC references to use http://tools.ietf.org/html/... [Li]
- Add --archs option, use of libtool to configure for Mac OS X [Borstel]

Changes in 1.2.5 (19 Apr 2010)
- Disable visibility attribute in win32/Makefile.gcc [Bar-Lev]
- Default to libdir as sharedlibdir in configure [Nieder]
- Update copyright dates on modified source files
- Update trees.c to be able to generate modified trees.h
- Exit configure for MinGW, suggesting win32/Makefile.gcc
- Check for NULL path in gz_open [Homurlu]

Changes in 1.2.4.5 (18 Apr 2010)
- Set sharedlibdir in configure [Torok]
- Set LDFLAGS in Makefile.in [Bar-Lev]
- Avoid mkdir objs race condition in Makefile.in [Bowler]
- Add ZLIB_INTERNAL in front of internal inter-module functions and arrays
- Define ZLIB_INTERNAL to hide internal functions and arrays for GNU C
- Don't use hidden attribute when it is a warning generator (e.g. Solaris)

Changes in 1.2.4.4 (18 Apr 2010)
- Fix CROSS_PREFIX executable testing, CHOST extract, mingw* [Torok]
- Undefine _LARGEFILE64_SOURCE in zconf.h if it is zero, but not if empty
- Try to use bash or ksh regardless of functionality of /bin/sh
- Fix configure incompatibility with NetBSD sh
- Remove attempt to run under bash or ksh since have better NetBSD fix
- Fix win32/Makefile.gcc for MinGW [Bar-Lev]
- Add diagnostic messages when using CROSS_PREFIX in configure
- Added --sharedlibdir option to configure [Weigelt]
- Use hidden visibility attribute when available [Frysinger]

Changes in 1.2.4.3 (10 Apr 2010)
- Only use CROSS_PREFIX in configure for ar and ranlib if they exist
- Use CROSS_PREFIX for nm [Bar-Lev]
- Assume _LARGEFILE64_SOURCE defined is equivalent to true
- Avoid use of undefined symbols in #if with && and ||
- Make *64 prototypes in gzguts.h consistent with functions
- Add -shared load option for MinGW in configure [Bowler]
- Move z_off64_t to public interface, use instead of off64_t
- Remove ! from shell test in configure (not portable to Solaris)
- Change +0 macro tests to -0 for possibly increased portability

Changes in 1.2.4.2 (9 Apr 2010)
- Add consistent carriage returns to readme.txt's in masmx86 and masmx64
- Really provide prototypes for *64 functions when building without LFS
- Only define unlink() in minigzip.c if unistd.h not included
- Update README to point to contrib/vstudio project files
- Move projects/vc6 to old/ and remove projects/
- Include stdlib.h in minigzip.c for setmode() definition under WinCE
- Clean up assembler builds in win32/Makefile.msc [Rowe]
- Include sys/types.h for Microsoft for off_t definition
- Fix memory leak on error in gz_open()
- Symbolize nm as $NM in configure [Weigelt]
- Use TEST_LDSHARED instead of LDSHARED to link test programs [Weigelt]
- Add +0 to _FILE_OFFSET_BITS and _LFS64_LARGEFILE in case not defined
- Fix bug in gzeof() to take into account unused input data
- Avoid initialization of structures with variables in puff.c
- Updated win32/README-WIN32.txt [Rowe]

Changes in 1.2.4.1 (28 Mar 2010)
- Remove the use of [a-z] constructs for sed in configure [gentoo 310225]
- Remove $(SHAREDLIB) from LIBS in Makefile.in [Creech]
- Restore "for debugging" comment on sprintf() in gzlib.c
- Remove fdopen for MVS from gzguts.h
- Put new README-WIN32.txt in win32 [Rowe]
- Add check for shell to configure and invoke another shell if needed
- Fix big fat stinking bug in gzseek() on uncompressed files
- Remove vestigial F_OPEN64 define in zutil.h
- Set and check the value of _LARGEFILE_SOURCE and _LARGEFILE64_SOURCE
- Avoid errors on non-LFS systems when applications define LFS macros
- Set EXE to ".exe" in configure for MINGW [Kahle]
- Match crc32() in crc32.c exactly to the prototype in zlib.h [Sherrill]
- Add prefix for cross-compilation in win32/makefile.gcc [Bar-Lev]
- Add DLL install in win32/makefile.gcc [Bar-Lev]
- Allow Linux* or linux* from uname in configure [Bar-Lev]
- Allow ldconfig to be redefined in configure and Makefile.in [Bar-Lev]
- Add cross-compilation prefixes to configure [Bar-Lev]
- Match type exactly in gz_load() invocation in gzread.c
- Match type exactly of zcalloc() in zutil.c to zlib.h alloc_func
- Provide prototypes for *64 functions when building zlib without LFS
- Don't use -lc when linking shared library on MinGW
- Remove errno.h check in configure and vestigial errno code in zutil.h

Changes in 1.2.4 (14 Mar 2010)
- Fix VER3 extraction in configure for no fourth subversion
- Update zlib.3, add docs to Makefile.in to make .pdf out of it
- Add zlib.3.pdf to distribution
- Don't set error code in gzerror() if passed pointer is NULL
- Apply destination directory fixes to CMakeLists.txt [Lowman]
- Move #cmakedefine's to a new zconf.in.cmakein
- Restore zconf.h for builds that don't use configure or cmake
- Add distclean to dummy Makefile for convenience
- Update and improve INDEX, README, and FAQ
- Update CMakeLists.txt for the return of zconf.h [Lowman]
- Update contrib/vstudio/vc9 and vc10 [Vollant]
- Change libz.dll.a back to libzdll.a in win32/Makefile.gcc
- Apply license and readme changes to contrib/asm686 [Raiter]
- Check file name lengths and add -c option in minigzip.c [Li]
- Update contrib/amd64 and contrib/masmx86/ [Vollant]
- Avoid use of "eof" parameter in trees.c to not shadow library variable
- Update make_vms.com for removal of zlibdefs.h [Zinser]
- Update assembler code and vstudio projects in contrib [Vollant]
- Remove outdated assembler code contrib/masm686 and contrib/asm586
- Remove old vc7 and vc8 from contrib/vstudio
- Update win32/Makefile.msc, add ZLIB_VER_SUBREVISION [Rowe]
- Fix memory leaks in gzclose_r() and gzclose_w(), file leak in gz_open()
- Add contrib/gcc_gvmat64 for longest_match and inflate_fast [Vollant]
- Remove *64 functions from win32/zlib.def (they're not 64-bit yet)
- Fix bug in void-returning vsprintf() case in gzwrite.c
- Fix name change from inflate.h in contrib/inflate86/inffas86.c
- Check if temporary file exists before removing in make_vms.com [Zinser]
- Fix make install and uninstall for --static option
- Fix usage of _MSC_VER in gzguts.h and zutil.h [Truta]
- Update readme.txt in contrib/masmx64 and masmx86 to assemble

Changes in 1.2.3.9 (21 Feb 2010)
- Expunge gzio.c
- Move as400 build information to old
- Fix updates in contrib/minizip and contrib/vstudio
- Add const to vsnprintf test in configure to avoid warnings [Weigelt]
- Delete zconf.h (made by configure) [Weigelt]
- Change zconf.in.h to zconf.h.in per convention [Weigelt]
- Check for NULL buf in gzgets()
- Return empty string for gzgets() with len == 1 (like fgets())
- Fix description of gzgets() in zlib.h for end-of-file, NULL return
- Update minizip to 1.1 [Vollant]
- Avoid MSVC loss of data warnings in gzread.c, gzwrite.c
- Note in zlib.h that gzerror() should be used to distinguish from EOF
- Remove use of snprintf() from gzlib.c
- Fix bug in gzseek()
- Update contrib/vstudio, adding vc9 and vc10 [Kuno, Vollant]
- Fix zconf.h generation in CMakeLists.txt [Lowman]
- Improve comments in zconf.h where modified by configure

Changes in 1.2.3.8 (13 Feb 2010)
- Clean up text files (tabs, trailing whitespace, etc.) [Oberhumer]
- Use z_off64_t in gz_zero() and gz_skip() to match state->skip
- Avoid comparison problem when sizeof(int) == sizeof(z_off64_t)
- Revert to Makefile.in from 1.2.3.6 (live with the clutter)
- Fix missing error return in gzflush(), add zlib.h note
- Add *64 functions to zlib.map [Levin]
- Fix signed/unsigned comparison in gz_comp()
- Use SFLAGS when testing shared linking in configure
- Add --64 option to ./configure to use -m64 with gcc
- Fix ./configure --help to correctly name options
- Have make fail if a test fails [Levin]
- Avoid buffer overrun in contrib/masmx64/gvmat64.asm [Simpson]
- Remove assembler object files from contrib

Changes in 1.2.3.7 (24 Jan 2010)
- Always gzopen() with O_LARGEFILE if available
- Fix gzdirect() to work immediately after gzopen() or gzdopen()
- Make gzdirect() more precise when the state changes while reading
- Improve zlib.h documentation in many places
- Catch memory allocation failure in gz_open()
- Complete close operation if seek forward in gzclose_w() fails
- Return Z_ERRNO from gzclose_r() if close() fails
- Return Z_STREAM_ERROR instead of EOF for gzclose() being passed NULL
- Return zero for gzwrite() errors to match zlib.h description
- Return -1 on gzputs() error to match zlib.h description
- Add zconf.in.h to allow recovery from configure modification [Weigelt]
- Fix static library permissions in Makefile.in [Weigelt]
- Avoid warnings in configure tests that hide functionality [Weigelt]
- Add *BSD and DragonFly to Linux case in configure [gentoo 123571]
- Change libzdll.a to libz.dll.a in win32/Makefile.gcc [gentoo 288212]
- Avoid access of uninitialized data for first inflateReset2 call [Gomes]
- Keep object files in subdirectories to reduce the clutter somewhat
- Remove default Makefile and zlibdefs.h, add dummy Makefile
- Add new external functions to Z_PREFIX, remove duplicates, z_z_ -> z_
- Remove zlibdefs.h completely -- modify zconf.h instead

Changes in 1.2.3.6 (17 Jan 2010)
- Avoid void * arithmetic in gzread.c and gzwrite.c
- Make compilers happier with const char * for gz_error message
- Avoid unused parameter warning in inflate.c
- Avoid signed-unsigned comparison warning in inflate.c
- Indent #pragma's for traditional C
- Fix usage of strwinerror() in gzlib.c, change to gz_strwinerror()
- Correct email address in configure for system options
- Update make_vms.com and add make_vms.com to contrib/minizip [Zinser]
- Update zlib.map [Brown]
- Fix Makefile.in for Solaris 10 make of example64 and minizip64 [Torok]
- Apply various fixes to CMakeLists.txt [Lowman]
- Add checks on len in gzread() and gzwrite()
- Add error message for no more room for gzungetc()
- Remove zlib version check in gzwrite()
- Defer compression of gzprintf() result until need to
- Use snprintf() in gzdopen() if available
- Remove USE_MMAP configuration determination (only used by minigzip)
- Remove examples/pigz.c (available separately)
- Update examples/gun.c to 1.6

Changes in 1.2.3.5 (8 Jan 2010)
- Add space after #if in zutil.h for some compilers
- Fix relatively harmless bug in deflate_fast() [Exarevsky]
- Fix same problem in deflate_slow()
- Add $(SHAREDLIBV) to LIBS in Makefile.in [Brown]
- Add deflate_rle() for faster Z_RLE strategy run-length encoding
- Add deflate_huff() for faster Z_HUFFMAN_ONLY encoding
- Change name of "write" variable in inffast.c to avoid library collisions
- Fix premature EOF from gzread() in gzio.c [Brown]
- Use zlib header window size if windowBits is 0 in inflateInit2()
- Remove compressBound() call in deflate.c to avoid linking compress.o
- Replace use of errno in gz* with functions, support WinCE [Alves]
- Provide alternative to perror() in minigzip.c for WinCE [Alves]
- Don't use _vsnprintf on later versions of MSVC [Lowman]
- Add CMake build script and input file [Lowman]
- Update contrib/minizip to 1.1 [Svensson, Vollant]
- Moved nintendods directory from contrib to .
- Replace gzio.c with a new set of routines with the same functionality
- Add gzbuffer(), gzoffset(), gzclose_r(), gzclose_w() as part of above
- Update contrib/minizip to 1.1b
- Change gzeof() to return 0 on error instead of -1 to agree with zlib.h

Changes in 1.2.3.4 (21 Dec 2009)
- Use old school .SUFFIXES in Makefile.in for FreeBSD compatibility
- Update comments in configure and Makefile.in for default --shared
- Fix test -z's in configure [Marquess]
- Build examplesh and minigzipsh when not testing
- Change NULL's to Z_NULL's in deflate.c and in comments in zlib.h
- Import LDFLAGS from the environment in configure
- Fix configure to populate SFLAGS with discovered CFLAGS options
- Adapt make_vms.com to the new Makefile.in [Zinser]
- Add zlib2ansi script for C++ compilation [Marquess]
- Add _FILE_OFFSET_BITS=64 test to make test (when applicable)
- Add AMD64 assembler code for longest match to contrib [Teterin]
- Include options from $SFLAGS when doing $LDSHARED
- Simplify 64-bit file support by introducing z_off64_t type
- Make shared object files in objs directory to work around old Sun cc
- Use only three-part version number for Darwin shared compiles
- Add rc option to ar in Makefile.in for when ./configure not run
- Add -Wl,-rpath,. to LDFLAGS for OSF 1 V4*
- Set LD_LIBRARYN32_PATH for SGI IRIX shared compile
- Protect against _FILE_OFFSET_BITS being defined when compiling zlib
- Rename Makefile.in targets allstatic to static and allshared to shared
- Fix static and shared Makefile.in targets to be independent
- Correct error return bug in gz_open() by setting state [Brown]
- Put spaces before ;;'s in configure for better sh compatibility
- Add pigz.c (parallel implementation of gzip) to examples/
- Correct constant in crc32.c to UL [Leventhal]
- Reject negative lengths in crc32_combine()
- Add inflateReset2() function to work like inflateEnd()/inflateInit2()
- Include sys/types.h for _LARGEFILE64_SOURCE [Brown]
- Correct typo in doc/algorithm.txt [Janik]
- Fix bug in adler32_combine() [Zhu]
- Catch missing-end-of-block-code error in all inflates and in puff
  Assures that random input to inflate eventually results in an error
- Added enough.c (calculation of ENOUGH for inftrees.h) to examples/
- Update ENOUGH and its usage to reflect discovered bounds
- Fix gzerror() error report on empty input file [Brown]
- Add ush casts in trees.c to avoid pedantic runtime errors
- Fix typo in zlib.h uncompress() description [Reiss]
- Correct inflate() comments with regard to automatic header detection
- Remove deprecation comment on Z_PARTIAL_FLUSH (it stays)
- Put new version of gzlog (2.0) in examples with interruption recovery
- Add puff compile option to permit invalid distance-too-far streams
- Add puff TEST command options, ability to read piped input
- Prototype the *64 functions in zlib.h when _FILE_OFFSET_BITS == 64, but
  _LARGEFILE64_SOURCE not defined
- Fix Z_FULL_FLUSH to truly erase the past by resetting s->strstart
- Fix deflateSetDictionary() to use all 32K for output consistency
- Remove extraneous #define MIN_LOOKAHEAD in deflate.c (in deflate.h)
- Clear bytes after deflate lookahead to avoid use of uninitialized data
- Change a limit in inftrees.c to be more transparent to Coverity Prevent
- Update win32/zlib.def with exported symbols from zlib.h
- Correct spelling errors in zlib.h [Willem, Sobrado]
- Allow Z_BLOCK for deflate() to force a new block
- Allow negative bits in inflatePrime() to delete existing bit buffer
- Add Z_TREES flush option to inflate() to return at end of trees
- Add inflateMark() to return current state information for random access
- Add Makefile for NintendoDS to contrib [Costa]
- Add -w in configure compile tests to avoid spurious warnings [Beucler]
- Fix typos in zlib.h comments for deflateSetDictionary()
- Fix EOF detection in transparent gzread() [Maier]

Changes in 1.2.3.3 (2 October 2006)
- Make --shared the default for configure, add a --static option
- Add compile option to permit invalid distance-too-far streams
- Add inflateUndermine() function which is required to enable above
- Remove use of "this" variable name for C++ compatibility [Marquess]
- Add testing of shared library in make test, if shared library built
- Use ftello() and fseeko() if available instead of ftell() and fseek()
- Provide two versions of all functions that use the z_off_t type for
  binary compatibility -- a normal version and a 64-bit offset version,
  per the Large File Support Extension when _LARGEFILE64_SOURCE is
  defined; use the 64-bit versions by default when _FILE_OFFSET_BITS
  is defined to be 64
- Add a --uname= option to configure to perhaps help with cross-compiling

Changes in 1.2.3.2 (3 September 2006)
- Turn off silly Borland warnings [Hay]
- Use off64_t and define _LARGEFILE64_SOURCE when present
- Fix missing dependency on inffixed.h in Makefile.in
- Rig configure --shared to build both shared and static [Teredesai, Truta]
- Remove zconf.in.h and instead create a new zlibdefs.h file
- Fix contrib/minizip/unzip.c non-encrypted after encrypted [Vollant]
- Add treebuild.xml (see http://treebuild.metux.de/) [Weigelt]

Changes in 1.2.3.1 (16 August 2006)
- Add watcom directory with OpenWatcom make files [Daniel]
- Remove #undef of FAR in zconf.in.h for MVS [Fedtke]
- Update make_vms.com [Zinser]
- Use -fPIC for shared build in configure [Teredesai, Nicholson]
- Use only major version number for libz.so on IRIX and OSF1 [Reinholdtsen]
- Use fdopen() (not _fdopen()) for Interix in zutil.h [Bäck]
- Add some FAQ entries about the contrib directory
- Update the MVS question in the FAQ
- Avoid extraneous reads after EOF in gzio.c [Brown]
- Correct spelling of "successfully" in gzio.c [Randers-Pehrson]
- Add comments to zlib.h about gzerror() usage [Brown]
- Set extra flags in gzip header in gzopen() like deflate() does
- Make configure options more compatible with double-dash conventions
  [Weigelt]
- Clean up compilation under Solaris SunStudio cc [Rowe, Reinholdtsen]
- Fix uninstall target in Makefile.in [Truta]
- Add pkgconfig support [Weigelt]
- Use $(DESTDIR) macro in Makefile.in [Reinholdtsen, Weigelt]
- Replace set_data_type() with a more accurate detect_data_type() in
  trees.c, according to the txtvsbin.txt document [Truta]
- Swap the order of #include <stdio.h> and #include "zlib.h" in
  gzio.c, example.c and minigzip.c [Truta]
- Shut up annoying VS2005 warnings about standard C deprecation [Rowe,
  Truta] (where?)
- Fix target "clean" from win32/Makefile.bor [Truta] - Create .pdb and .manifest files in win32/makefile.msc [Ziegler, Rowe] - Update zlib www home address in win32/DLL_FAQ.txt [Truta] - Update contrib/masmx86/inffas32.asm for VS2005 [Vollant, Van Wassenhove] - Enable browse info in the "Debug" and "ASM Debug" configurations in the Visual C++ 6 project, and set (non-ASM) "Debug" as default [Truta] - Add pkgconfig support [Weigelt] - Add ZLIB_VER_MAJOR, ZLIB_VER_MINOR and ZLIB_VER_REVISION in zlib.h, for use in win32/zlib1.rc [Polushin, Rowe, Truta] - Add a document that explains the new text detection scheme to doc/txtvsbin.txt [Truta] - Add rfc1950.txt, rfc1951.txt and rfc1952.txt to doc/ [Truta] - Move algorithm.txt into doc/ [Truta] - Synchronize FAQ with website - Fix compressBound(), was low for some pathological cases [Fearnley] - Take into account wrapper variations in deflateBound() - Set examples/zpipe.c input and output to binary mode for Windows - Update examples/zlib_how.html with new zpipe.c (also web site) - Fix some warnings in examples/gzlog.c and examples/zran.c (it seems that gcc became pickier in 4.0) - Add zlib.map for Linux: "All symbols from zlib-1.1.4 remain un-versioned, the patch adds versioning only for symbols introduced in zlib-1.2.0 or later. It also declares as local those symbols which are not designed to be exported." [Levin] - Update Z_PREFIX list in zconf.in.h, add --zprefix option to configure - Do not initialize global static by default in trees.c, add a response NO_INIT_GLOBAL_POINTERS to initialize them if needed [Marquess] - Don't use strerror() in gzio.c under WinCE [Yakimov] - Don't use errno.h in zutil.h under WinCE [Yakimov] - Move arguments for AR to its usage to allow replacing ar [Marot] - Add HAVE_VISIBILITY_PRAGMA in zconf.in.h for Mozilla [Randers-Pehrson] - Improve inflateInit() and inflateInit2() documentation - Fix structure size comment in inflate.h - Change configure help option from --h* to --help [Santos]
Network Working Group                                         P. Deutsch
Request for Comments: 1951                           Aladdin Enterprises
Category: Informational                                         May 1996


        DEFLATE Compressed Data Format Specification version 1.3

Status of This Memo

   This memo provides information for the Internet community. This memo
   does not specify an Internet standard of any kind. Distribution of
   this memo is unlimited.

IESG Note:

   The IESG takes no position on the validity of any Intellectual
   Property Rights statements contained in this document.

Notices

   Copyright (c) 1996 L. Peter Deutsch

   Permission is granted to copy and distribute this document for any
   purpose and without charge, including translations into other
   languages and incorporation into compilations, provided that the
   copyright notice and this notice are preserved, and that any
   substantive changes or deletions from the original are clearly
   marked.

   A pointer to the latest version of this and related documentation in
   HTML format can be found at the URL
   <ftp://ftp.uu.net/graphics/png/documents/zlib/zdoc-index.html>.

Abstract

   This specification defines a lossless compressed data format that
   compresses data using a combination of the LZ77 algorithm and Huffman
   coding, with efficiency comparable to the best currently available
   general-purpose compression methods. The data can be produced or
   consumed, even for an arbitrarily long sequentially presented input
   data stream, using only an a priori bounded amount of intermediate
   storage. The format can be implemented readily in a manner not
   covered by patents.

Deutsch                      Informational                      [Page 1]

RFC 1951      DEFLATE Compressed Data Format Specification      May 1996

Table of Contents

   1. Introduction
      1.1. Purpose
      1.2. Intended audience
      1.3. Scope
      1.4. Compliance
      1.5. Definitions of terms and conventions used
      1.6. Changes from previous versions
   2. Compressed representation overview
   3. Detailed specification
      3.1. Overall conventions
         3.1.1. Packing into bytes
      3.2. Compressed block format
         3.2.1. Synopsis of prefix and Huffman coding
         3.2.2. Use of Huffman coding in the "deflate" format
         3.2.3. Details of block format
         3.2.4. Non-compressed blocks (BTYPE=00)
         3.2.5. Compressed blocks (length and distance codes)
         3.2.6. Compression with fixed Huffman codes (BTYPE=01)
         3.2.7. Compression with dynamic Huffman codes (BTYPE=10)
      3.3. Compliance
   4. Compression algorithm details
   5. References
   6. Security Considerations
   7. Source code
   8. Acknowledgements
   9. Author's Address

1. Introduction

   1.1. Purpose

      The purpose of this specification is to define a lossless
      compressed data format that:
          * Is independent of CPU type, operating system, file system,
            and character set, and hence can be used for interchange;
          * Can be produced or consumed, even for an arbitrarily long
            sequentially presented input data stream, using only an a
            priori bounded amount of intermediate storage, and hence
            can be used in data communications or similar structures
            such as Unix filters;
          * Compresses data with efficiency comparable to the best
            currently available general-purpose compression methods,
            and in particular considerably better than the "compress"
            program;
          * Can be implemented readily in a manner not covered by
            patents, and hence can be practiced freely;
          * Is compatible with the file format produced by the current
            widely used gzip utility, in that conforming decompressors
            will be able to read data produced by the existing gzip
            compressor.

      The data format defined by this specification does not attempt to:

          * Allow random access to compressed data;
          * Compress specialized data (e.g., raster graphics) as well
            as the best currently available specialized algorithms.

      A simple counting argument shows that no lossless compression
      algorithm can compress every possible input data set. For the
      format defined here, the worst case expansion is 5 bytes per
      32K-byte block, i.e., a size increase of 0.015% for large data
      sets. English text usually compresses by a factor of 2.5 to 3;
      executable files usually compress somewhat less; graphical data
      such as raster images may compress much more.
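
A quick arithmetic check of the 0.015% figure follows. It assumes the 5 bytes of overhead are charged once per 32,768-byte block.

```latex
\frac{5}{32768} \approx 1.53 \times 10^{-4} \approx 0.015\%
```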

   1.2. Intended audience

      This specification is intended for use by implementors of software
      to compress data into "deflate" format and/or decompress data from
      "deflate" format.

      The text of the specification assumes a basic background in
      programming at the level of bits and other primitive data
      representations. Familiarity with the technique of Huffman coding
      is helpful but not required.

   1.3. Scope

      The specification specifies a method for representing a sequence
      of bytes as a (usually shorter) sequence of bits, and a method for
      packing the latter bit sequence into bytes.

   1.4. Compliance

      Unless otherwise indicated below, a compliant decompressor must be
      able to accept and decompress any data set that conforms to all
      the specifications presented here; a compliant compressor must
      produce data sets that conform to all the specifications presented
      here.

   1.5. Definitions of terms and conventions used

      Byte: 8 bits stored or transmitted as a unit (same as an octet).
      For this specification, a byte is exactly 8 bits, even on machines
      which store a character on a number of bits different from eight.
      See below for the numbering of bits within a byte.

      String: a sequence of arbitrary bytes.

   1.6. Changes from previous versions

      There have been no technical changes to the deflate format since
      version 1.1 of this specification. In version 1.2, some
      terminology was changed. Version 1.3 is a conversion of the
      specification to RFC style.

2. Compressed representation overview

   A compressed data set consists of a series of blocks, corresponding
   to successive blocks of input data. The block sizes are arbitrary,
   except that non-compressible blocks are limited to 65,535 bytes.

   Each block is compressed using a combination of the LZ77 algorithm
   and Huffman coding. The Huffman trees for each block are independent
   of those for previous or subsequent blocks; the LZ77 algorithm may
   use a reference to a duplicated string occurring in a previous block,
   up to 32K input bytes before.

   Each block consists of two parts: a pair of Huffman code trees that
   describe the representation of the compressed data part, and a
   compressed data part. (The Huffman trees themselves are compressed
   using Huffman encoding.) The compressed data consists of a series of
   elements of two types: literal bytes (of strings that have not been
   detected as duplicated within the previous 32K input bytes), and
   pointers to duplicated strings, where a pointer is represented as a
   pair <length, backward distance>. The representation used in the
   "deflate" format limits distances to 32K bytes and lengths to 258
   bytes, but does not limit the size of a block, except for
   uncompressible blocks, which are limited as noted above.

   Each type of value (literals, distances, and lengths) in the
   compressed data is represented using a Huffman code, using one code
   tree for literals and lengths and a separate code tree for distances.
   The code trees for each block appear in a compact form just before
   the compressed data for that block.

3. Detailed specification

   3.1. Overall conventions

      In the diagrams below, a box like this:

         +---+
         |   | <-- the vertical bars might be missing
         +---+

      represents one byte; a box like this:

         +==============+
         |              |
         +==============+

      represents a variable number of bytes.

      Bytes stored within a computer do not have a "bit order", since
      they are always treated as a unit. However, a byte considered as
      an integer between 0 and 255 does have a most- and
      least-significant bit, and since we write numbers with the
      most-significant digit on the left, we also write bytes with the
      most-significant bit on the left. In the diagrams below, we number
      the bits of a byte so that bit 0 is the least-significant bit,
      i.e., the bits are numbered:

         +--------+
         |76543210|
         +--------+

      Within a computer, a number may occupy multiple bytes. All
      multi-byte numbers in the format described here are stored with
      the least-significant byte first (at the lower memory address).
      For example, the decimal number 520 is stored as:

             0        1
         +--------+--------+
         |00001000|00000010|
         +--------+--------+
          ^        ^
          |        |
          |        + more significant byte = 2 x 256
          + less significant byte = 8
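
The byte order above can be sketched in C as follows. The helper names read_le16 and write_le16 are hypothetical and belong to no library. The LEN and NLEN fields of stored blocks (Paragraph 3.2.4) use this layout.

```c
#include <assert.h>
#include <stdint.h>

/* Read a 16-bit value stored least-significant byte first. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Store a 16-bit value least-significant byte first. */
static void write_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFF);  /* less significant byte, lower address */
    p[1] = (uint8_t)(v >> 8);    /* more significant byte */
}
```

For example, write_le16(buf, 520) leaves buf[0] == 8 and buf[1] == 2, matching the diagram.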

      3.1.1. Packing into bytes

         This document does not address the issue of the order in which
         bits of a byte are transmitted on a bit-sequential medium,
         since the final data format described here is byte- rather than
         bit-oriented. However, we describe the compressed block format
         below as a sequence of data elements of various bit lengths,
         not a sequence of bytes. We must therefore specify how to pack
         these data elements into bytes to form the final compressed
         byte sequence:

             * Data elements are packed into bytes in order of
               increasing bit number within the byte, i.e., starting
               with the least-significant bit of the byte.
             * Data elements other than Huffman codes are packed
               starting with the least-significant bit of the data
               element.
             * Huffman codes are packed starting with the
               most-significant bit of the code.

         In other words, if one were to print out the compressed data as
         a sequence of bytes, starting with the first byte at the
         *right* margin and proceeding to the *left*, with the
         most-significant bit of each byte on the left as usual, one
         would be able to parse the result from right to left, with
         fixed-width elements in the correct MSB-to-LSB order and
         Huffman codes in bit-reversed order (i.e., with the first bit
         of the code in the relative LSB position).
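
The three packing rules can be sketched in C as a bit reader. It always takes bits from the least-significant end of each byte, and it assembles them in one of two ways:

- Fixed-width elements place the first bit read in the least-significant position.
- Huffman codes place the first bit read in the most-significant position.

The BitReader type and the function names are hypothetical. Reading a Huffman code of known length is also a simplification, since a real decoder does not know a code's length in advance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit reader over a byte buffer. */
typedef struct {
    const uint8_t *data;
    size_t len;     /* bytes available */
    size_t pos;     /* bits consumed so far */
} BitReader;

/* Next bit, taken from each byte starting at bit 0 (the LSB);
   -1 once the input is exhausted. */
static int get_bit(BitReader *br)
{
    if (br->pos >= br->len * 8)
        return -1;
    int bit = (br->data[br->pos >> 3] >> (br->pos & 7)) & 1;
    br->pos++;
    return bit;
}

/* Fixed-width data element of n <= 16 bits: the first bit read is the
   element's least-significant bit. Returns -1 on end of input. */
static long get_bits(BitReader *br, int n)
{
    long v = 0;
    for (int i = 0; i < n; i++) {
        int b = get_bit(br);
        if (b < 0)
            return -1;
        v |= (long)b << i;
    }
    return v;
}

/* Huffman code of n <= 16 bits: the first bit read is the code's
   most-significant bit. Returns -1 on end of input. */
static long get_code_msb_first(BitReader *br, int n)
{
    long v = 0;
    for (int i = 0; i < n; i++) {
        int b = get_bit(br);
        if (b < 0)
            return -1;
        v = (v << 1) | b;
    }
    return v;
}
```

For the single input byte 0xB5 (10110101), a 3-bit element reads as 5. The 2-bit Huffman code that follows it reads as 01.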

   3.2. Compressed block format

      3.2.1. Synopsis of prefix and Huffman coding

         Prefix coding represents symbols from an a priori known
         alphabet by bit sequences (codes), one code for each symbol, in
         a manner such that different symbols may be represented by bit
         sequences of different lengths, but a parser can always parse
         an encoded string unambiguously symbol-by-symbol.

         We define a prefix code in terms of a binary tree in which the
         two edges descending from each non-leaf node are labeled 0 and
         1 and in which the leaf nodes correspond one-for-one with (are
         labeled with) the symbols of the alphabet; then the code for a
         symbol is the sequence of 0's and 1's on the edges leading from
         the root to the leaf labeled with that symbol. For example:

                          /\              Symbol    Code
                         0  1             ------    ----
                        /    \                A      00
                       /\     B               B       1
                      0  1                    C      011
                     /    \                   D      010
                    A     /\
                         0  1
                        /    \
                       D      C

         A parser can decode the next symbol from an encoded input
         stream by walking down the tree from the root, at each step
         choosing the edge corresponding to the next input bit.
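
The tree walk can be sketched in C for the example tree above. The node array is one hypothetical way to store the tree. To keep the example short, the input is a string of '0' and '1' characters rather than packed bits.

```c
#include <assert.h>

/* One node of the example tree: an internal node has two children
   (indices into tree[]); a leaf has a non-zero symbol. */
typedef struct {
    int child[2];
    char symbol;
} Node;

/* The example tree: A = 00, B = 1, C = 011, D = 010. */
static const Node tree[] = {
    { {1, 2}, 0 },     /* 0: root */
    { {3, 4}, 0 },     /* 1: after "0" */
    { {-1, -1}, 'B' }, /* 2: "1" */
    { {-1, -1}, 'A' }, /* 3: "00" */
    { {5, 6}, 0 },     /* 4: after "01" */
    { {-1, -1}, 'D' }, /* 5: "010" */
    { {-1, -1}, 'C' }, /* 6: "011" */
};

/* Decode a string of '0'/'1' characters: start at the root, follow the
   edge for each bit, and return to the root after reaching a leaf.
   Writes a NUL-terminated result to out (which must be large enough) and
   returns the number of symbols, or -1 if the input stops mid-code. */
static int decode(const char *bits, char *out)
{
    int node = 0, n = 0;
    for (const char *p = bits; *p != '\0'; p++) {
        node = tree[node].child[*p == '1'];
        if (tree[node].symbol != 0) {
            out[n++] = tree[node].symbol;
            node = 0;
        }
    }
    out[n] = '\0';
    return node == 0 ? n : -1;
}
```

The call decode("001011010", out) yields "ABCD". The codes 00, 1, 011 and 010 follow one another with no separators, yet the walk splits them unambiguously.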

         Given an alphabet with known symbol frequencies, the Huffman
         algorithm allows the construction of an optimal prefix code
         (one which represents strings with those symbol frequencies
         using the fewest bits of any possible prefix codes for that
         alphabet). Such a code is called a Huffman code. (See
         reference [1] in Chapter 5, References, for additional
         information on Huffman codes.)

         Note that in the "deflate" format, the Huffman codes for the
         various alphabets must not exceed certain maximum code lengths.
         This constraint complicates the algorithm for computing code
         lengths from symbol frequencies. Again, see Chapter 5,
         References, for details.

      3.2.2. Use of Huffman coding in the "deflate" format

         The Huffman codes used for each alphabet in the "deflate"
         format have two additional rules:

             * All codes of a given bit length have lexicographically
               consecutive values, in the same order as the symbols
               they represent;

             * Shorter codes lexicographically precede longer codes.

         We could recode the example above to follow this rule as
         follows, assuming that the order of the alphabet is ABCD:

            Symbol  Code
            ------  ----
            A       10
            B       0
            C       110
            D       111

         I.e., 0 precedes 10 which precedes 11x, and 110 and 111 are
         lexicographically consecutive.

         Given this rule, we can define the Huffman code for an alphabet
         just by giving the bit lengths of the codes for each symbol of
         the alphabet in order; this is sufficient to determine the
         actual codes. In our example, the code is completely defined
         by the sequence of bit lengths (2, 1, 3, 3). The following
         algorithm generates the codes as integers, intended to be read
         from most- to least-significant bit. The code lengths are
         initially in tree[I].Len; the codes are produced in
         tree[I].Code.

             1)  Count the number of codes for each code length. Let
                 bl_count[N] be the number of codes of length N, N >= 1.

             2)  Find the numerical value of the smallest code for each
                 code length:

                    code = 0;
                    bl_count[0] = 0;
                    for (bits = 1; bits <= MAX_BITS; bits++) {
                        code = (code + bl_count[bits-1]) << 1;
                        next_code[bits] = code;
                    }

             3)  Assign numerical values to all codes, using consecutive
                 values for all codes of the same length with the base
                 values determined at step 2. Codes that are never used
                 (which have a bit length of zero) must not be assigned a
                 value.

                    for (n = 0; n <= max_code; n++) {
                        len = tree[n].Len;
                        if (len != 0) {
                            tree[n].Code = next_code[len];
                            next_code[len]++;
                        }
                    }

         Example:

         Consider the alphabet ABCDEFGH, with bit lengths (3, 3, 3, 3,
         3, 2, 4, 4). After step 1, we have:

            N      bl_count[N]
            -      -----------
            2      1
            3      5
            4      2

         Step 2 computes the following next_code values:

            N      next_code[N]
            -      ------------
            1      0
            2      0
            3      2
            4      14

         Step 3 produces the following code values:

            Symbol Length   Code
            ------ ------   ----
            A       3        010
            B       3        011
            C       3        100
            D       3        101
            E       3        110
            F       2         00
            G       4       1110
            H       4       1111
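
Steps 1 to 3 can be written as one self-contained C function. The sketch assumes MAX_BITS is 15, since the code length alphabet in Paragraph 3.2.7 only represents lengths 0 to 15. It also assumes every length is at most MAX_BITS. The name assign_codes is hypothetical. As step 3 requires, the function leaves entries for unused symbols untouched.

```c
#include <assert.h>

#define MAX_BITS 15   /* code lengths range from 0 to 15 */

/* Steps 1-3: len[n] is the code length of symbol n (0 = unused).
   code[n] receives the code, to be read from most- to least-significant
   bit; entries for unused symbols are left untouched. */
static void assign_codes(const int *len, unsigned *code, int nsym)
{
    unsigned bl_count[MAX_BITS + 1] = {0};
    unsigned next_code[MAX_BITS + 1] = {0};

    /* Step 1: count the codes of each length. */
    for (int n = 0; n < nsym; n++)
        bl_count[len[n]]++;
    bl_count[0] = 0;

    /* Step 2: smallest code value for each length. */
    unsigned c = 0;
    for (int bits = 1; bits <= MAX_BITS; bits++) {
        c = (c + bl_count[bits - 1]) << 1;
        next_code[bits] = c;
    }

    /* Step 3: consecutive values within each length, in symbol order. */
    for (int n = 0; n < nsym; n++) {
        if (len[n] != 0)
            code[n] = next_code[len[n]]++;
    }
}
```

With the lengths (3, 3, 3, 3, 3, 2, 4, 4), the function produces 2, 3, 4, 5, 6, 0, 14, 15. These are 010, 011, 100, 101, 110, 00, 1110 and 1111, matching the table. With (2, 1, 3, 3) it reproduces the ABCD recoding: 10, 0, 110, 111.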

      3.2.3. Details of block format

         Each block of compressed data begins with 3 header bits
         containing the following data:

            first bit       BFINAL
            next 2 bits     BTYPE

         Note that the header bits do not necessarily begin on a byte
         boundary, since a block does not necessarily occupy an integral
         number of bytes.

         BFINAL is set if and only if this is the last block of the data
         set.

         BTYPE specifies how the data are compressed, as follows:

            00 - no compression
            01 - compressed with fixed Huffman codes
            10 - compressed with dynamic Huffman codes
            11 - reserved (error)

         The only difference between the two compressed cases is how the
         Huffman codes for the literal/length and distance alphabets are
         defined.
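
BTYPE is a data element, not a Huffman code. Its first bit is therefore its least-significant bit (Paragraph 3.1.1), and the header may start at any bit position. A small C sketch, with hypothetical function names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit number pos of the stream, counting from bit 0 of p[0]. */
static int bit_at(const uint8_t *p, size_t pos)
{
    return (p[pos >> 3] >> (pos & 7)) & 1;
}

/* Decode the 3 header bits of a block that starts at stream bit pos.
   BTYPE: 0 = stored, 1 = fixed Huffman, 2 = dynamic Huffman, 3 = error. */
static void read_block_header(const uint8_t *p, size_t pos,
                              int *bfinal, int *btype)
{
    *bfinal = bit_at(p, pos);
    *btype = bit_at(p, pos + 1) | (bit_at(p, pos + 2) << 1);
}
```

Take the bytes 0x80, 0x02 with a header starting at bit 7. BFINAL is 1 and BTYPE is 2 (binary 10, dynamic Huffman codes), even though the three bits straddle a byte boundary.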

         In all cases, the decoding algorithm for the actual data is as
         follows:

            do
               read block header from input stream.
               if stored with no compression
                  skip any remaining bits in current partially
                     processed byte
                  read LEN and NLEN (see next section)
                  copy LEN bytes of data to output
               otherwise
                  if compressed with dynamic Huffman codes
                     read representation of code trees (see
                        subsection below)
                  loop (until end of block code recognized)
                     decode literal/length value from input stream
                     if value < 256
                        copy value (literal byte) to output stream
                     otherwise
                        if value = end of block (256)
                           break from loop
                        otherwise (value = 257..285)
                           decode distance from input stream

                           move backwards distance bytes in the output
                           stream, and copy length bytes from this
                           position to the output stream.
                  end loop
            while not last block

         Note that a duplicated string reference may refer to a string
         in a previous block; i.e., the backward distance may cross one
         or more block boundaries. However, a distance cannot refer past
         the beginning of the output stream. (An application using a
         preset dictionary might discard part of the output stream; a
         distance can refer to that part of the output stream anyway.)
         Note also that the referenced string may overlap the current
         position; for example, if the last 2 bytes decoded have values
         X and Y, a string reference with <length = 5, distance = 2>
         adds X,Y,X,Y,X to the output stream.
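
The overlap case rules out a plain block copy. The C sketch below copies one byte at a time from front to back, so bytes written earlier in the same match become sources for later ones. The name copy_match and the buffer handling are hypothetical. The sketch also treats a distance that reaches before the start of out as an error, so it ignores the preset-dictionary case mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Append a <length, distance> match to out, whose first *pos bytes are
   the output so far and whose capacity is cap. Copying one byte at a
   time, front to back, lets the source overlap the bytes being written.
   memcpy() is undefined on overlap, and memmove() would copy the source
   range as it was before the copy rather than repeat the pattern.
   Returns 0 on success, or -1 if the distance reaches before the start
   of out or the match does not fit. */
static int copy_match(unsigned char *out, size_t *pos, size_t cap,
                      size_t length, size_t distance)
{
    if (distance == 0 || distance > *pos || length > cap - *pos)
        return -1;
    for (size_t i = 0; i < length; i++) {
        out[*pos] = out[*pos - distance];
        (*pos)++;
    }
    return 0;
}
```

Starting from the output "XY", copy_match(out, &pos, 8, 5, 2) appends X, Y, X, Y, X, as in the example above.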

         We now specify each compression method in turn.

      3.2.4. Non-compressed blocks (BTYPE=00)

         Any bits of input up to the next byte boundary are ignored.
         The rest of the block consists of the following information:

              0   1   2   3   4...
            +---+---+---+---+================================+
            |  LEN  | NLEN  |... LEN bytes of literal data...|
            +---+---+---+---+================================+

         LEN is the number of data bytes in the block. NLEN is the
         one's complement of LEN.
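
Reading these fields can be sketched in C. The sketch assumes the reader has already skipped to the byte boundary, and stored_block_len is a hypothetical name. Both fields are little-endian, as described in Paragraph 3.1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* p points at the first byte after the block header, the partial byte
   having been skipped. Returns LEN, or -1 if fewer than 4 bytes are
   available or NLEN is not the one's complement of LEN. On success the
   LEN literal bytes start at p + 4. */
static long stored_block_len(const uint8_t *p, size_t avail)
{
    if (avail < 4)
        return -1;
    unsigned len  = (unsigned)p[0] | ((unsigned)p[1] << 8);  /* little-endian */
    unsigned nlen = (unsigned)p[2] | ((unsigned)p[3] << 8);
    if (len != (~nlen & 0xFFFFu))
        return -1;
    return (long)len;
}
```

The bytes 05 00 FA FF describe a 5-byte stored block. The bytes 05 00 00 00 are rejected because 0x0000 is not the one's complement of 0x0005.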

      3.2.5. Compressed blocks (length and distance codes)

         As noted above, encoded data blocks in the "deflate" format
         consist of sequences of symbols drawn from three conceptually
         distinct alphabets: either literal bytes, from the alphabet of
         byte values (0..255), or <length, backward distance> pairs,
         where the length is drawn from (3..258) and the distance is
         drawn from (1..32,768). In fact, the literal and length
         alphabets are merged into a single alphabet (0..285), where
         values 0..255 represent literal bytes, the value 256 indicates
         end-of-block, and values 257..285 represent length codes
         (possibly in conjunction with extra bits following the symbol
         code) as follows:

                 Extra               Extra               Extra
            Code Bits Length(s) Code Bits Lengths   Code Bits Length(s)
            ---- ---- ------     ---- ---- -------   ---- ---- -------
             257   0     3       267   1   15,16     277   4   67-82
             258   0     4       268   1   17,18     278   4   83-98
             259   0     5       269   2   19-22     279   4   99-114
             260   0     6       270   2   23-26     280   4  115-130
             261   0     7       271   2   27-30     281   5  131-162
             262   0     8       272   2   31-34     282   5  163-194
             263   0     9       273   3   35-42     283   5  195-226
             264   0    10       274   3   43-50     284   5  227-257
             265   1  11,12      275   3   51-58     285   0    258
             266   1  13,14      276   3   59-66

         The extra bits should be interpreted as a machine integer
         stored with the most-significant bit first, e.g., bits 1110
         represent the value 14.
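
The length table maps directly onto two small arrays. In the C sketch below, decode_length is a hypothetical name. Its extra argument is the value of the extra bits after they have been read as a data element (Paragraph 3.1.1).

```c
#include <assert.h>

/* Base length and extra-bit count for codes 257..285, from the table. */
static const unsigned short len_base[29] = {
    3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
    35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258
};
static const unsigned char len_extra[29] = {
    0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2,
    3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0
};

/* Length for a length code plus the value of its extra bits, or -1 if
   the code is outside 257..285 or extra does not fit in its bits.
   Not checked: code 284 with all extra bits set (which would give 258). */
static int decode_length(int code, unsigned extra)
{
    if (code < 257 || code > 285)
        return -1;
    int i = code - 257;
    if (extra >= (1u << len_extra[i]))
        return -1;
    return len_base[i] + (int)extra;
}
```

The table lists 227-257 for code 284 and codes length 258 as 285. A stricter decoder could therefore also reject code 284 with extra value 31, which this sketch accepts.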

                 Extra           Extra               Extra
            Code Bits Dist  Code Bits   Dist     Code Bits Distance
            ---- ---- ----  ---- ----  ------    ---- ---- --------
              0   0    1     10   4     33-48    20    9   1025-1536
              1   0    2     11   4     49-64    21    9   1537-2048
              2   0    3     12   5     65-96    22   10   2049-3072
              3   0    4     13   5     97-128   23   10   3073-4096
              4   1   5,6    14   6    129-192   24   11   4097-6144
              5   1   7,8    15   6    193-256   25   11   6145-8192
              6   2   9-12   16   7    257-384   26   12  8193-12288
              7   2  13-16   17   7    385-512   27   12 12289-16384
              8   3  17-24   18   8    513-768   28   13 16385-24576
              9   3  25-32   19   8   769-1024   29   13 24577-32768
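
The distance table follows a regular pattern, so a C sketch can compute it instead of storing it:

- Codes 0-3 have no extra bits.
- From code 4 on, each pair of codes shares code / 2 - 1 extra bits.
- The base distance doubles every two codes.

This formula is derived from the table above rather than stated by it, and decode_distance is a hypothetical name.

```c
#include <assert.h>

/* Distance for a distance code (0..29) plus the value of its extra bits,
   or -1 if the code or the extra value is out of range. */
static long decode_distance(int code, unsigned long extra)
{
    if (code < 0 || code > 29)
        return -1;
    if (code < 4)                       /* codes 0-3: distances 1-4 */
        return extra == 0 ? code + 1 : -1;
    int nbits = code / 2 - 1;           /* 1 for codes 4-5, ..., 13 for 28-29 */
    unsigned long base = ((2ul + (unsigned long)(code & 1)) << nbits) + 1;
    if (extra >= (1ul << nbits))
        return -1;
    return (long)(base + extra);
}
```

For example, code 13 has 5 extra bits and base (3 << 5) + 1 = 97, giving distances 97-128 as in the table.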

      3.2.6. Compression with fixed Huffman codes (BTYPE=01)

         The Huffman codes for the two alphabets are fixed, and are not
         represented explicitly in the data. The Huffman code lengths
         for the literal/length alphabet are:

                   Lit Value    Bits        Codes
                   ---------    ----        -----
                     0 - 143     8          00110000 through
                                            10111111
                   144 - 255     9          110010000 through
                                            111111111
                   256 - 279     7          0000000 through
                                            0010111
                   280 - 287     8          11000000 through
                                            11000111

         The code lengths are sufficient to generate the actual codes,
         as described above; we show the codes in the table for added
         clarity. Literal/length values 286-287 will never actually
         occur in the compressed data, but participate in the code
         construction.

         Distance codes 0-31 are represented by (fixed-length) 5-bit
         codes, with possible additional bits as shown in the table in
         Paragraph 3.2.5, above. Note that distance codes 30-31 will
         never actually occur in the compressed data.
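
Applying the construction of Paragraph 3.2.2 to these fixed lengths yields four runs of consecutive codes. A C sketch can therefore compute any fixed literal/length code by offset. The offsets below are the first codes of each row in the table.

Huffman codes are packed most-significant bit first (Paragraph 3.1.1). A bit writer that emits the least-significant bit first can reverse each code beforehand, and reverse_bits shows that step. Both function names are hypothetical.

```c
#include <assert.h>

/* Fixed literal/length code for sym (0..287), as an integer read from
   most- to least-significant bit; *nbits receives its length.
   Returns -1 for a symbol out of range. */
static int fixed_litlen_code(int sym, int *nbits)
{
    if (sym < 0 || sym > 287)
        return -1;
    if (sym < 144) { *nbits = 8; return 0x30 + sym; }          /* 00110000 */
    if (sym < 256) { *nbits = 9; return 0x190 + (sym - 144); } /* 110010000 */
    if (sym < 280) { *nbits = 7; return sym - 256; }           /* 0000000 */
    *nbits = 8;
    return 0xC0 + (sym - 280);                                 /* 11000000 */
}

/* Reverse the low n bits of v, so a code can be written by a writer that
   emits the least-significant bit first. */
static unsigned reverse_bits(unsigned v, int n)
{
    unsigned r = 0;
    for (int i = 0; i < n; i++, v >>= 1)
        r = (r << 1) | (v & 1);
    return r;
}
```

The fixed distance code needs no such function. Since all 32 distance codes have length 5, the same construction assigns distance code d the 5-bit value d.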

      3.2.7. Compression with dynamic Huffman codes (BTYPE=10)

         The Huffman codes for the two alphabets appear in the block
         immediately after the header bits and before the actual
         compressed data, first the literal/length code and then the
         distance code. Each code is defined by a sequence of code
         lengths, as discussed in Paragraph 3.2.2, above. For even
         greater compactness, the code length sequences themselves are
         compressed using a Huffman code. The alphabet for code lengths
         is as follows:

               0 - 15: Represent code lengths of 0 - 15
                   16: Copy the previous code length 3 - 6 times.
                       The next 2 bits indicate repeat length
                             (0 = 3, ... , 3 = 6)
                          Example:  Codes 8, 16 (+2 bits 11),
                                    16 (+2 bits 10) will expand to
                                    12 code lengths of 8 (1 + 6 + 5)
                   17: Repeat a code length of 0 for 3 - 10 times.
                       (3 bits of length)
                   18: Repeat a code length of 0 for 11 - 138 times
                       (7 bits of length)
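
The C sketch below expands code length symbols into code lengths. It assumes the Huffman decoding of the symbols and the reading of their extra bits happen separately, and that each extra value fits in its field. The name expand_lengths is hypothetical.

```c
#include <assert.h>

/* Expand code length symbols (0..18) into code lengths. sym[i] is a
   decoded symbol; for 16, 17 and 18, extra[i] is the value of the 2, 3 or
   7 extra bits that follow it (assumed already in range). Writes at most
   cap lengths to out and returns how many, or -1 on an invalid symbol,
   a 16 with no previous length, or overflow. */
static int expand_lengths(const int *sym, const int *extra, int nsym,
                          int *out, int cap)
{
    int n = 0;
    for (int i = 0; i < nsym; i++) {
        int value, repeat;
        if (sym[i] >= 0 && sym[i] <= 15) {      /* literal code length */
            value = sym[i];
            repeat = 1;
        } else if (sym[i] == 16) {              /* previous length, 3-6 times */
            if (n == 0)
                return -1;
            value = out[n - 1];
            repeat = 3 + extra[i];
        } else if (sym[i] == 17) {              /* zero, 3-10 times */
            value = 0;
            repeat = 3 + extra[i];
        } else if (sym[i] == 18) {              /* zero, 11-138 times */
            value = 0;
            repeat = 11 + extra[i];
        } else {
            return -1;
        }
        if (repeat > cap - n)
            return -1;
        while (repeat-- > 0)
            out[n++] = value;
    }
    return n;
}
```

Run on the example above (8, then 16 with extra value 3, then 16 with extra value 2), the sketch produces 12 lengths of 8.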

         A code length of 0 indicates that the corresponding symbol in
         the literal/length or distance alphabet will not occur in the
         block, and should not participate in the Huffman code
         construction algorithm given earlier. If only one distance
         code is used, it is encoded using one bit, not zero bits; in
         this case there is a single code length of one, with one unused
         code. One distance code of zero bits means that there are no
         distance codes used at all (the data is all literals).

         We can now define the format of the block:

               5 Bits: HLIT, # of Literal/Length codes - 257 (257 - 286)
               5 Bits: HDIST, # of Distance codes - 1        (1 - 32)
               4 Bits: HCLEN, # of Code Length codes - 4     (4 - 19)

               (HCLEN + 4) x 3 bits: code lengths for the code length
                  alphabet given just above, in the order: 16, 17, 18,
                  0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15

                  These code lengths are interpreted as 3-bit integers
                  (0-7); as above, a code length of 0 means the
                  corresponding symbol (literal/length or distance code
                  length) is not used.

               HLIT + 257 code lengths for the literal/length alphabet,
                  encoded using the code length Huffman code

               HDIST + 1 code lengths for the distance alphabet,
                  encoded using the code length Huffman code

               The actual compressed data of the block,
                  encoded using the literal/length and distance Huffman
                  codes

               The literal/length symbol 256 (end of data),
                  encoded using the literal/length Huffman code

         The code length repeat codes can cross from HLIT + 257 to the
         HDIST + 1 code lengths. In other words, all code lengths form
         a single sequence of HLIT + HDIST + 258 values.
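
The C sketch below reads the three counts and the permuted code length code lengths. The names read_bits, read_dynamic_header and clen_order are hypothetical, and the code assumes enough input is present. A decoder would then decode nlit + ndist code lengths as one sequence, using the repeat codes above, and split it afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Order in which the code length code lengths are sent (from the text). */
static const unsigned char clen_order[19] = {
    16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15
};

/* Read an n-bit data element starting at stream bit *pos (first bit =
   least-significant bit) and advance *pos. No bounds checking. */
static unsigned read_bits(const uint8_t *p, size_t *pos, int n)
{
    unsigned v = 0;
    for (int i = 0; i < n; i++, (*pos)++)
        v |= (unsigned)((p[*pos >> 3] >> (*pos & 7)) & 1) << i;
    return v;
}

/* Read HLIT, HDIST and HCLEN and the (HCLEN + 4) 3-bit code length code
   lengths, starting just after the 3 block header bits. Sets the number
   of literal/length and distance code lengths that follow, and fills
   clen[] indexed by code length symbol; lengths not sent are 0. */
static void read_dynamic_header(const uint8_t *p, size_t *pos,
                                int *nlit, int *ndist, int clen[19])
{
    *nlit  = (int)read_bits(p, pos, 5) + 257;
    *ndist = (int)read_bits(p, pos, 5) + 1;
    int ncode = (int)read_bits(p, pos, 4) + 4;
    for (int i = 0; i < 19; i++)
        clen[i] = 0;
    for (int i = 0; i < ncode; i++)
        clen[clen_order[i]] = (int)read_bits(p, pos, 3);
}
```

For example, the bytes BD 43 34 02 encode HLIT = 29, HDIST = 29 and HCLEN = 0, followed by the lengths 1, 2, 3, 4 for symbols 16, 17, 18 and 0. The call therefore reports 286 literal/length and 30 distance code lengths.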
|
||
|
||
3.3. Compliance
|
||
|
||
A compressor may limit further the ranges of values specified in
|
||
the previous section and still be compliant; for example, it may
|
||
limit the range of backward pointers to some value smaller than
|
||
32K. Similarly, a compressor may limit the size of blocks so that
|
||
a compressible block fits in memory.
|
||
|
||
A compliant decompressor must accept the full range of possible
|
||
values defined in the previous section, and must accept blocks of
|
||
arbitrary size.
|
||
|
||
4. Compression algorithm details
|
||
|
||
While it is the intent of this document to define the "deflate"
|
||
compressed data format without reference to any particular
|
||
compression algorithm, the format is related to the compressed
|
||
formats produced by LZ77 (Lempel-Ziv 1977, see reference [2] below);
|
||
since many variations of LZ77 are patented, it is strongly
|
||
recommended that the implementor of a compressor follow the general
|
||
algorithm presented here, which is known not to be patented per se.
|
||
The material in this section is not part of the definition of the
|
||
|
||
|
||
|
||
Deutsch Informational [Page 14]
|
||
|
||
RFC 1951 DEFLATE Compressed Data Format Specification May 1996
|
||
|
||
|
||
specification per se, and a compressor need not follow it in order to
|
||
be compliant.
|
||
|
||
The compressor terminates a block when it determines that starting a
|
||
new block with fresh trees would be useful, or when the block size
|
||
fills up the compressor's block buffer.
|
||
|
||
The compressor uses a chained hash table to find duplicated strings,
using a hash function that operates on 3-byte sequences. At any
given point during compression, let XYZ be the next 3 input bytes to
be examined (not necessarily all different, of course). First, the
compressor examines the hash chain for XYZ. If the chain is empty,
the compressor simply writes out X as a literal byte and advances one
byte in the input. If the hash chain is not empty, indicating that
the sequence XYZ (or, if we are unlucky, some other 3 bytes with the
same hash function value) has occurred recently, the compressor
compares all strings on the XYZ hash chain with the actual input data
sequence starting at the current point, and selects the longest
match.

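A minimal C sketch of the hash-chain lookup just described, assuming a small in-memory input. The table size, the particular 3-byte hash, and the helper names `hash3` and `greedy_parse` are illustrative choices, not values taken from this specification or from zlib. Each position is pushed onto the front of its chain after it has been searched, and every candidate is verified byte by byte, so a hash collision can only cost time, never produce a wrong match. Window limits, chain truncation, and lazy matching are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

#define HASH_SIZE 4096          /* illustrative table size (power of two) */
#define MAX_INPUT 4096          /* this sketch only handles small inputs */
#define MIN_MATCH 3
#define MAX_MATCH 258
#define NIL (-1)

/* Illustrative hash of the 3 bytes XYZ at p; collisions are harmless
   because candidates are checked against the actual bytes. */
static unsigned hash3(const unsigned char *p)
{
    return (((unsigned)p[0] << 8) ^ ((unsigned)p[1] << 4) ^ p[2]) & (HASH_SIZE - 1);
}

/* Hypothetical greedy parse of data[0..len). Stores one entry per emitted
   token in lengths[] (1 = literal byte, otherwise a match length) and
   returns the number of tokens. */
static int greedy_parse(const unsigned char *data, int len, int *lengths)
{
    static int head[HASH_SIZE];  /* most recent position for each hash */
    static int prev[MAX_INPUT];  /* link to the next older position */
    int ntok = 0;

    assert(len >= 0 && len <= MAX_INPUT);
    for (int h = 0; h < HASH_SIZE; h++)
        head[h] = NIL;

    int pos = 0;
    while (pos < len) {
        int best = 0;
        if (pos + MIN_MATCH <= len) {
            /* Compare every string on XYZ's chain with the input at pos. */
            for (int cand = head[hash3(data + pos)]; cand != NIL; cand = prev[cand]) {
                int n = 0;
                while (pos + n < len && n < MAX_MATCH && data[cand + n] == data[pos + n])
                    n++;
                if (n > best)
                    best = n;
            }
        }
        /* Empty chain, or only short/colliding candidates: emit X as a literal. */
        int step = best >= MIN_MATCH ? best : 1;

        /* Insert each position we advance over into its hash chain. */
        for (int i = pos; i < pos + step && i + MIN_MATCH <= len; i++) {
            unsigned h = hash3(data + i);
            prev[i] = head[h];
            head[h] = i;
        }
        lengths[ntok++] = step;
        pos += step;
    }
    return ntok;
}
```

For the input "abcabcabc" the sketch emits three literals and then a single match of length 6 (the match may overlap the bytes it is copying, which the format permits).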
The compressor searches the hash chains starting with the most recent
strings, to favor small distances and thus take advantage of the
Huffman encoding. The hash chains are singly linked. There are no
deletions from the hash chains; the algorithm simply discards matches
that are too old. To avoid a worst-case situation, very long hash
chains are arbitrarily truncated at a certain length, determined by a
run-time parameter.

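The search-order rules can be sketched as a chain walk in C. This assumes a `prev[]` array in which each position links to the next older position with the same hash, which is what inserting at the front of a singly linked chain produces; the names `longest_match`, `window`, and `max_chain` are hypothetical stand-ins for the distance limit and the run-time truncation parameter the text mentions. Because candidates arrive newest first, the first candidate that is too far back ends the walk, and on equal lengths the nearer match is kept.

```c
#include <assert.h>

#define NIL (-1)

/* Hypothetical chain walk. cand is the most recent earlier position with
   the same hash as pos; prev[] links each position to the next older one,
   so candidates are visited newest first (smallest distance first). At
   most max_chain candidates are examined. Returns the best match length
   and stores its distance in *dist (0 if no candidate matched at all). */
static int longest_match(const unsigned char *data, int len, int pos,
                         const int *prev, int cand,
                         int window, int max_chain, int max_match, int *dist)
{
    int best = 0;
    assert(cand < pos);
    *dist = 0;
    while (cand != NIL && max_chain-- > 0) {
        if (pos - cand > window)
            break;      /* too old; everything further down is older still */
        int n = 0;
        while (pos + n < len && n < max_match && data[cand + n] == data[pos + n])
            n++;
        if (n > best) { /* strict '>' keeps the smaller distance on ties */
            best = n;
            *dist = pos - cand;
        }
        cand = prev[cand];
    }
    return best;
}
```

In "abcdabceabcd", the string at offset 8 has candidates at offsets 4 and 0. A full walk finds the length-4 match at distance 8; truncating the chain to one candidate, or limiting the window to 5 bytes, stops at the nearer length-3 match at distance 4.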
To improve overall compression, the compressor optionally defers the
selection of matches ("lazy matching"): after a match of length N has
been found, the compressor searches for a longer match starting at
the next input byte. If it finds a longer match, it truncates the
previous match to a length of one (thus producing a single literal
byte) and then emits the longer match. Otherwise, it emits the
original match, and, as described above, advances N bytes before
continuing.

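A sketch of the lazy-matching decision in C, following the description literally: find a match at the current byte, look one byte ahead, and if the match there is longer, emit a single literal followed by the longer match. To keep the example self-contained, `find_longest` is a brute-force scan of the window standing in for the hash-chain search; both it and `lazy_parse` are hypothetical names.

```c
#include <assert.h>

#define MIN_MATCH 3
#define MAX_MATCH 258
#define WINDOW    32768

/* Stand-in for the hash-chain search: brute-force scan of the window.
   Returns the longest match length at pos, or 0 if shorter than MIN_MATCH. */
static int find_longest(const unsigned char *d, int len, int pos)
{
    int best = 0;
    for (int cand = pos - 1; cand >= 0 && pos - cand <= WINDOW; cand--) {
        int n = 0;
        while (pos + n < len && n < MAX_MATCH && d[cand + n] == d[pos + n])
            n++;
        if (n > best)
            best = n;
    }
    return best >= MIN_MATCH ? best : 0;
}

/* Hypothetical lazy parse. lengths[] receives one entry per token
   (1 = literal byte, otherwise a match length); returns the token count. */
static int lazy_parse(const unsigned char *d, int len, int *lengths)
{
    int ntok = 0, pos = 0;
    while (pos < len) {
        int n = find_longest(d, len, pos);
        if (n == 0) {                       /* no usable match: literal */
            lengths[ntok++] = 1;
            pos += 1;
            continue;
        }
        int next = pos + 1 < len ? find_longest(d, len, pos + 1) : 0;
        if (next > n) {                     /* defer: one literal, then longer match */
            lengths[ntok++] = 1;
            lengths[ntok++] = next;
            pos += 1 + next;
        } else {                            /* keep the original match, advance N */
            lengths[ntok++] = n;
            pos += n;
        }
        assert(pos <= len);
    }
    return ntok;
}
```

For the input "abcbcdefgabcdefg", the match at offset 9 has length 3 ("abc"), but the match starting one byte later has length 6 ("bcdefg"), so the sketch emits a literal 'a' followed by the length-6 match. This sketch looks ahead only one step, as the text describes; a compressor could also keep deferring while each look-ahead finds a longer match.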
Run-time parameters also control this "lazy match" procedure. If
compression ratio is most important, the compressor attempts a
complete second search regardless of the length of the first match.
In the normal case, if the current match is "long enough", the
compressor reduces the search for a longer match, thus speeding up
the process. If speed is most important, the compressor inserts new
strings in the hash table only when no match was found, or when the
match is not "too long". This degrades the compression ratio but
saves time since there are both fewer insertions and fewer searches.

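The three regimes can be expressed as two small policy functions. Everything in this C sketch is an assumption for illustration: the `Strategy` enum, the `Params` fields (`good_length`, `max_chain`, `max_insert`), and the choice to cut the second search to a quarter of the chain budget are not taken from this specification.

```c
#include <assert.h>

/* Hypothetical knobs; names and values are illustrative only. */
typedef enum { FAVOR_RATIO, NORMAL_CASE, FAVOR_SPEED } Strategy;

typedef struct {
    Strategy strategy;
    int good_length;  /* a first match this long counts as "long enough" */
    int max_chain;    /* chain entries examined by a full search */
    int max_insert;   /* speed mode: longest match whose strings get inserted */
} Params;

/* Number of chain entries the lazy (second) search may examine after a
   first match of length first_len. */
static int second_search_budget(const Params *p, int first_len)
{
    assert(p->max_chain >= 0);
    if (p->strategy == FAVOR_RATIO)
        return p->max_chain;          /* complete search, whatever first_len is */
    if (first_len >= p->good_length)
        return p->max_chain / 4;      /* "long enough": reduced search (assumed factor) */
    return p->max_chain;
}

/* Whether the strings covered by a match of length match_len (0 = no
   match) are inserted into the hash table. Only speed mode skips them. */
static int should_insert(const Params *p, int match_len)
{
    if (p->strategy != FAVOR_SPEED)
        return 1;
    return match_len <= p->max_insert; /* no match, or a match not "too long" */
}
```

With an illustrative `good_length` of 8 and `max_chain` of 128, a 100-byte first match gets a 32-entry second search in the normal case but the full 128 entries when ratio is favored; in speed mode with `max_insert` of 4, a 20-byte match is not inserted into the hash table.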
5. References

[1] Huffman, D. A., "A Method for the Construction of Minimum
Redundancy Codes", Proceedings of the Institute of Radio
Engineers, September 1952, Volume 40, Number 9, pp. 1098-1101.

[2] Ziv, J., and Lempel, A., "A Universal Algorithm for Sequential Data
Compression", IEEE Transactions on Information Theory, Vol. 23,
No. 3, May 1977, pp. 337-343.

[3] Gailly, J.-L., and Adler, M., ZLIB documentation and sources,
available in ftp://ftp.uu.net/pub/archiving/zip/doc/

[4] Gailly, J.-L., and Adler, M., GZIP documentation and sources,
available as gzip-*.tar in ftp://prep.ai.mit.edu/pub/gnu/

[5] Schwartz, E. S., and Kallick, B., "Generating a canonical prefix
encoding", Comm. ACM, Vol. 7, No. 3, March 1964, pp. 166-169.

[6] Hirschberg, D. S., and Lelewer, D. A., "Efficient decoding of
prefix codes", Comm. ACM, Vol. 33, No. 4, April 1990, pp. 449-459.

6. Security Considerations

Any data compression method involves the reduction of redundancy in
the data. Consequently, any corruption of the data is likely to have
severe effects and be difficult to correct. Uncompressed text, on
the other hand, will probably still be readable despite the presence
of some corrupted bytes.

It is recommended that systems using this data format provide some
means of validating the integrity of the compressed data. See
reference [3], for example.

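One common means is a checksum over the uncompressed data: the zlib format that reference [3] points to (specified in RFC 1950) appends an Adler-32 checksum to the deflate stream, while gzip uses CRC-32 instead. Below is a straightforward, unoptimized C sketch of Adler-32; `adler32_sketch` is a hypothetical name, and in practice you would call zlib's own `adler32()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u   /* largest prime below 2^16 */

/* Unoptimized Adler-32: a is 1 plus the sum of all bytes, b is the sum of
   the successive values of a, both modulo 65521; the result is b:a. */
static uint32_t adler32_sketch(const unsigned char *buf, size_t len)
{
    uint32_t a = 1, b = 0;
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
    }
    return (b << 16) | a;
}
```

For the ASCII string "Wikipedia" the sketch returns 0x11E60398, and the checksum of empty input is 1. A receiver recomputes the value over the decompressed bytes and compares it with the stored one to detect corruption.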
7. Source code

Source code for a C language implementation of a "deflate" compliant
compressor and decompressor is available within the zlib package at
ftp://ftp.uu.net/pub/archiving/zip/zlib/.

8. Acknowledgements

Trademarks cited in this document are the property of their
respective owners.

Phil Katz designed the deflate format. Jean-Loup Gailly and Mark
Adler wrote the related software described in this specification.
Glenn Randers-Pehrson converted this document to RFC and HTML format.

9. Author's Address

L. Peter Deutsch
Aladdin Enterprises
203 Santa Margarita Ave.
Menlo Park, CA 94025

Phone: (415) 322-0103 (AM only)
FAX: (415) 322-1734
EMail: <ghost@aladdin.com>

Questions about the technical content of this specification can be
sent by email to:

Jean-Loup Gailly <gzip@prep.ai.mit.edu> and
Mark Adler <madler@alumni.caltech.edu>

Editorial comments on this specification can be sent by email to:

L. Peter Deutsch <ghost@aladdin.com> and
Glenn Randers-Pehrson <randeg@alumni.rpi.edu>