NetBSD/common/dist/zlib/doc/rfc1951.txt
christos c34236556b Changes in 1.2.10 (2 Jan 2017)
- Avoid warnings on snprintf() return value
- Fix bug in deflate_stored() for zero-length input
- Fix bug in gzwrite.c that produced corrupt gzip files
- Remove files to be installed before copying them in Makefile.in
- Add warnings when compiling with assembler code

Changes in 1.2.9 (31 Dec 2016)
- Fix contrib/minizip to permit unzipping with desktop API [Zouzou]
- Improve contrib/blast to return unused bytes
- Assure that gzoffset() is correct when appending
- Improve compress() and uncompress() to support large lengths
- Fix bug in test/example.c where error code not saved
- Remedy Coverity warning [Randers-Pehrson]
- Improve speed of gzprintf() in transparent mode
- Fix inflateInit2() bug when windowBits is 16 or 32
- Change DEBUG macro to ZLIB_DEBUG
- Avoid uninitialized access by gzclose_w()
- Allow building zlib outside of the source directory
- Fix bug that accepted invalid zlib header when windowBits is zero
- Fix gzseek() problem on MinGW due to buggy _lseeki64 there
- Loop on write() calls in gzwrite.c in case of non-blocking I/O
- Add --warn (-w) option to ./configure for more compiler warnings
- Reject a window size of 256 bytes if not using the zlib wrapper
- Fix bug when level 0 used with Z_HUFFMAN or Z_RLE
- Add --debug (-d) option to ./configure to define ZLIB_DEBUG
- Fix bugs in creating a very large gzip header
- Add uncompress2() function, which returns the input size used
- Assure that deflateParams() will not switch functions mid-block
- Dramatically speed up deflation for level 0 (storing)
- Add gzfread(), duplicating the interface of fread()
- Add gzfwrite(), duplicating the interface of fwrite()
- Add deflateGetDictionary() function
- Use snprintf() for later versions of Microsoft C
- Fix *Init macros to use z_ prefix when requested
- Replace as400 with os400 for OS/400 support [Monnerat]
- Add crc32_z() and adler32_z() functions with size_t lengths
- Update Visual Studio project files [AraHaan]

Changes in 1.2.8 (28 Apr 2013)
- Update contrib/minizip/iowin32.c for Windows RT [Vollant]
- Do not force Z_CONST for C++
- Clean up contrib/vstudio [Roß]
- Correct spelling error in zlib.h
- Fix mixed line endings in contrib/vstudio

Changes in 1.2.7.3 (13 Apr 2013)
- Fix version numbers and DLL names in contrib/vstudio/*/zlib.rc

Changes in 1.2.7.2 (13 Apr 2013)
- Change check for a four-byte type back to hexadecimal
- Fix typo in win32/Makefile.msc
- Add casts in gzwrite.c for pointer differences

Changes in 1.2.7.1 (24 Mar 2013)
- Replace use of unsafe string functions with snprintf if available
- Avoid including stddef.h on Windows for Z_SOLO compile [Niessink]
- Fix gzgetc undefine when Z_PREFIX set [Turk]
- Eliminate use of mktemp in Makefile (not always available)
- Fix bug in 'F' mode for gzopen()
- Add inflateGetDictionary() function
- Correct comment in deflate.h
- Use _snprintf for snprintf in Microsoft C
- On Darwin, only use /usr/bin/libtool if libtool is not Apple
- Delete "--version" file if created by "ar --version" [Richard G.]
- Fix configure check for veracity of compiler error return codes
- Fix CMake compilation of static lib for MSVC2010 x64
- Remove unused variable in infback9.c
- Fix argument checks in gzlog_compress() and gzlog_write()
- Clean up the usage of z_const and respect const usage within zlib
- Clean up examples/gzlog.[ch] comparisons of different types
- Avoid shift equal to bits in type (caused endless loop)
- Fix uninitialized value bug in gzputc() introduced by const patches
- Fix memory allocation error in examples/zran.c [Nor]
- Fix bug where gzopen(), gzclose() would write an empty file
- Fix bug in gzclose() when gzwrite() runs out of memory
- Check for input buffer malloc failure in examples/gzappend.c
- Add note to contrib/blast to use binary mode in stdio
- Fix comparisons of differently signed integers in contrib/blast
- Check for invalid code length codes in contrib/puff
- Fix serious but very rare decompression bug in inftrees.c
- Update inflateBack() comments, since inflate() can be faster
- Use underscored I/O function names for WINAPI_FAMILY
- Add _tr_flush_bits to the external symbols prefixed by --zprefix
- Add contrib/vstudio/vc10 pre-build step for static only
- Quote --version-script argument in CMakeLists.txt
- Don't specify --version-script on Apple platforms in CMakeLists.txt
- Fix casting error in contrib/testzlib/testzlib.c
- Fix types in contrib/minizip to match result of get_crc_table()
- Simplify contrib/vstudio/vc10 with 'd' suffix
- Add TOP support to win32/Makefile.msc
- Suport i686 and amd64 assembler builds in CMakeLists.txt
- Fix typos in the use of _LARGEFILE64_SOURCE in zconf.h
- Add vc11 and vc12 build files to contrib/vstudio
- Add gzvprintf() as an undocumented function in zlib
- Fix configure for Sun shell
- Remove runtime check in configure for four-byte integer type
- Add casts and consts to ease user conversion to C++
- Add man pages for minizip and miniunzip
- In Makefile uninstall, don't rm if preceding cd fails
- Do not return Z_BUF_ERROR if deflateParam() has nothing to write

Changes in 1.2.7 (2 May 2012)
- Replace use of memmove() with a simple copy for portability
- Test for existence of strerror
- Restore gzgetc_ for backward compatibility with 1.2.6
- Fix build with non-GNU make on Solaris
- Require gcc 4.0 or later on Mac OS X to use the hidden attribute
- Include unistd.h for Watcom C
- Use __WATCOMC__ instead of __WATCOM__
- Do not use the visibility attribute if NO_VIZ defined
- Improve the detection of no hidden visibility attribute
- Avoid using __int64 for gcc or solo compilation
- Cast to char * in gzprintf to avoid warnings [Zinser]
- Fix make_vms.com for VAX [Zinser]
- Don't use library or built-in byte swaps
- Simplify test and use of gcc hidden attribute
- Fix bug in gzclose_w() when gzwrite() fails to allocate memory
- Add "x" (O_EXCL) and "e" (O_CLOEXEC) modes support to gzopen()
- Fix bug in test/minigzip.c for configure --solo
- Fix contrib/vstudio project link errors [Mohanathas]
- Add ability to choose the builder in make_vms.com [Schweda]
- Add DESTDIR support to mingw32 win32/Makefile.gcc
- Fix comments in win32/Makefile.gcc for proper usage
- Allow overriding the default install locations for cmake
- Generate and install the pkg-config file with cmake
- Build both a static and a shared version of zlib with cmake
- Include version symbols for cmake builds
- If using cmake with MSVC, add the source directory to the includes
- Remove unneeded EXTRA_CFLAGS from win32/Makefile.gcc [Truta]
- Move obsolete emx makefile to old [Truta]
- Allow the use of -Wundef when compiling or using zlib
- Avoid the use of the -u option with mktemp
- Improve inflate() documentation on the use of Z_FINISH
- Recognize clang as gcc
- Add gzopen_w() in Windows for wide character path names
- Rename zconf.h in CMakeLists.txt to move it out of the way
- Add source directory in CMakeLists.txt for building examples
- Look in build directory for zlib.pc in CMakeLists.txt
- Remove gzflags from zlibvc.def in vc9 and vc10
- Fix contrib/minizip compilation in the MinGW environment
- Update ./configure for Solaris, support --64 [Mooney]
- Remove -R. from Solaris shared build (possible security issue)
- Avoid race condition for parallel make (-j) running example
- Fix type mismatch between get_crc_table() and crc_table
- Fix parsing of version with "-" in CMakeLists.txt [Snider, Ziegler]
- Fix the path to zlib.map in CMakeLists.txt
- Force the native libtool in Mac OS X to avoid GNU libtool [Beebe]
- Add instructions to win32/Makefile.gcc for shared install [Torri]

Changes in 1.2.6.1 (12 Feb 2012)
- Avoid the use of the Objective-C reserved name "id"
- Include io.h in gzguts.h for Microsoft compilers
- Fix problem with ./configure --prefix and gzgetc macro
- Include gz_header definition when compiling zlib solo
- Put gzflags() functionality back in zutil.c
- Avoid library header include in crc32.c for Z_SOLO
- Use name in GCC_CLASSIC as C compiler for coverage testing, if set
- Minor cleanup in contrib/minizip/zip.c [Vollant]
- Update make_vms.com [Zinser]
- Remove unnecessary gzgetc_ function
- Use optimized byte swap operations for Microsoft and GNU [Snyder]
- Fix minor typo in zlib.h comments [Rzesniowiecki]

Changes in 1.2.6 (29 Jan 2012)
- Update the Pascal interface in contrib/pascal
- Fix function numbers for gzgetc_ in zlibvc.def files
- Fix configure.ac for contrib/minizip [Schiffer]
- Fix large-entry detection in minizip on 64-bit systems [Schiffer]
- Have ./configure use the compiler return code for error indication
- Fix CMakeLists.txt for cross compilation [McClure]
- Fix contrib/minizip/zip.c for 64-bit architectures [Dalsnes]
- Fix compilation of contrib/minizip on FreeBSD [Marquez]
- Correct suggested usages in win32/Makefile.msc [Shachar, Horvath]
- Include io.h for Turbo C / Borland C on all platforms [Truta]
- Make version explicit in contrib/minizip/configure.ac [Bosmans]
- Avoid warning for no encryption in contrib/minizip/zip.c [Vollant]
- Minor cleanup up contrib/minizip/unzip.c [Vollant]
- Fix bug when compiling minizip with C++ [Vollant]
- Protect for long name and extra fields in contrib/minizip [Vollant]
- Avoid some warnings in contrib/minizip [Vollant]
- Add -I../.. -L../.. to CFLAGS for minizip and miniunzip
- Add missing libs to minizip linker command
- Add support for VPATH builds in contrib/minizip
- Add an --enable-demos option to contrib/minizip/configure
- Add the generation of configure.log by ./configure
- Exit when required parameters not provided to win32/Makefile.gcc
- Have gzputc return the character written instead of the argument
- Use the -m option on ldconfig for BSD systems [Tobias]
- Correct in zlib.map when deflateResetKeep was added

Changes in 1.2.5.3 (15 Jan 2012)
- Restore gzgetc function for binary compatibility
- Do not use _lseeki64 under Borland C++ [Truta]
- Update win32/Makefile.msc to build test/*.c [Truta]
- Remove old/visualc6 given CMakefile and other alternatives
- Update AS400 build files and documentation [Monnerat]
- Update win32/Makefile.gcc to build test/*.c [Truta]
- Permit stronger flushes after Z_BLOCK flushes
- Avoid extraneous empty blocks when doing empty flushes
- Permit Z_NULL arguments to deflatePending
- Allow deflatePrime() to insert bits in the middle of a stream
- Remove second empty static block for Z_PARTIAL_FLUSH
- Write out all of the available bits when using Z_BLOCK
- Insert the first two strings in the hash table after a flush

Changes in 1.2.5.2 (17 Dec 2011)
- fix ld error: unable to find version dependency 'ZLIB_1.2.5'
- use relative symlinks for shared libs
- Avoid searching past window for Z_RLE strategy
- Assure that high-water mark initialization is always applied in deflate
- Add assertions to fill_window() in deflate.c to match comments
- Update python link in README
- Correct spelling error in gzread.c
- Fix bug in gzgets() for a concatenated empty gzip stream
- Correct error in comment for gz_make()
- Change gzread() and related to ignore junk after gzip streams
- Allow gzread() and related to continue after gzclearerr()
- Allow gzrewind() and gzseek() after a premature end-of-file
- Simplify gzseek() now that raw after gzip is ignored
- Change gzgetc() to a macro for speed (~40% speedup in testing)
- Fix gzclose() to return the actual error last encountered
- Always add large file support for windows
- Include zconf.h for windows large file support
- Include zconf.h.cmakein for windows large file support
- Update zconf.h.cmakein on make distclean
- Merge vestigial vsnprintf determination from zutil.h to gzguts.h
- Clarify how gzopen() appends in zlib.h comments
- Correct documentation of gzdirect() since junk at end now ignored
- Add a transparent write mode to gzopen() when 'T' is in the mode
- Update python link in zlib man page
- Get inffixed.h and MAKEFIXED result to match
- Add a ./config --solo option to make zlib subset with no library use
- Add undocumented inflateResetKeep() function for CAB file decoding
- Add --cover option to ./configure for gcc coverage testing
- Add #define ZLIB_CONST option to use const in the z_stream interface
- Add comment to gzdopen() in zlib.h to use dup() when using fileno()
- Note behavior of uncompress() to provide as much data as it can
- Add files in contrib/minizip to aid in building libminizip
- Split off AR options in Makefile.in and configure
- Change ON macro to Z_ARG to avoid application conflicts
- Facilitate compilation with Borland C++ for pragmas and vsnprintf
- Include io.h for Turbo C / Borland C++
- Move example.c and minigzip.c to test/
- Simplify incomplete code table filling in inflate_table()
- Remove code from inflate.c and infback.c that is impossible to execute
- Test the inflate code with full coverage
- Allow deflateSetDictionary, inflateSetDictionary at any time (in raw)
- Add deflateResetKeep and fix inflateResetKeep to retain dictionary
- Fix gzwrite.c to accommodate reduced memory zlib compilation
- Have inflate() with Z_FINISH avoid the allocation of a window
- Do not set strm->adler when doing raw inflate
- Fix gzeof() to behave just like feof() when read is not past end of file
- Fix bug in gzread.c when end-of-file is reached
- Avoid use of Z_BUF_ERROR in gz* functions except for premature EOF
- Document gzread() capability to read concurrently written files
- Remove hard-coding of resource compiler in CMakeLists.txt [Blammo]

Changes in 1.2.5.1 (10 Sep 2011)
- Update FAQ entry on shared builds (#13)
- Avoid symbolic argument to chmod in Makefile.in
- Fix bug and add consts in contrib/puff [Oberhumer]
- Update contrib/puff/zeros.raw test file to have all block types
- Add full coverage test for puff in contrib/puff/Makefile
- Fix static-only-build install in Makefile.in
- Fix bug in unzGetCurrentFileInfo() in contrib/minizip [Kuno]
- Add libz.a dependency to shared in Makefile.in for parallel builds
- Spell out "number" (instead of "nb") in zlib.h for total_in, total_out
- Replace $(...) with `...` in configure for non-bash sh [Bowler]
- Add darwin* to Darwin* and solaris* to SunOS\ 5* in configure [Groffen]
- Add solaris* to Linux* in configure to allow gcc use [Groffen]
- Add *bsd* to Linux* case in configure [Bar-Lev]
- Add inffast.obj to dependencies in win32/Makefile.msc
- Correct spelling error in deflate.h [Kohler]
- Change libzdll.a again to libz.dll.a (!) in win32/Makefile.gcc
- Add test to configure for GNU C looking for gcc in output of $cc -v
- Add zlib.pc generation to win32/Makefile.gcc [Weigelt]
- Fix bug in zlib.h for _FILE_OFFSET_BITS set and _LARGEFILE64_SOURCE not
- Add comment in zlib.h that adler32_combine with len2 < 0 makes no sense
- Make NO_DIVIDE option in adler32.c much faster (thanks to John Reiser)
- Make stronger test in zconf.h to include unistd.h for LFS
- Apply Darwin patches for 64-bit file offsets to contrib/minizip [Slack]
- Fix zlib.h LFS support when Z_PREFIX used
- Add updated as400 support (removed from old) [Monnerat]
- Avoid deflate sensitivity to volatile input data
- Avoid division in adler32_combine for NO_DIVIDE
- Clarify the use of Z_FINISH with deflateBound() amount of space
- Set binary for output file in puff.c
- Use u4 type for crc_table to avoid conversion warnings
- Apply casts in zlib.h to avoid conversion warnings
- Add OF to prototypes for adler32_combine_ and crc32_combine_ [Miller]
- Improve inflateSync() documentation to note indeterminancy
- Add deflatePending() function to return the amount of pending output
- Correct the spelling of "specification" in FAQ [Randers-Pehrson]
- Add a check in configure for stdarg.h, use for gzprintf()
- Check that pointers fit in ints when gzprint() compiled old style
- Add dummy name before $(SHAREDLIBV) in Makefile [Bar-Lev, Bowler]
- Delete line in configure that adds -L. libz.a to LDFLAGS [Weigelt]
- Add debug records in assmebler code [Londer]
- Update RFC references to use http://tools.ietf.org/html/... [Li]
- Add --archs option, use of libtool to configure for Mac OS X [Borstel]

Changes in 1.2.5 (19 Apr 2010)
- Disable visibility attribute in win32/Makefile.gcc [Bar-Lev]
- Default to libdir as sharedlibdir in configure [Nieder]
- Update copyright dates on modified source files
- Update trees.c to be able to generate modified trees.h
- Exit configure for MinGW, suggesting win32/Makefile.gcc
- Check for NULL path in gz_open [Homurlu]

Changes in 1.2.4.5 (18 Apr 2010)
- Set sharedlibdir in configure [Torok]
- Set LDFLAGS in Makefile.in [Bar-Lev]
- Avoid mkdir objs race condition in Makefile.in [Bowler]
- Add ZLIB_INTERNAL in front of internal inter-module functions and arrays
- Define ZLIB_INTERNAL to hide internal functions and arrays for GNU C
- Don't use hidden attribute when it is a warning generator (e.g. Solaris)

Changes in 1.2.4.4 (18 Apr 2010)
- Fix CROSS_PREFIX executable testing, CHOST extract, mingw* [Torok]
- Undefine _LARGEFILE64_SOURCE in zconf.h if it is zero, but not if empty
- Try to use bash or ksh regardless of functionality of /bin/sh
- Fix configure incompatibility with NetBSD sh
- Remove attempt to run under bash or ksh since have better NetBSD fix
- Fix win32/Makefile.gcc for MinGW [Bar-Lev]
- Add diagnostic messages when using CROSS_PREFIX in configure
- Added --sharedlibdir option to configure [Weigelt]
- Use hidden visibility attribute when available [Frysinger]

Changes in 1.2.4.3 (10 Apr 2010)
- Only use CROSS_PREFIX in configure for ar and ranlib if they exist
- Use CROSS_PREFIX for nm [Bar-Lev]
- Assume _LARGEFILE64_SOURCE defined is equivalent to true
- Avoid use of undefined symbols in #if with && and ||
- Make *64 prototypes in gzguts.h consistent with functions
- Add -shared load option for MinGW in configure [Bowler]
- Move z_off64_t to public interface, use instead of off64_t
- Remove ! from shell test in configure (not portable to Solaris)
- Change +0 macro tests to -0 for possibly increased portability

Changes in 1.2.4.2 (9 Apr 2010)
- Add consistent carriage returns to readme.txt's in masmx86 and masmx64
- Really provide prototypes for *64 functions when building without LFS
- Only define unlink() in minigzip.c if unistd.h not included
- Update README to point to contrib/vstudio project files
- Move projects/vc6 to old/ and remove projects/
- Include stdlib.h in minigzip.c for setmode() definition under WinCE
- Clean up assembler builds in win32/Makefile.msc [Rowe]
- Include sys/types.h for Microsoft for off_t definition
- Fix memory leak on error in gz_open()
- Symbolize nm as $NM in configure [Weigelt]
- Use TEST_LDSHARED instead of LDSHARED to link test programs [Weigelt]
- Add +0 to _FILE_OFFSET_BITS and _LFS64_LARGEFILE in case not defined
- Fix bug in gzeof() to take into account unused input data
- Avoid initialization of structures with variables in puff.c
- Updated win32/README-WIN32.txt [Rowe]

Changes in 1.2.4.1 (28 Mar 2010)
- Remove the use of [a-z] constructs for sed in configure [gentoo 310225]
- Remove $(SHAREDLIB) from LIBS in Makefile.in [Creech]
- Restore "for debugging" comment on sprintf() in gzlib.c
- Remove fdopen for MVS from gzguts.h
- Put new README-WIN32.txt in win32 [Rowe]
- Add check for shell to configure and invoke another shell if needed
- Fix big fat stinking bug in gzseek() on uncompressed files
- Remove vestigial F_OPEN64 define in zutil.h
- Set and check the value of _LARGEFILE_SOURCE and _LARGEFILE64_SOURCE
- Avoid errors on non-LFS systems when applications define LFS macros
- Set EXE to ".exe" in configure for MINGW [Kahle]
- Match crc32() in crc32.c exactly to the prototype in zlib.h [Sherrill]
- Add prefix for cross-compilation in win32/makefile.gcc [Bar-Lev]
- Add DLL install in win32/makefile.gcc [Bar-Lev]
- Allow Linux* or linux* from uname in configure [Bar-Lev]
- Allow ldconfig to be redefined in configure and Makefile.in [Bar-Lev]
- Add cross-compilation prefixes to configure [Bar-Lev]
- Match type exactly in gz_load() invocation in gzread.c
- Match type exactly of zcalloc() in zutil.c to zlib.h alloc_func
- Provide prototypes for *64 functions when building zlib without LFS
- Don't use -lc when linking shared library on MinGW
- Remove errno.h check in configure and vestigial errno code in zutil.h

Changes in 1.2.4 (14 Mar 2010)
- Fix VER3 extraction in configure for no fourth subversion
- Update zlib.3, add docs to Makefile.in to make .pdf out of it
- Add zlib.3.pdf to distribution
- Don't set error code in gzerror() if passed pointer is NULL
- Apply destination directory fixes to CMakeLists.txt [Lowman]
- Move #cmakedefine's to a new zconf.in.cmakein
- Restore zconf.h for builds that don't use configure or cmake
- Add distclean to dummy Makefile for convenience
- Update and improve INDEX, README, and FAQ
- Update CMakeLists.txt for the return of zconf.h [Lowman]
- Update contrib/vstudio/vc9 and vc10 [Vollant]
- Change libz.dll.a back to libzdll.a in win32/Makefile.gcc
- Apply license and readme changes to contrib/asm686 [Raiter]
- Check file name lengths and add -c option in minigzip.c [Li]
- Update contrib/amd64 and contrib/masmx86/ [Vollant]
- Avoid use of "eof" parameter in trees.c to not shadow library variable
- Update make_vms.com for removal of zlibdefs.h [Zinser]
- Update assembler code and vstudio projects in contrib [Vollant]
- Remove outdated assembler code contrib/masm686 and contrib/asm586
- Remove old vc7 and vc8 from contrib/vstudio
- Update win32/Makefile.msc, add ZLIB_VER_SUBREVISION [Rowe]
- Fix memory leaks in gzclose_r() and gzclose_w(), file leak in gz_open()
- Add contrib/gcc_gvmat64 for longest_match and inflate_fast [Vollant]
- Remove *64 functions from win32/zlib.def (they're not 64-bit yet)
- Fix bug in void-returning vsprintf() case in gzwrite.c
- Fix name change from inflate.h in contrib/inflate86/inffas86.c
- Check if temporary file exists before removing in make_vms.com [Zinser]
- Fix make install and uninstall for --static option
- Fix usage of _MSC_VER in gzguts.h and zutil.h [Truta]
- Update readme.txt in contrib/masmx64 and masmx86 to assemble

Changes in 1.2.3.9 (21 Feb 2010)
- Expunge gzio.c
- Move as400 build information to old
- Fix updates in contrib/minizip and contrib/vstudio
- Add const to vsnprintf test in configure to avoid warnings [Weigelt]
- Delete zconf.h (made by configure) [Weigelt]
- Change zconf.in.h to zconf.h.in per convention [Weigelt]
- Check for NULL buf in gzgets()
- Return empty string for gzgets() with len == 1 (like fgets())
- Fix description of gzgets() in zlib.h for end-of-file, NULL return
- Update minizip to 1.1 [Vollant]
- Avoid MSVC loss of data warnings in gzread.c, gzwrite.c
- Note in zlib.h that gzerror() should be used to distinguish from EOF
- Remove use of snprintf() from gzlib.c
- Fix bug in gzseek()
- Update contrib/vstudio, adding vc9 and vc10 [Kuno, Vollant]
- Fix zconf.h generation in CMakeLists.txt [Lowman]
- Improve comments in zconf.h where modified by configure

Changes in 1.2.3.8 (13 Feb 2010)
- Clean up text files (tabs, trailing whitespace, etc.) [Oberhumer]
- Use z_off64_t in gz_zero() and gz_skip() to match state->skip
- Avoid comparison problem when sizeof(int) == sizeof(z_off64_t)
- Revert to Makefile.in from 1.2.3.6 (live with the clutter)
- Fix missing error return in gzflush(), add zlib.h note
- Add *64 functions to zlib.map [Levin]
- Fix signed/unsigned comparison in gz_comp()
- Use SFLAGS when testing shared linking in configure
- Add --64 option to ./configure to use -m64 with gcc
- Fix ./configure --help to correctly name options
- Have make fail if a test fails [Levin]
- Avoid buffer overrun in contrib/masmx64/gvmat64.asm [Simpson]
- Remove assembler object files from contrib

Changes in 1.2.3.7 (24 Jan 2010)
- Always gzopen() with O_LARGEFILE if available
- Fix gzdirect() to work immediately after gzopen() or gzdopen()
- Make gzdirect() more precise when the state changes while reading
- Improve zlib.h documentation in many places
- Catch memory allocation failure in gz_open()
- Complete close operation if seek forward in gzclose_w() fails
- Return Z_ERRNO from gzclose_r() if close() fails
- Return Z_STREAM_ERROR instead of EOF for gzclose() being passed NULL
- Return zero for gzwrite() errors to match zlib.h description
- Return -1 on gzputs() error to match zlib.h description
- Add zconf.in.h to allow recovery from configure modification [Weigelt]
- Fix static library permissions in Makefile.in [Weigelt]
- Avoid warnings in configure tests that hide functionality [Weigelt]
- Add *BSD and DragonFly to Linux case in configure [gentoo 123571]
- Change libzdll.a to libz.dll.a in win32/Makefile.gcc [gentoo 288212]
- Avoid access of uninitialized data for first inflateReset2 call [Gomes]
- Keep object files in subdirectories to reduce the clutter somewhat
- Remove default Makefile and zlibdefs.h, add dummy Makefile
- Add new external functions to Z_PREFIX, remove duplicates, z_z_ -> z_
- Remove zlibdefs.h completely -- modify zconf.h instead

Changes in 1.2.3.6 (17 Jan 2010)
- Avoid void * arithmetic in gzread.c and gzwrite.c
- Make compilers happier with const char * for gz_error message
- Avoid unused parameter warning in inflate.c
- Avoid signed-unsigned comparison warning in inflate.c
- Indent #pragma's for traditional C
- Fix usage of strwinerror() in glib.c, change to gz_strwinerror()
- Correct email address in configure for system options
- Update make_vms.com and add make_vms.com to contrib/minizip [Zinser]
- Update zlib.map [Brown]
- Fix Makefile.in for Solaris 10 make of example64 and minizip64 [Torok]
- Apply various fixes to CMakeLists.txt [Lowman]
- Add checks on len in gzread() and gzwrite()
- Add error message for no more room for gzungetc()
- Remove zlib version check in gzwrite()
- Defer compression of gzprintf() result until need to
- Use snprintf() in gzdopen() if available
- Remove USE_MMAP configuration determination (only used by minigzip)
- Remove examples/pigz.c (available separately)
- Update examples/gun.c to 1.6

Changes in 1.2.3.5 (8 Jan 2010)
- Add space after #if in zutil.h for some compilers
- Fix relatively harmless bug in deflate_fast() [Exarevsky]
- Fix same problem in deflate_slow()
- Add $(SHAREDLIBV) to LIBS in Makefile.in [Brown]
- Add deflate_rle() for faster Z_RLE strategy run-length encoding
- Add deflate_huff() for faster Z_HUFFMAN_ONLY encoding
- Change name of "write" variable in inffast.c to avoid library collisions
- Fix premature EOF from gzread() in gzio.c [Brown]
- Use zlib header window size if windowBits is 0 in inflateInit2()
- Remove compressBound() call in deflate.c to avoid linking compress.o
- Replace use of errno in gz* with functions, support WinCE [Alves]
- Provide alternative to perror() in minigzip.c for WinCE [Alves]
- Don't use _vsnprintf on later versions of MSVC [Lowman]
- Add CMake build script and input file [Lowman]
- Update contrib/minizip to 1.1 [Svensson, Vollant]
- Moved nintendods directory from contrib to .
- Replace gzio.c with a new set of routines with the same functionality
- Add gzbuffer(), gzoffset(), gzclose_r(), gzclose_w() as part of above
- Update contrib/minizip to 1.1b
- Change gzeof() to return 0 on error instead of -1 to agree with zlib.h

Changes in 1.2.3.4 (21 Dec 2009)
- Use old school .SUFFIXES in Makefile.in for FreeBSD compatibility
- Update comments in configure and Makefile.in for default --shared
- Fix test -z's in configure [Marquess]
- Build examplesh and minigzipsh when not testing
- Change NULL's to Z_NULL's in deflate.c and in comments in zlib.h
- Import LDFLAGS from the environment in configure
- Fix configure to populate SFLAGS with discovered CFLAGS options
- Adapt make_vms.com to the new Makefile.in [Zinser]
- Add zlib2ansi script for C++ compilation [Marquess]
- Add _FILE_OFFSET_BITS=64 test to make test (when applicable)
- Add AMD64 assembler code for longest match to contrib [Teterin]
- Include options from $SFLAGS when doing $LDSHARED
- Simplify 64-bit file support by introducing z_off64_t type
- Make shared object files in objs directory to work around old Sun cc
- Use only three-part version number for Darwin shared compiles
- Add rc option to ar in Makefile.in for when ./configure not run
- Add -WI,-rpath,. to LDFLAGS for OSF 1 V4*
- Set LD_LIBRARYN32_PATH for SGI IRIX shared compile
- Protect against _FILE_OFFSET_BITS being defined when compiling zlib
- Rename Makefile.in targets allstatic to static and allshared to shared
- Fix static and shared Makefile.in targets to be independent
- Correct error return bug in gz_open() by setting state [Brown]
- Put spaces before ;;'s in configure for better sh compatibility
- Add pigz.c (parallel implementation of gzip) to examples/
- Correct constant in crc32.c to UL [Leventhal]
- Reject negative lengths in crc32_combine()
- Add inflateReset2() function to work like inflateEnd()/inflateInit2()
- Include sys/types.h for _LARGEFILE64_SOURCE [Brown]
- Correct typo in doc/algorithm.txt [Janik]
- Fix bug in adler32_combine() [Zhu]
- Catch missing-end-of-block-code error in all inflates and in puff
    Assures that random input to inflate eventually results in an error
- Added enough.c (calculation of ENOUGH for inftrees.h) to examples/
- Update ENOUGH and its usage to reflect discovered bounds
- Fix gzerror() error report on empty input file [Brown]
- Add ush casts in trees.c to avoid pedantic runtime errors
- Fix typo in zlib.h uncompress() description [Reiss]
- Correct inflate() comments with regard to automatic header detection
- Remove deprecation comment on Z_PARTIAL_FLUSH (it stays)
- Put new version of gzlog (2.0) in examples with interruption recovery
- Add puff compile option to permit invalid distance-too-far streams
- Add puff TEST command options, ability to read piped input
- Prototype the *64 functions in zlib.h when _FILE_OFFSET_BITS == 64, but
  _LARGEFILE64_SOURCE not defined
- Fix Z_FULL_FLUSH to truly erase the past by resetting s->strstart
- Fix deflateSetDictionary() to use all 32K for output consistency
- Remove extraneous #define MIN_LOOKAHEAD in deflate.c (in deflate.h)
- Clear bytes after deflate lookahead to avoid use of uninitialized data
- Change a limit in inftrees.c to be more transparent to Coverity Prevent
- Update win32/zlib.def with exported symbols from zlib.h
- Correct spelling errors in zlib.h [Willem, Sobrado]
- Allow Z_BLOCK for deflate() to force a new block
- Allow negative bits in inflatePrime() to delete existing bit buffer
- Add Z_TREES flush option to inflate() to return at end of trees
- Add inflateMark() to return current state information for random access
- Add Makefile for NintendoDS to contrib [Costa]
- Add -w in configure compile tests to avoid spurious warnings [Beucler]
- Fix typos in zlib.h comments for deflateSetDictionary()
- Fix EOF detection in transparent gzread() [Maier]

Changes in 1.2.3.3 (2 October 2006)
- Make --shared the default for configure, add a --static option
- Add compile option to permit invalid distance-too-far streams
- Add inflateUndermine() function which is required to enable above
- Remove use of "this" variable name for C++ compatibility [Marquess]
- Add testing of shared library in make test, if shared library built
- Use ftello() and fseeko() if available instead of ftell() and fseek()
- Provide two versions of all functions that use the z_off_t type for
  binary compatibility -- a normal version and a 64-bit offset version,
  per the Large File Support Extension when _LARGEFILE64_SOURCE is
  defined; use the 64-bit versions by default when _FILE_OFFSET_BITS
  is defined to be 64
- Add a --uname= option to configure to perhaps help with cross-compiling

Changes in 1.2.3.2 (3 September 2006)
- Turn off silly Borland warnings [Hay]
- Use off64_t and define _LARGEFILE64_SOURCE when present
- Fix missing dependency on inffixed.h in Makefile.in
- Rig configure --shared to build both shared and static [Teredesai, Truta]
- Remove zconf.in.h and instead create a new zlibdefs.h file
- Fix contrib/minizip/unzip.c non-encrypted after encrypted [Vollant]
- Add treebuild.xml (see http://treebuild.metux.de/) [Weigelt]

Changes in 1.2.3.1 (16 August 2006)
- Add watcom directory with OpenWatcom make files [Daniel]
- Remove #undef of FAR in zconf.in.h for MVS [Fedtke]
- Update make_vms.com [Zinser]
- Use -fPIC for shared build in configure [Teredesai, Nicholson]
- Use only major version number for libz.so on IRIX and OSF1 [Reinholdtsen]
- Use fdopen() (not _fdopen()) for Interix in zutil.h [Bäck]
- Add some FAQ entries about the contrib directory
- Update the MVS question in the FAQ
- Avoid extraneous reads after EOF in gzio.c [Brown]
- Correct spelling of "successfully" in gzio.c [Randers-Pehrson]
- Add comments to zlib.h about gzerror() usage [Brown]
- Set extra flags in gzip header in gzopen() like deflate() does
- Make configure options more compatible with double-dash conventions
  [Weigelt]
- Clean up compilation under Solaris SunStudio cc [Rowe, Reinholdtsen]
- Fix uninstall target in Makefile.in [Truta]
- Add pkgconfig support [Weigelt]
- Use $(DESTDIR) macro in Makefile.in [Reinholdtsen, Weigelt]
- Replace set_data_type() with a more accurate detect_data_type() in
  trees.c, according to the txtvsbin.txt document [Truta]
- Swap the order of #include <stdio.h> and #include "zlib.h" in
  gzio.c, example.c and minigzip.c [Truta]
- Shut up annoying VS2005 warnings about standard C deprecation [Rowe,
  Truta] (where?)
- Fix target "clean" from win32/Makefile.bor [Truta]
- Create .pdb and .manifest files in win32/makefile.msc [Ziegler, Rowe]
- Update zlib www home address in win32/DLL_FAQ.txt [Truta]
- Update contrib/masmx86/inffas32.asm for VS2005 [Vollant, Van Wassenhove]
- Enable browse info in the "Debug" and "ASM Debug" configurations in
  the Visual C++ 6 project, and set (non-ASM) "Debug" as default [Truta]
- Add pkgconfig support [Weigelt]
- Add ZLIB_VER_MAJOR, ZLIB_VER_MINOR and ZLIB_VER_REVISION in zlib.h,
  for use in win32/zlib1.rc [Polushin, Rowe, Truta]
- Add a document that explains the new text detection scheme to
  doc/txtvsbin.txt [Truta]
- Add rfc1950.txt, rfc1951.txt and rfc1952.txt to doc/ [Truta]
- Move algorithm.txt into doc/ [Truta]
- Synchronize FAQ with website
- Fix compressBound(), was low for some pathological cases [Fearnley]
- Take into account wrapper variations in deflateBound()
- Set examples/zpipe.c input and output to binary mode for Windows
- Update examples/zlib_how.html with new zpipe.c (also web site)
- Fix some warnings in examples/gzlog.c and examples/zran.c (it seems
  that gcc became pickier in 4.0)
- Add zlib.map for Linux: "All symbols from zlib-1.1.4 remain
  un-versioned, the patch adds versioning only for symbols introduced in
  zlib-1.2.0 or later.  It also declares as local those symbols which are
  not designed to be exported." [Levin]
- Update Z_PREFIX list in zconf.in.h, add --zprefix option to configure
- Do not initialize global static by default in trees.c, add a response
  NO_INIT_GLOBAL_POINTERS to initialize them if needed [Marquess]
- Don't use strerror() in gzio.c under WinCE [Yakimov]
- Don't use errno.h in zutil.h under WinCE [Yakimov]
- Move arguments for AR to its usage to allow replacing ar [Marot]
- Add HAVE_VISIBILITY_PRAGMA in zconf.in.h for Mozilla [Randers-Pehrson]
- Improve inflateInit() and inflateInit2() documentation
- Fix structure size comment in inflate.h
- Change configure help option from --h* to --help [Santos]
2017-01-10 00:25:29 +00:00

956 lines
36 KiB
Plaintext
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

Network Working Group P. Deutsch
Request for Comments: 1951 Aladdin Enterprises
Category: Informational May 1996
DEFLATE Compressed Data Format Specification version 1.3
Status of This Memo
This memo provides information for the Internet community. This memo
does not specify an Internet standard of any kind. Distribution of
this memo is unlimited.
IESG Note:
The IESG takes no position on the validity of any Intellectual
Property Rights statements contained in this document.
Notices
Copyright (c) 1996 L. Peter Deutsch
Permission is granted to copy and distribute this document for any
purpose and without charge, including translations into other
languages and incorporation into compilations, provided that the
copyright notice and this notice are preserved, and that any
substantive changes or deletions from the original are clearly
marked.
A pointer to the latest version of this and related documentation in
HTML format can be found at the URL
<ftp://ftp.uu.net/graphics/png/documents/zlib/zdoc-index.html>.
Abstract
This specification defines a lossless compressed data format that
compresses data using a combination of the LZ77 algorithm and Huffman
coding, with efficiency comparable to the best currently available
general-purpose compression methods. The data can be produced or
consumed, even for an arbitrarily long sequentially presented input
data stream, using only an a priori bounded amount of intermediate
storage. The format can be implemented readily in a manner not
covered by patents.
Deutsch Informational [Page 1]
RFC 1951 DEFLATE Compressed Data Format Specification May 1996
Table of Contents
1. Introduction ................................................... 2
1.1. Purpose ................................................... 2
1.2. Intended audience ......................................... 3
1.3. Scope ..................................................... 3
1.4. Compliance ................................................ 3
1.5. Definitions of terms and conventions used ................ 3
1.6. Changes from previous versions ............................ 4
2. Compressed representation overview ............................. 4
3. Detailed specification ......................................... 5
3.1. Overall conventions ....................................... 5
3.1.1. Packing into bytes .................................. 5
3.2. Compressed block format ................................... 6
3.2.1. Synopsis of prefix and Huffman coding ............... 6
3.2.2. Use of Huffman coding in the "deflate" format ....... 7
3.2.3. Details of block format ............................. 9
3.2.4. Non-compressed blocks (BTYPE=00) ................... 11
3.2.5. Compressed blocks (length and distance codes) ...... 11
3.2.6. Compression with fixed Huffman codes (BTYPE=01) .... 12
3.2.7. Compression with dynamic Huffman codes (BTYPE=10) .. 13
3.3. Compliance ............................................... 14
4. Compression algorithm details ................................. 14
5. References .................................................... 16
6. Security Considerations ....................................... 16
7. Source code ................................................... 16
8. Acknowledgements .............................................. 16
9. Author's Address .............................................. 17
1. Introduction
1.1. Purpose
The purpose of this specification is to define a lossless
compressed data format that:
* Is independent of CPU type, operating system, file system,
and character set, and hence can be used for interchange;
* Can be produced or consumed, even for an arbitrarily long
sequentially presented input data stream, using only an a
priori bounded amount of intermediate storage, and hence
can be used in data communications or similar structures
such as Unix filters;
* Compresses data with efficiency comparable to the best
currently available general-purpose compression methods,
and in particular considerably better than the "compress"
program;
* Can be implemented readily in a manner not covered by
patents, and hence can be practiced freely;
Deutsch Informational [Page 2]
RFC 1951 DEFLATE Compressed Data Format Specification May 1996
* Is compatible with the file format produced by the current
widely used gzip utility, in that conforming decompressors
will be able to read data produced by the existing gzip
compressor.
The data format defined by this specification does not attempt to:
* Allow random access to compressed data;
* Compress specialized data (e.g., raster graphics) as well
as the best currently available specialized algorithms.
A simple counting argument shows that no lossless compression
algorithm can compress every possible input data set. For the
format defined here, the worst case expansion is 5 bytes per 32K-
byte block, i.e., a size increase of 0.015% for large data sets.
English text usually compresses by a factor of 2.5 to 3;
executable files usually compress somewhat less; graphical data
such as raster images may compress much more.
1.2. Intended audience
This specification is intended for use by implementors of software
to compress data into "deflate" format and/or decompress data from
"deflate" format.
The text of the specification assumes a basic background in
programming at the level of bits and other primitive data
representations. Familiarity with the technique of Huffman coding
is helpful but not required.
1.3. Scope
The specification specifies a method for representing a sequence
of bytes as a (usually shorter) sequence of bits, and a method for
packing the latter bit sequence into bytes.
1.4. Compliance
Unless otherwise indicated below, a compliant decompressor must be
able to accept and decompress any data set that conforms to all
the specifications presented here; a compliant compressor must
produce data sets that conform to all the specifications presented
here.
1.5. Definitions of terms and conventions used
Byte: 8 bits stored or transmitted as a unit (same as an octet).
For this specification, a byte is exactly 8 bits, even on machines
Deutsch Informational [Page 3]
RFC 1951 DEFLATE Compressed Data Format Specification May 1996
which store a character on a number of bits different from eight.
See below, for the numbering of bits within a byte.
String: a sequence of arbitrary bytes.
1.6. Changes from previous versions
There have been no technical changes to the deflate format since
version 1.1 of this specification. In version 1.2, some
terminology was changed. Version 1.3 is a conversion of the
specification to RFC style.
2. Compressed representation overview
A compressed data set consists of a series of blocks, corresponding
to successive blocks of input data. The block sizes are arbitrary,
except that non-compressible blocks are limited to 65,535 bytes.
Each block is compressed using a combination of the LZ77 algorithm
and Huffman coding. The Huffman trees for each block are independent
of those for previous or subsequent blocks; the LZ77 algorithm may
use a reference to a duplicated string occurring in a previous block,
up to 32K input bytes before.
Each block consists of two parts: a pair of Huffman code trees that
describe the representation of the compressed data part, and a
compressed data part. (The Huffman trees themselves are compressed
using Huffman encoding.) The compressed data consists of a series of
elements of two types: literal bytes (of strings that have not been
detected as duplicated within the previous 32K input bytes), and
pointers to duplicated strings, where a pointer is represented as a
pair <length, backward distance>. The representation used in the
"deflate" format limits distances to 32K bytes and lengths to 258
bytes, but does not limit the size of a block, except for
uncompressible blocks, which are limited as noted above.
Each type of value (literals, distances, and lengths) in the
compressed data is represented using a Huffman code, using one code
tree for literals and lengths and a separate code tree for distances.
The code trees for each block appear in a compact form just before
the compressed data for that block.
Deutsch Informational [Page 4]
RFC 1951 DEFLATE Compressed Data Format Specification May 1996
3. Detailed specification
3.1. Overall conventions In the diagrams below, a box like this:
+---+
| | <-- the vertical bars might be missing
+---+
represents one byte; a box like this:
+==============+
| |
+==============+
represents a variable number of bytes.
Bytes stored within a computer do not have a "bit order", since
they are always treated as a unit. However, a byte considered as
an integer between 0 and 255 does have a most- and least-
significant bit, and since we write numbers with the most-
significant digit on the left, we also write bytes with the most-
significant bit on the left. In the diagrams below, we number the
bits of a byte so that bit 0 is the least-significant bit, i.e.,
the bits are numbered:
+--------+
|76543210|
+--------+
Within a computer, a number may occupy multiple bytes. All
multi-byte numbers in the format described here are stored with
the least-significant byte first (at the lower memory address).
For example, the decimal number 520 is stored as:
0 1
+--------+--------+
|00001000|00000010|
+--------+--------+
^ ^
| |
| + more significant byte = 2 x 256
+ less significant byte = 8
3.1.1. Packing into bytes
This document does not address the issue of the order in which
bits of a byte are transmitted on a bit-sequential medium,
since the final data format described here is byte- rather than
Deutsch Informational [Page 5]
RFC 1951 DEFLATE Compressed Data Format Specification May 1996
bit-oriented. However, we describe the compressed block format
in below, as a sequence of data elements of various bit
lengths, not a sequence of bytes. We must therefore specify
how to pack these data elements into bytes to form the final
compressed byte sequence:
* Data elements are packed into bytes in order of
increasing bit number within the byte, i.e., starting
with the least-significant bit of the byte.
* Data elements other than Huffman codes are packed
starting with the least-significant bit of the data
element.
* Huffman codes are packed starting with the most-
significant bit of the code.
In other words, if one were to print out the compressed data as
a sequence of bytes, starting with the first byte at the
*right* margin and proceeding to the *left*, with the most-
significant bit of each byte on the left as usual, one would be
able to parse the result from right to left, with fixed-width
elements in the correct MSB-to-LSB order and Huffman codes in
bit-reversed order (i.e., with the first bit of the code in the
relative LSB position).
3.2. Compressed block format
3.2.1. Synopsis of prefix and Huffman coding
Prefix coding represents symbols from an a priori known
alphabet by bit sequences (codes), one code for each symbol, in
a manner such that different symbols may be represented by bit
sequences of different lengths, but a parser can always parse
an encoded string unambiguously symbol-by-symbol.
We define a prefix code in terms of a binary tree in which the
two edges descending from each non-leaf node are labeled 0 and
1 and in which the leaf nodes correspond one-for-one with (are
labeled with) the symbols of the alphabet; then the code for a
symbol is the sequence of 0's and 1's on the edges leading from
the root to the leaf labeled with that symbol. For example:
Deutsch Informational [Page 6]
RFC 1951 DEFLATE Compressed Data Format Specification May 1996
/\ Symbol Code
0 1 ------ ----
/ \ A 00
/\ B B 1
0 1 C 011
/ \ D 010
A /\
0 1
/ \
D C
A parser can decode the next symbol from an encoded input
stream by walking down the tree from the root, at each step
choosing the edge corresponding to the next input bit.
Given an alphabet with known symbol frequencies, the Huffman
algorithm allows the construction of an optimal prefix code
(one which represents strings with those symbol frequencies
using the fewest bits of any possible prefix codes for that
alphabet). Such a code is called a Huffman code. (See
reference [1] in Chapter 5, references for additional
information on Huffman codes.)
Note that in the "deflate" format, the Huffman codes for the
various alphabets must not exceed certain maximum code lengths.
This constraint complicates the algorithm for computing code
lengths from symbol frequencies. Again, see Chapter 5,
references for details.
3.2.2. Use of Huffman coding in the "deflate" format
The Huffman codes used for each alphabet in the "deflate"
format have two additional rules:
* All codes of a given bit length have lexicographically
consecutive values, in the same order as the symbols
they represent;
* Shorter codes lexicographically precede longer codes.
Deutsch Informational [Page 7]
RFC 1951 DEFLATE Compressed Data Format Specification May 1996
We could recode the example above to follow this rule as
follows, assuming that the order of the alphabet is ABCD:
Symbol Code
------ ----
A 10
B 0
C 110
D 111
I.e., 0 precedes 10 which precedes 11x, and 110 and 111 are
lexicographically consecutive.
Given this rule, we can define the Huffman code for an alphabet
just by giving the bit lengths of the codes for each symbol of
the alphabet in order; this is sufficient to determine the
actual codes. In our example, the code is completely defined
by the sequence of bit lengths (2, 1, 3, 3). The following
algorithm generates the codes as integers, intended to be read
from most- to least-significant bit. The code lengths are
initially in tree[I].Len; the codes are produced in
tree[I].Code.
1) Count the number of codes for each code length. Let
bl_count[N] be the number of codes of length N, N >= 1.
2) Find the numerical value of the smallest code for each
code length:
code = 0;
bl_count[0] = 0;
for (bits = 1; bits <= MAX_BITS; bits++) {
code = (code + bl_count[bits-1]) << 1;
next_code[bits] = code;
}
3) Assign numerical values to all codes, using consecutive
values for all codes of the same length with the base
values determined at step 2. Codes that are never used
(which have a bit length of zero) must not be assigned a
value.
for (n = 0; n <= max_code; n++) {
len = tree[n].Len;
if (len != 0) {
tree[n].Code = next_code[len];
next_code[len]++;
}
Deutsch Informational [Page 8]
RFC 1951 DEFLATE Compressed Data Format Specification May 1996
}
Example:
Consider the alphabet ABCDEFGH, with bit lengths (3, 3, 3, 3,
3, 2, 4, 4). After step 1, we have:
N bl_count[N]
- -----------
2 1
3 5
4 2
Step 2 computes the following next_code values:
N next_code[N]
- ------------
1 0
2 0
3 2
4 14
Step 3 produces the following code values:
Symbol Length Code
------ ------ ----
A 3 010
B 3 011
C 3 100
D 3 101
E 3 110
F 2 00
G 4 1110
H 4 1111
3.2.3. Details of block format
Each block of compressed data begins with 3 header bits
containing the following data:
first bit BFINAL
next 2 bits BTYPE
Note that the header bits do not necessarily begin on a byte
boundary, since a block does not necessarily occupy an integral
number of bytes.
Deutsch Informational [Page 9]
RFC 1951 DEFLATE Compressed Data Format Specification May 1996
BFINAL is set if and only if this is the last block of the data
set.
BTYPE specifies how the data are compressed, as follows:
00 - no compression
01 - compressed with fixed Huffman codes
10 - compressed with dynamic Huffman codes
11 - reserved (error)
The only difference between the two compressed cases is how the
Huffman codes for the literal/length and distance alphabets are
defined.
In all cases, the decoding algorithm for the actual data is as
follows:
do
read block header from input stream.
if stored with no compression
skip any remaining bits in current partially
processed byte
read LEN and NLEN (see next section)
copy LEN bytes of data to output
otherwise
if compressed with dynamic Huffman codes
read representation of code trees (see
subsection below)
loop (until end of block code recognized)
decode literal/length value from input stream
if value < 256
copy value (literal byte) to output stream
otherwise
if value = end of block (256)
break from loop
otherwise (value = 257..285)
decode distance from input stream
move backwards distance bytes in the output
stream, and copy length bytes from this
position to the output stream.
end loop
while not last block
Note that a duplicated string reference may refer to a string
in a previous block; i.e., the backward distance may cross one
or more block boundaries. However a distance cannot refer past
the beginning of the output stream. (An application using a
Deutsch Informational [Page 10]
RFC 1951 DEFLATE Compressed Data Format Specification May 1996
preset dictionary might discard part of the output stream; a
distance can refer to that part of the output stream anyway)
Note also that the referenced string may overlap the current
position; for example, if the last 2 bytes decoded have values
X and Y, a string reference with <length = 5, distance = 2>
adds X,Y,X,Y,X to the output stream.
We now specify each compression method in turn.
3.2.4. Non-compressed blocks (BTYPE=00)
Any bits of input up to the next byte boundary are ignored.
The rest of the block consists of the following information:
0 1 2 3 4...
+---+---+---+---+================================+
| LEN | NLEN |... LEN bytes of literal data...|
+---+---+---+---+================================+
LEN is the number of data bytes in the block. NLEN is the
one's complement of LEN.
3.2.5. Compressed blocks (length and distance codes)
As noted above, encoded data blocks in the "deflate" format
consist of sequences of symbols drawn from three conceptually
distinct alphabets: either literal bytes, from the alphabet of
byte values (0..255), or <length, backward distance> pairs,
where the length is drawn from (3..258) and the distance is
drawn from (1..32,768). In fact, the literal and length
alphabets are merged into a single alphabet (0..285), where
values 0..255 represent literal bytes, the value 256 indicates
end-of-block, and values 257..285 represent length codes
(possibly in conjunction with extra bits following the symbol
code) as follows:
Deutsch Informational [Page 11]
RFC 1951 DEFLATE Compressed Data Format Specification May 1996
Extra Extra Extra
Code Bits Length(s) Code Bits Lengths Code Bits Length(s)
---- ---- ------ ---- ---- ------- ---- ---- -------
257 0 3 267 1 15,16 277 4 67-82
258 0 4 268 1 17,18 278 4 83-98
259 0 5 269 2 19-22 279 4 99-114
260 0 6 270 2 23-26 280 4 115-130
261 0 7 271 2 27-30 281 5 131-162
262 0 8 272 2 31-34 282 5 163-194
263 0 9 273 3 35-42 283 5 195-226
264 0 10 274 3 43-50 284 5 227-257
265 1 11,12 275 3 51-58 285 0 258
266 1 13,14 276 3 59-66
The extra bits should be interpreted as a machine integer
stored with the most-significant bit first, e.g., bits 1110
represent the value 14.
Extra Extra Extra
Code Bits Dist Code Bits Dist Code Bits Distance
---- ---- ---- ---- ---- ------ ---- ---- --------
0 0 1 10 4 33-48 20 9 1025-1536
1 0 2 11 4 49-64 21 9 1537-2048
2 0 3 12 5 65-96 22 10 2049-3072
3 0 4 13 5 97-128 23 10 3073-4096
4 1 5,6 14 6 129-192 24 11 4097-6144
5 1 7,8 15 6 193-256 25 11 6145-8192
6 2 9-12 16 7 257-384 26 12 8193-12288
7 2 13-16 17 7 385-512 27 12 12289-16384
8 3 17-24 18 8 513-768 28 13 16385-24576
9 3 25-32 19 8 769-1024 29 13 24577-32768
3.2.6. Compression with fixed Huffman codes (BTYPE=01)
The Huffman codes for the two alphabets are fixed, and are not
represented explicitly in the data. The Huffman code lengths
for the literal/length alphabet are:
Lit Value Bits Codes
--------- ---- -----
0 - 143 8 00110000 through
10111111
144 - 255 9 110010000 through
111111111
256 - 279 7 0000000 through
0010111
280 - 287 8 11000000 through
11000111
Deutsch Informational [Page 12]
RFC 1951 DEFLATE Compressed Data Format Specification May 1996
The code lengths are sufficient to generate the actual codes,
as described above; we show the codes in the table for added
clarity. Literal/length values 286-287 will never actually
occur in the compressed data, but participate in the code
construction.
Distance codes 0-31 are represented by (fixed-length) 5-bit
codes, with possible additional bits as shown in the table
shown in Paragraph 3.2.5, above. Note that distance codes 30-
31 will never actually occur in the compressed data.
3.2.7. Compression with dynamic Huffman codes (BTYPE=10)
The Huffman codes for the two alphabets appear in the block
immediately after the header bits and before the actual
compressed data, first the literal/length code and then the
distance code. Each code is defined by a sequence of code
lengths, as discussed in Paragraph 3.2.2, above. For even
greater compactness, the code length sequences themselves are
compressed using a Huffman code. The alphabet for code lengths
is as follows:
0 - 15: Represent code lengths of 0 - 15
16: Copy the previous code length 3 - 6 times.
The next 2 bits indicate repeat length
(0 = 3, ... , 3 = 6)
Example: Codes 8, 16 (+2 bits 11),
16 (+2 bits 10) will expand to
12 code lengths of 8 (1 + 6 + 5)
17: Repeat a code length of 0 for 3 - 10 times.
(3 bits of length)
18: Repeat a code length of 0 for 11 - 138 times
(7 bits of length)
A code length of 0 indicates that the corresponding symbol in
the literal/length or distance alphabet will not occur in the
block, and should not participate in the Huffman code
construction algorithm given earlier. If only one distance
code is used, it is encoded using one bit, not zero bits; in
this case there is a single code length of one, with one unused
code. One distance code of zero bits means that there are no
distance codes used at all (the data is all literals).
We can now define the format of the block:
5 Bits: HLIT, # of Literal/Length codes - 257 (257 - 286)
5 Bits: HDIST, # of Distance codes - 1 (1 - 32)
4 Bits: HCLEN, # of Code Length codes - 4 (4 - 19)
Deutsch Informational [Page 13]
RFC 1951 DEFLATE Compressed Data Format Specification May 1996
(HCLEN + 4) x 3 bits: code lengths for the code length
alphabet given just above, in the order: 16, 17, 18,
0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15
These code lengths are interpreted as 3-bit integers
(0-7); as above, a code length of 0 means the
corresponding symbol (literal/length or distance code
length) is not used.
HLIT + 257 code lengths for the literal/length alphabet,
encoded using the code length Huffman code
HDIST + 1 code lengths for the distance alphabet,
encoded using the code length Huffman code
The actual compressed data of the block,
encoded using the literal/length and distance Huffman
codes
The literal/length symbol 256 (end of data),
encoded using the literal/length Huffman code
The code length repeat codes can cross from HLIT + 257 to the
HDIST + 1 code lengths. In other words, all code lengths form
a single sequence of HLIT + HDIST + 258 values.
3.3. Compliance
A compressor may limit further the ranges of values specified in
the previous section and still be compliant; for example, it may
limit the range of backward pointers to some value smaller than
32K. Similarly, a compressor may limit the size of blocks so that
a compressible block fits in memory.
A compliant decompressor must accept the full range of possible
values defined in the previous section, and must accept blocks of
arbitrary size.
4. Compression algorithm details
While it is the intent of this document to define the "deflate"
compressed data format without reference to any particular
compression algorithm, the format is related to the compressed
formats produced by LZ77 (Lempel-Ziv 1977, see reference [2] below);
since many variations of LZ77 are patented, it is strongly
recommended that the implementor of a compressor follow the general
algorithm presented here, which is known not to be patented per se.
The material in this section is not part of the definition of the
Deutsch Informational [Page 14]
RFC 1951 DEFLATE Compressed Data Format Specification May 1996
specification per se, and a compressor need not follow it in order to
be compliant.
The compressor terminates a block when it determines that starting a
new block with fresh trees would be useful, or when the block size
fills up the compressor's block buffer.
The compressor uses a chained hash table to find duplicated strings,
using a hash function that operates on 3-byte sequences. At any
given point during compression, let XYZ be the next 3 input bytes to
be examined (not necessarily all different, of course). First, the
compressor examines the hash chain for XYZ. If the chain is empty,
the compressor simply writes out X as a literal byte and advances one
byte in the input. If the hash chain is not empty, indicating that
the sequence XYZ (or, if we are unlucky, some other 3 bytes with the
same hash function value) has occurred recently, the compressor
compares all strings on the XYZ hash chain with the actual input data
sequence starting at the current point, and selects the longest
match.
The compressor searches the hash chains starting with the most recent
strings, to favor small distances and thus take advantage of the
Huffman encoding. The hash chains are singly linked. There are no
deletions from the hash chains; the algorithm simply discards matches
that are too old. To avoid a worst-case situation, very long hash
chains are arbitrarily truncated at a certain length, determined by a
run-time parameter.
To improve overall compression, the compressor optionally defers the
selection of matches ("lazy matching"): after a match of length N has
been found, the compressor searches for a longer match starting at
the next input byte. If it finds a longer match, it truncates the
previous match to a length of one (thus producing a single literal
byte) and then emits the longer match. Otherwise, it emits the
original match, and, as described above, advances N bytes before
continuing.
Run-time parameters also control this "lazy match" procedure. If
compression ratio is most important, the compressor attempts a
complete second search regardless of the length of the first match.
In the normal case, if the current match is "long enough", the
compressor reduces the search for a longer match, thus speeding up
the process. If speed is most important, the compressor inserts new
strings in the hash table only when no match was found, or when the
match is not "too long". This degrades the compression ratio but
saves time since there are both fewer insertions and fewer searches.
Deutsch Informational [Page 15]
RFC 1951 DEFLATE Compressed Data Format Specification May 1996
5. References
[1] Huffman, D. A., "A Method for the Construction of Minimum
Redundancy Codes", Proceedings of the Institute of Radio
Engineers, September 1952, Volume 40, Number 9, pp. 1098-1101.
[2] Ziv J., Lempel A., "A Universal Algorithm for Sequential Data
Compression", IEEE Transactions on Information Theory, Vol. 23,
No. 3, pp. 337-343.
[3] Gailly, J.-L., and Adler, M., ZLIB documentation and sources,
available in ftp://ftp.uu.net/pub/archiving/zip/doc/
[4] Gailly, J.-L., and Adler, M., GZIP documentation and sources,
available as gzip-*.tar in ftp://prep.ai.mit.edu/pub/gnu/
[5] Schwartz, E. S., and Kallick, B. "Generating a canonical prefix
encoding." Comm. ACM, 7,3 (Mar. 1964), pp. 166-169.
[6] Hirschberg and Lelewer, "Efficient decoding of prefix codes,"
Comm. ACM, 33,4, April 1990, pp. 449-459.
6. Security Considerations
Any data compression method involves the reduction of redundancy in
the data. Consequently, any corruption of the data is likely to have
severe effects and be difficult to correct. Uncompressed text, on
the other hand, will probably still be readable despite the presence
of some corrupted bytes.
It is recommended that systems using this data format provide some
means of validating the integrity of the compressed data. See
reference [3], for example.
7. Source code
Source code for a C language implementation of a "deflate" compliant
compressor and decompressor is available within the zlib package at
ftp://ftp.uu.net/pub/archiving/zip/zlib/.
8. Acknowledgements
Trademarks cited in this document are the property of their
respective owners.
Phil Katz designed the deflate format. Jean-Loup Gailly and Mark
Adler wrote the related software described in this specification.
Glenn Randers-Pehrson converted this document to RFC and HTML format.
Deutsch Informational [Page 16]
RFC 1951 DEFLATE Compressed Data Format Specification May 1996
9. Author's Address
L. Peter Deutsch
Aladdin Enterprises
203 Santa Margarita Ave.
Menlo Park, CA 94025
Phone: (415) 322-0103 (AM only)
FAX: (415) 322-1734
EMail: <ghost@aladdin.com>
Questions about the technical content of this specification can be
sent by email to:
Jean-Loup Gailly <gzip@prep.ai.mit.edu> and
Mark Adler <madler@alumni.caltech.edu>
Editorial comments on this specification can be sent by email to:
L. Peter Deutsch <ghost@aladdin.com> and
Glenn Randers-Pehrson <randeg@alumni.rpi.edu>
Deutsch Informational [Page 17]