Improve file header comments for astreamer code.

Make it clear that "astreamer" stands for "archive streamer".
Generalize comments that still believe this code can only be used
by pg_basebackup. Add some comments explaining the asymmetry
between the gzip, lz4, and zstd astreamers, in the hopes of making
life easier for anyone who hacks on this code in the future.

Robert Haas, reviewed by Amul Sul.

Discussion: http://postgr.es/m/CAAJ_b97O2kkKVTWxt8MxDN1o-cDfbgokqtiN2yqFf48=gXpcxQ@mail.gmail.com
Robert Haas 2024-08-07 08:49:41 -04:00
parent 2676040df0
commit 22b4a1b561
5 changed files with 42 additions and 6 deletions

@@ -2,6 +2,10 @@
*
* astreamer_file.c
*
* Archive streamers that write to files. astreamer_plain_writer writes
* the whole archive to a single file, and astreamer_extractor writes
* each archive member to a separate file in a given directory.
*
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION

@@ -2,6 +2,21 @@
*
* astreamer_gzip.c
*
* Archive streamers that deal with data compressed using gzip.
* astreamer_gzip_writer applies gzip compression to the input data
* and writes the result to a file. astreamer_gzip_decompressor assumes
* that the input stream is compressed using gzip and decompresses it.
*
* Note that the code in this file is asymmetric with what we do for
* other compression types: for lz4 and zstd, there is a compressor and
* a decompressor, rather than a writer and a decompressor. The approach
* taken here is less flexible, because a writer can only write to a file,
* while a compressor can write to a subsequent astreamer which is free
* to do whatever it likes. The reason it's like this is because this
* code was adapted from old, less-modular pg_basebackup code that used
* the same APIs that astreamer_gzip_writer now uses, and it didn't seem
* necessary to change anything at the time.
*
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION
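
The writer/compressor asymmetry described in the gzip header comment can be illustrated with a small, self-contained sketch. This is not the real astreamer API: `toy_streamer`, `toy_encode`, and the other `toy_*` names are hypothetical, the in-memory `sink` buffer stands in for an output file, and flipping one bit per byte stands in for compression. The point is only the shape: a writer-style stage is terminal, while a compressor-style stage hands its output to whatever stage follows it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of a streamer; not the real astreamer struct. */
typedef struct toy_streamer toy_streamer;
struct toy_streamer
{
	void		(*content) (toy_streamer *self, const char *data, size_t len);
	toy_streamer *next;			/* successor stage, or NULL if terminal */
	char		sink[64];		/* stands in for an output file */
	size_t		sink_len;
};

/* Stand-in for compression: flip the ASCII case bit of every byte. */
static void
toy_encode(const char *in, char *out, size_t len)
{
	for (size_t i = 0; i < len; i++)
		out[i] = (char) (in[i] ^ 0x20);
}

/* Terminal stage: append the data to the sink, truncating if it is full. */
static void
toy_sink_content(toy_streamer *self, const char *data, size_t len)
{
	if (len > sizeof(self->sink) - self->sink_len)
		len = sizeof(self->sink) - self->sink_len;
	memcpy(self->sink + self->sink_len, data, len);
	self->sink_len += len;
}

/* Writer style: encode, then write to our own sink. No successor is used. */
static void
toy_writer_content(toy_streamer *self, const char *data, size_t len)
{
	char		buf[64];

	if (len > sizeof(buf))
		len = sizeof(buf);
	toy_encode(data, buf, len);
	toy_sink_content(self, buf, len);
}

/* Compressor style: encode, then forward to whatever stage comes next. */
static void
toy_compressor_content(toy_streamer *self, const char *data, size_t len)
{
	char		buf[64];

	if (len > sizeof(buf))
		len = sizeof(buf);
	toy_encode(data, buf, len);
	self->next->content(self->next, buf, len);
}

/* Push "abc" through both styles; returns the total bytes that were stored. */
static size_t
toy_demo(char *writer_out, char *chained_out)
{
	toy_streamer writer = {toy_writer_content, NULL, {0}, 0};
	toy_streamer sink = {toy_sink_content, NULL, {0}, 0};
	toy_streamer compressor = {toy_compressor_content, &sink, {0}, 0};

	writer.content(&writer, "abc", 3);
	compressor.content(&compressor, "abc", 3);
	memcpy(writer_out, writer.sink, writer.sink_len);
	memcpy(chained_out, sink.sink, sink.sink_len);
	return writer.sink_len + sink.sink_len;
}
```

In this sketch both paths store the same bytes, but only the compressor-style stage could be pointed at a different successor without changing its own code.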

@@ -2,6 +2,10 @@
*
* astreamer_lz4.c
*
* Archive streamers that deal with data compressed using lz4.
* astreamer_lz4_compressor applies lz4 compression to the input stream,
* and astreamer_lz4_decompressor does the reverse.
*
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION

@@ -2,6 +2,10 @@
*
* astreamer_zstd.c
*
* Archive streamers that deal with data compressed using zstd.
* astreamer_zstd_compressor applies zstd compression to the input stream,
* and astreamer_zstd_decompressor does the reverse.
*
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
*
* IDENTIFICATION

@@ -2,9 +2,18 @@
*
* astreamer.h
*
- * Each tar archive returned by the server is passed to one or more
- * astreamer objects for further processing. The astreamer may do
- * something simple, like write the archive to a file, perhaps after
+ * The "archive streamer" interface is intended to allow frontend code
+ * to stream from possibly-compressed archive files from any source and
+ * perform arbitrary actions based on the contents of those archives.
+ * Archive streamers are intended to be composable, and most tasks will
+ * require two or more archive streamers to complete. For instance,
+ * if the input is an uncompressed tar stream, a tar parser astreamer
+ * could be used to interpret it, and then an extractor astreamer could
+ * be used to write each archive member out to a file.
+ *
+ * In general, each archive streamer is relatively free to take whatever
+ * action it desires in the stream of chunks provided by the caller. It
+ * may do something simple, like write the archive to a file, perhaps after
* compressing it, but it can also do more complicated things, like
* annotating the byte stream to indicate which parts of the data
* correspond to tar headers or trailing padding, vs. which parts are
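
The parser-then-extractor composition described in this header comment can be sketched as follows. Everything here is an assumption made for illustration: the `toy_*` names are hypothetical, the "archive" uses a made-up format in which each member is a one-byte length followed by that many payload bytes (not tar), the whole archive is assumed to arrive in a single chunk, and the stand-in extractor only counts what it sees instead of creating files.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical chunk labels, loosely modelled on the idea in astreamer.h. */
typedef enum
{
	TOY_UNKNOWN,
	TOY_MEMBER_HEADER,
	TOY_MEMBER_CONTENTS
} toy_chunk_kind;

typedef struct toy_stage toy_stage;
struct toy_stage
{
	void		(*content) (toy_stage *self, const unsigned char *data,
							size_t len, toy_chunk_kind kind);
	toy_stage  *next;			/* successor stage, or NULL if terminal */
	size_t		members;		/* counters used by the terminal stage */
	size_t		content_bytes;
};

/* Stand-in extractor: count members and payload bytes instead of writing. */
static void
toy_extractor_content(toy_stage *self, const unsigned char *data,
					  size_t len, toy_chunk_kind kind)
{
	(void) data;
	if (kind == TOY_MEMBER_HEADER)
		self->members++;
	else if (kind == TOY_MEMBER_CONTENTS)
		self->content_bytes += len;
}

/*
 * Stand-in parser: split unclassified input into labelled chunks and pass
 * each one to the next stage. Stops at the first truncated member.
 */
static void
toy_parser_content(toy_stage *self, const unsigned char *data,
				   size_t len, toy_chunk_kind kind)
{
	size_t		pos = 0;

	(void) kind;				/* input is expected to be unclassified */
	while (pos < len)
	{
		size_t		payload = data[pos];

		if (payload > len - pos - 1)
			break;
		self->next->content(self->next, data + pos, 1, TOY_MEMBER_HEADER);
		self->next->content(self->next, data + pos + 1, payload,
							TOY_MEMBER_CONTENTS);
		pos += 1 + payload;
	}
}

/* Wire parser -> extractor, feed one chunk, and report what was seen. */
static size_t
toy_count_members(const unsigned char *archive, size_t len,
				  size_t *content_bytes)
{
	toy_stage	extractor = {toy_extractor_content, NULL, 0, 0};
	toy_stage	parser = {toy_parser_content, &extractor, 0, 0};

	parser.content(&parser, archive, len, TOY_UNKNOWN);
	*content_bytes = extractor.content_bytes;
	return extractor.members;
}
```

A real parser would also have to cope with members that straddle chunk boundaries, which this sketch deliberately ignores.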
@@ -33,9 +42,9 @@ typedef struct astreamer_ops astreamer_ops;
/*
* Each chunk of archive data passed to an astreamer is classified into one
- * of these categories. When data is first received from the remote server,
- * each chunk will be categorized as ASTREAMER_UNKNOWN, and the chunks will
- * be of whatever size the remote server chose to send.
+ * of these categories. When data is initially passed to an archive streamer,
+ * each chunk will be categorized as ASTREAMER_UNKNOWN, and the chunks can
+ * be of whatever size the caller finds convenient.
*
* If the archive is parsed (e.g. see astreamer_tar_parser_new()), then all
* chunks should be labelled as one of the other types listed here. In