Split the merge sort so that fsort() can pass the 'FILE *' of the temporary
files to be merged into the merge code.
Don't rely on realloc() not moving the end address of a buffer!
Rework merge sort so that it sorts pointers to 'struct mfile' and only
copies about sort record descriptors.
No functional change intended.
This frees a byte value to use for 'end of key' (to correctly sort
short keys) while still having a weight assigned to the field sep.
(Unless -t is given, the field sep is in the field data.)
Do reverse sorts by writing the output file in reverse order (rather
than reversing the sort - apart from merges).
All key compares are now unweighted.
For 'sort -u' mark duplicates keys during the sort and don't write
to the output.
Use -S to mean a posix sort - where equal keys are sorted using the
raw record (rather than being kept in the original order).
For 'sort -f' (no keys) generate a key of the folded data (as for -n
-i and -d), simplifies the code and allows a 'posix' sort.
Currently unchanged apart from the deletion of the 'unstable' version and
other unneeded code.
Use fldtab[0]. not fldtab-> when we are referring to the global info
in the 0th entry to emphasise that this entry is different.
fldtab[0].weights is only needed in the SINGL_FLD case - so set it there.
Re-indent a big 'if' is setfield() so that the line breaks match the
logic - which looks dubious now!
- If we generate a key, it is always sortable using memcmp()
- If we are sorting the whole record, then a weight-table must be used
during compares.
- Major surgery to encoding of numbers to ensure unique keys for equal
numeric values. Reverse numerics are handled by inverting the sign.
- Case folding (-f) is handled when the sort keys are generated. No other
code has to care at all.
- Key uniqueness (-u) is done during merge for large datasets. It only
has to be done when writing the output file for small files.
Since the file is in key order this is simple!
Probably fixes all of: PR/27257 PR/25551 PR/22182 PR/31095 PR/30504
PR/36816 PR/37860 PR/39308
Also PR/18614 should no longer die, but a little more work needs to be
done on the merging for very large files.
Simplify logic for reading input records.
Do a merge sort whenever we have 16 partial sorted blocks.
The patient is breathing, but still carrying a lot of extra weight.
first key byte and writing to a temp file, then sorting the records from
each temp file that had the same first key byte (and repeating for upto
4 key bytes) was a nice idea, but completely doomed to failure.
Eg PR/9308 where a 70MB file has all but one record the same and short keys.
Not only does the code not work, it is rather guaranteed to be slow.
Instead always use a merge sort for fully sorted chunk of records (each
temporary file contains one lot of sorted records).
The -H option already did this, so just rip out all the code and variables
that can't be used when -H was specified.
Further cleanup to come ...
part of PR bin/26860 by Michael van Elst
while here, put output file fopen() inside the code block of the
only code path where it's actually needed, to make the logic more obvious;
and in the "stdout" case, initialize toutpath to empty string rather
then /dev/stdout, to make it clear /dev/stdout is not actually used
For some reason this program wants to open _hundreds_ of temporary files.
Make it setrlimit(RLIMIT_NOFILE, ...), so this rather dubious strategy at
least works well enough to ctag(1) our own kernel.
XXX
path buffer
- provide better error messages about why the temp file creation is failing
- explicitly compare syscall return to -1 instead of < 0 and fdopen return
to NULL instead of 0.
If no output file was specified sort fopened("/dev/stdout", "w").
This is *wrong* because "/dev/stdout" will truncate the output file,
thus undoing the append effect the shell had set up. The simple fix
here is to just arrange for outfp = stdout and don't play with /dev/stdout.
While I am here:
- KNF
- make pattern for mkstemp have 6 X's.
* get rid of union f_handle, replace by passing explicit int parameter
and (new) struct filelist
* add new typedefs gen_func_t and put_func_t and use where appropriate