208 Commits

Author SHA1 Message Date
simonb 117e5c1c54 Correct a comment - 8 * 1 million is 8 million, not 10 million (!). 2023-10-25 05:51:11 +00:00
mrg 81a719df6e avoid various use-after-free issues.
create a ptrdiff_t offset between the start of an allocation region and
some interesting pointer, so it can be adjusted with this offset after
realloc() returns.  for pdisk(), realloc() is a locally inlined malloc()
and free() pair.

for mail(1), this required a little bit more effort as the old pointer
was passed into another file for fix-ups there, and that code needed to
be adjusted for offset vs old pointer usage.

found by GCC 12.
2023-08-10 20:36:28 +00:00
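
The offset technique described in 81a719df6e can be sketched roughly as below. This is a minimal illustration, not the code from that commit: grow_keep_pointer() and its parameters are hypothetical names. The point is that the offset is computed while the old buffer is still valid, and the old pointer value is never used after realloc() returns.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Grow *bufp to newsize bytes while keeping *pp at the same logical
 * position inside the buffer.  Returns 0 on success, or -1 if realloc()
 * fails, in which case *bufp and *pp are unchanged and still valid.
 * grow_keep_pointer() is a hypothetical helper, not NetBSD code.
 */
static int
grow_keep_pointer(char **bufp, char **pp, size_t newsize)
{
	/* Record the offset while the old allocation is still live. */
	ptrdiff_t off = *pp - *bufp;
	char *newbuf = realloc(*bufp, newsize);

	if (newbuf == NULL)
		return -1;

	/* Rebuild the interior pointer from the new base and the offset. */
	*bufp = newbuf;
	*pp = newbuf + off;
	return 0;
}
```

A caller passes the addresses of its base pointer and its interior pointer; after a successful call both refer to the new allocation.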
rin 03675fcefd Revert CC_WNO_USE_AFTER_FREE from Makefiles (thanks uwe@) 2023-08-03 14:56:36 +00:00
rin 91f8ac6d30 Sprinkle CC_WNO_USE_AFTER_FREE for GCC 12
All of them are blamed for an idiom equivalent to:
	newbuf = realloc(buf, size);
	p = newbuf + (p - buf);
2023-08-03 13:36:08 +00:00
lukem c4b7a9e794 bsd.own.mk: rename GCC_NO_* to CC_WNO_*
Rename compiler-warning-disable variables from
	GCC_NO_warning
to
	CC_WNO_warning
where warning is the full warning name as used by the compiler.

GCC_NO_IMPLICIT_FALLTHRU is CC_WNO_IMPLICIT_FALLTHROUGH

Using the convention CC_compilerflag, where compilerflag
is based on the full compiler flag name.
2023-06-03 09:09:01 +00:00
andvar c1d86c1466 fix a few more typos in comments, messages and documentation. 2021-09-19 11:37:00 +00:00
mrg de11d87641 introduce some common variables for use in GCC warning disables:
GCC_NO_FORMAT_TRUNCATION    -Wno-format-truncation (GCC 7/8)
GCC_NO_STRINGOP_TRUNCATION  -Wno-stringop-truncation (GCC 8)
GCC_NO_STRINGOP_OVERFLOW    -Wno-stringop-overflow (GCC 8)
GCC_NO_CAST_FUNCTION_TYPE   -Wno-cast-function-type (GCC 8)

use these to turn off warnings for most GCC-8 complaints.  many
of these are false positives, most of the real bugs are already
committed, or are yet to come.


we plan to introduce versions of (some of?) these that use the
"-Wno-error=" form, which still displays the warnings but does
not make it an error, and all of the above will be re-considered
as either being "fix me" (warning still displayed) or "warning
is wrong."
2019-10-13 07:28:04 +00:00
sevan b80ec07d43 sort was there since v1
https://www.bell-labs.com/usr/dmr/www/man61.pdf
2019-09-01 18:04:54 +00:00
msaitoh 03e9d50adf Fix typo (s/supress/suppress/). 2019-07-11 03:49:51 +00:00
wiz 01869ca4d2 Remove workaround for ancient HTML generation code. 2017-07-03 21:28:48 +00:00
christos 5326356364 refactor includes, add <sys/stat.h> 2017-01-10 21:13:45 +00:00
abhinav fa0751d0dd Add missing full stop. 2016-12-21 09:06:24 +00:00
wiz 738f858418 Sort options and their descriptions. Sync usage more with man page.
Bump date in man page for new option -C.
2016-06-01 08:24:03 +00:00
kre a868903568 Add the posix -C option (-c but quieter). Fix -R to work properly when
setting \n as the record delimiter using a numeric value rather than literal
\n - and to not incorrectly turn \n into a field separator if -R is used to
make some other char the record separator (\n becomes a field separator in
that case as long as the field separator remains "white space" but should not
be in any other case - unless set explicitly of course.)

Plus more cosmetic changes - the man page and usage are updated to make it
more clear that the 2 (or 1) params to -k are not fields (field1 and field2)
but specifiers of the beginning and end of one key field.   There was an
unused 'x' option in the GETOPTS string.  The usage message is reformatted
to display properly on both 80 col and > 80 col displays (on < 80 it will
still probably look pretty ugly ... perhaps not quite so bad though), and
is also updated to show the different usage for the -c case (and -C) from the
others (only 1 file permitted) - the man page synopsis has a similar update.

Using more than one of -c -C or -m generates a usage message rather than
just ignoring the -m as it did before (there was no -C before of course).

Aside from the bug fix to the interaction between -R and -t, there are no
changes that affect the way anything is sorted (or read, or written).

Discussed on tech-userlevel earlier this week.
2016-06-01 02:37:55 +00:00
mrg 366296f5a8 add a description of what was being attempted to the failed-write messages. 2015-08-05 07:10:03 +00:00
christos 6e28978d84 fix unused variable warnings 2013-10-18 20:47:06 +00:00
wiz 2767d6c8a8 - Remove redundant argument to non-first `.Nm' macro;
- reference `-u' at `-c', to make more clear that the former can
  be used with the latter;
- bump date.

From Bug Hunting.

While here, use Aq.
2013-05-29 15:00:35 +00:00
apb dd481ceb43 As from today, numeric fields may begin with an optional
plus or minus sign, not only an optional minus sign.
2013-01-20 21:02:11 +00:00
apb 85744c86ad When parsing numbers, allow a leading '+'. 2013-01-20 10:12:58 +00:00
joerg 6818646ac8 Use __dead 2011-09-16 15:39:25 +00:00
wiz 33d2a9cdc6 Sort sections. 2010-12-18 23:36:23 +00:00
christos 7e6e5c1f48 Add an 'l' style for sorting that sorts by the string length of the field. 2010-12-18 23:09:48 +00:00
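
As a rough illustration of what ordering by string length means in 7e6e5c1f48, the sketch below sorts an array of strings by length with qsort(). This is an assumption-laden stand-in: sort(1) itself builds sort keys rather than calling qsort() on fields, and cmp_by_length() and sort_by_length() are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: order strings by length, shortest first. */
static int
cmp_by_length(const void *a, const void *b)
{
	size_t la = strlen(*(const char *const *)a);
	size_t lb = strlen(*(const char *const *)b);

	/* Yields -1, 0 or 1 without risking overflow from subtraction. */
	return (la > lb) - (la < lb);
}

/* Sort n strings in place by length (hypothetical helper). */
static void
sort_by_length(const char **fields, size_t n)
{
	qsort(fields, n, sizeof(fields[0]), cmp_by_length);
}
```

Note that qsort() is not guaranteed to be stable, so the relative order of strings with equal length is unspecified in this sketch.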
wiz 90abead58e Fix typo in comment. 2010-06-06 00:00:33 +00:00
dholland fcf4d3f750 Rework previous change to fixit() to not trip on option arguments. (Noticed
by wiz.) Clarify the loop logic involved.
2010-06-05 17:46:08 +00:00
dholland 8696c1b71e fixit() needs to know the getopt options list to do its thing correctly. 2010-06-05 17:44:51 +00:00
dholland b6360c7f71 Don't recognize "+3" after -- or after the first non-option argument.
This prevents converting "+3" into "-k4.1" in places where getopt
won't recognize it, which in turn prevents silly error messages and
lossage trying to sort files whose names begin with +. PR 43358.
2010-05-27 05:52:29 +00:00
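
The rule in b6360c7f71 can be sketched as two small steps: find where the option region of the command line ends, and rewrite a plain "+N" as "-k<N+1>.1" only inside that region. The helpers below are hypothetical and simplified; in particular they assume no option takes a separate argument, which is the case the later fixit() commits deal with.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Rewrite a plain "+N" argument as "-k<N+1>.1" into out.
 * Returns 1 if converted, 0 if arg is not of the form "+digits".
 * Only the simplest old-style form is handled in this sketch.
 */
static int
convert_plus_key(const char *arg, char *out, size_t outlen)
{
	const char *p;

	if (arg[0] != '+' || arg[1] == '\0')
		return 0;
	for (p = arg + 1; *p != '\0'; p++)
		if (!isdigit((unsigned char)*p))
			return 0;
	snprintf(out, outlen, "-k%ld.1", strtol(arg + 1, NULL, 10) + 1);
	return 1;
}

/*
 * Return the index of the first argument that must not be treated as
 * an option: "--", a lone "-", or the first argument that starts with
 * neither '-' nor '+'.  Assumes options carry no separate arguments.
 */
static int
option_region_end(int argc, char **argv)
{
	int i;

	for (i = 1; i < argc; i++) {
		if (strcmp(argv[i], "--") == 0)
			break;
		if (argv[i][0] == '-' && argv[i][1] == '\0')
			break;
		if (argv[i][0] != '-' && argv[i][0] != '+')
			break;
	}
	return i;
}
```

A caller would apply convert_plus_key() only to indices below option_region_end(), so a file named "+3" that follows "--" or another file name is left alone.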
jruoho 3ae25c77b6 RETURN VALUES -> EXIT STATUS. 2010-05-14 16:58:32 +00:00
enami 47e571f2ea Don't touch past the end of the allocated region. It results in a
segmentation violation.
2010-02-05 21:58:41 +00:00
joerg 2b8a053617 Retire __SCCSID. It has only archeological value now. Also retire lint
conditional around __RCSID, lint can handle that fine.
2009-11-06 18:34:22 +00:00
dsl 43682b02ee If anyone is stupid enough to feed records longer than 8MB into sort, don't
sit in an infinite loop, instead eat memory until we have read 8 records.
2009-10-09 20:32:57 +00:00
dsl 41b3ada21c When we need to merge more than 16 files, do them in a hierarchy.
Reduces the amount of data written to temporary files.
The 3-level stack has to do a simple reduce after 4352 input files, for
a normal file sort this is 35GB of data or about 500 million records.
This needs about 50 open fd's - which should be ok.
Clearly the merge sort could process more input files in one go - speeding
up the sort, but at some point the number of input files would exceed
whatever limit was applied.
2009-10-09 20:29:43 +00:00
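
The figures quoted in 41b3ada21c can be cross-checked with a little arithmetic. The decomposition below is one reading, not something the message states: 4352 happens to equal 16^3 + 16^2, 35 GB over 4352 files is roughly 8 MB per file, 35 GB over 500 million records is 70 bytes per record, and three levels of 16 open files give 48 descriptors. All function names are hypothetical.

```c
#include <stdint.h>

#define MERGE_FANIN   16	/* files merged at once, per the message */
#define STACK_LEVELS  3		/* levels in the merge stack, per the message */

/* 16^3 + 16^2, which matches the 4352 input files quoted above. */
static uint64_t
files_before_reduce(void)
{
	uint64_t f = MERGE_FANIN;

	return f * f * f + f * f;
}

/* Average input file size if 35 GB (decimal) is spread over them. */
static uint64_t
bytes_per_file(void)
{
	return UINT64_C(35000000000) / files_before_reduce();
}

/* Average record size if 35 GB holds 500 million records. */
static uint64_t
bytes_per_record(void)
{
	return UINT64_C(35000000000) / UINT64_C(500000000);
}

/* One fan-in's worth of open files per stack level. */
static int
merge_fds(void)
{
	return MERGE_FANIN * STACK_LEVELS;
}
```

Under these assumptions the per-file size comes out just over 8 million bytes, which is consistent with the 8 MB figure that appears in neighbouring commit messages.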
dsl 768e6fa973 Don't give merge an empty file when we detect EOF with nothing in our
buffer.
2009-10-09 20:23:19 +00:00
dsl 8b6ec7b129 long align records written to temporary files. 2009-10-07 21:03:29 +00:00
dsl 5aa782f502 When encoding numbers, we can use all 8 bits for exponent values. 2009-10-07 21:02:57 +00:00
dsl eab2f96cf5 Fix borked fix for sort relying on realloc() changing the buffer end.
Sorts of more than 8MB data now probably work again.
2009-09-28 20:30:01 +00:00
dsl 6458ae9cdf Move all the fopen() calls out of the record read routines into the callers.
Split the merge sort so that fsort() can pass the 'FILE *' of the temporary
files to be merged into the merge code.
Don't rely on realloc() not moving the end address of a buffer!
Rework merge sort so that it sorts pointers to 'struct mfile' and only
copies about sort record descriptors.
No functional change intended.
2009-09-26 21:16:55 +00:00
dsl 800732bfdc Fix sort -u, PR/42094 2009-09-19 16:18:00 +00:00
dsl fe52672374 Minor tweaks to the key generation for numeric fields.
Use 1's complement for -ve numbers to avoid conditionals.
2009-09-16 20:56:38 +00:00
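
One way to picture the branch-free one's complement step in fe52672374 is the sketch below. It is an assumption about the general idea rather than the actual key format: apply_sign() is a hypothetical helper that inverts every byte of an already-encoded magnitude when the number is negative, so that larger magnitudes compare lower under memcmp().

```c
#include <stddef.h>

/*
 * Apply a sign to an already-encoded magnitude key, in place.
 * mask is 0x00 for non-negative and 0xff for negative numbers, so the
 * XOR either leaves each byte alone or replaces it with its one's
 * complement, with no per-byte conditional.
 */
static void
apply_sign(unsigned char *key, size_t len, int neg)
{
	unsigned char mask = (unsigned char)-(neg != 0);
	size_t i;

	for (i = 0; i < len; i++)
		key[i] ^= mask;
}
```

This only orders negative numbers among themselves; a real encoding also needs something ahead of these bytes that places all negative keys before all positive ones, which the sketch leaves out.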
dsl 1310aa04b4 Save length of key instead of relying on the weight of the record sep.
This frees a byte value to use for 'end of key' (to correctly sort
short keys) while still having a weight assigned to the field sep.
(Unless -t is given, the field sep is in the field data.)
Do reverse sorts by writing the output file in reverse order (rather
than reversing the sort - apart from merges).
All key compares are now unweighted.
For 'sort -u' mark duplicate keys during the sort and don't write
to the output.
Use -S to mean a posix sort - where equal keys are sorted using the
raw record (rather than being kept in the original order).
For 'sort -f' (no keys) generate a key of the folded data (as for -n
-i and -d), simplifies the code and allows a 'posix' sort.
2009-09-10 22:02:40 +00:00
dsl 2abdfb3907 Now we have our own radix_sort() change the interface so that we pass
an array of 'RECHEADER *' and remove all the crappy stuff that was backed up
by REC_DATA_OFFSET (etc).
Also change radix_sort() to return the number of elements, soon to be used
to drop duplicate keys (for sort -u).
2009-09-05 12:00:25 +00:00
dsl 4611f32c1c Include a local copy of the sradixsort() code from libc.
Currently unchanged apart from the deletion of the 'unstable' version and
other unneeded code.
Use fldtab[0]. not fldtab-> when we are referring to the global info
in the 0th entry to emphasise that this entry is different.
fldtab[0].weights is only needed in the SINGL_FLD case - so set it there.
Re-indent a big 'if' in setfield() so that the line breaks match the
logic - which looks dubious now!
2009-09-05 09:16:18 +00:00
wiz ea72fa6ee9 Fix pasto. 2009-08-23 15:45:08 +00:00
dsl 5166e91c70 Bring nearer to reality.
Note that -H is now ignored.
Move -S and -s (and -H) to the first list of options since they are
global ones, not ones that override the ordering rules.
2009-08-22 21:55:08 +00:00
dsl 5c6e557c4b <space> and <tab> at the start of key fields are supposed to be sorted
as if part of the data.
This is a bit fubar since we need a value that sorts before any byte value
as a key field separator - so need 257 byte values (since radixsort() doesn't
take a length for each record).
For now map '\t' to 0x01 and hope no one will notice!
2009-08-22 21:50:32 +00:00
dsl e0846c3698 Put radixsort() and sradixsort() the correct way around. 2009-08-22 21:43:53 +00:00
dsl f58fe5e68a Fix generation of unmasked alpha keys. 2009-08-22 21:28:55 +00:00
dsl b36440a064 Only process each number digit once. 2009-08-22 21:19:40 +00:00
dsl 609b8532b4 Add some comments and clarifications to this impenetrable code.
When merging ensure we accurately sort records with identical keys by
file-number, otherwise a 'stable' sort won't be!
2009-08-22 15:16:50 +00:00
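
The tie-break described in 609b8532b4 might look roughly like the comparator below. The struct merge_rec type and its fields are hypothetical (the real code works on its own record and file structures), and the sketch assumes that lower file numbers hold earlier input.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one record waiting in the merge. */
struct merge_rec {
	const unsigned char *key;	/* encoded sort key */
	size_t keylen;			/* length of key in bytes */
	int file_no;			/* index of the file it came from */
};

/*
 * Compare two records for merging.  Keys are compared first; records
 * with identical keys are ordered by file number, so input that came
 * first is written first and the merge stays stable.
 */
static int
merge_cmp(const struct merge_rec *a, const struct merge_rec *b)
{
	size_t n = a->keylen < b->keylen ? a->keylen : b->keylen;
	int r = memcmp(a->key, b->key, n);

	if (r != 0)
		return r;
	if (a->keylen != b->keylen)
		return a->keylen < b->keylen ? -1 : 1;
	return (a->file_no > b->file_no) - (a->file_no < b->file_no);
}
```

Without the final file-number comparison, two records with equal keys could be emitted in either order depending on which file the merge happened to look at first.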
dsl 7b4a02befd Rework the way sort generates sort keys:
- If we generate a key, it is always sortable using memcmp()
- If we are sorting the whole record, then a weight-table must be used
  during compares.
- Major surgery to encoding of numbers to ensure unique keys for equal
  numeric values.  Reverse numerics are handled by inverting the sign.
- Case folding (-f) is handled when the sort keys are generated. No other
  code has to care at all.
- Key uniqueness (-u) is done during merge for large datasets. It only
  has to be done when writing the output file for small files.
  Since the file is in key order this is simple!
Probably fixes all of: PR/27257 PR/25551 PR/22182 PR/31095 PR/30504
PR/36816 PR/37860 PR/39308
Also PR/18614 should no longer die, but a little more work needs to be
done on the merging for very large files.
2009-08-22 10:53:28 +00:00
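
The idea in 7b4a02befd that case folding happens when keys are generated, so that comparison is a plain memcmp(), can be sketched as follows. make_folded_key() is a hypothetical helper and the key layout is simplified to the folded bytes alone; the real encoding carries more than this.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Build a case-folded key for field into key (at most keylen bytes).
 * Lowercase letters are mapped to uppercase here, once, so that later
 * comparisons can use memcmp() and need no knowledge of -f.
 * Returns the number of key bytes written.
 */
static size_t
make_folded_key(const char *field, unsigned char *key, size_t keylen)
{
	size_t i;

	for (i = 0; i < keylen && field[i] != '\0'; i++)
		key[i] = (unsigned char)toupper((unsigned char)field[i]);
	return i;
}
```

With keys built this way, "apple" and "APPLE" produce identical bytes, and the comparison code does not have to care about folding at all.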
dsl bf80c84843 Delete more unwanted/unused cruft.
Simplify logic for reading input records.
Do a merge sort whenever we have 16 partial sorted blocks.
The patient is breathing, but still carrying a lot of extra weight.
2009-08-20 06:36:25 +00:00