34 Commits

Author SHA1 Message Date
joerg
6818646ac8 Use __dead 2011-09-16 15:39:25 +00:00
christos
7e6e5c1f48 Add an 'l' style for sorting that sorts by the string length of the field. 2010-12-18 23:09:48 +00:00
dholland
8696c1b71e fixit() needs to know the getopt options list to do its thing correctly. 2010-06-05 17:44:51 +00:00
enami
47e571f2ea Don't touch memory past the end of the allocated region; doing so results in a
segmentation violation.
2010-02-05 21:58:41 +00:00
dsl
eab2f96cf5 Fix borked fix for sort relying on realloc() changing the buffer end.
Sorts of more than 8MB data now probably work again.
2009-09-28 20:30:01 +00:00
dsl
6458ae9cdf Move all the fopen() calls out of the record read routines into the callers.
Split the merge sort so that fsort() can pass the 'FILE *' of the temporary
files to be merged into the merge code.
Don't rely on realloc() not moving the end address of a buffer!
Rework merge sort so that it sorts pointers to 'struct mfile' and only
copies about sort record descriptors.
No functional change intended.
2009-09-26 21:16:55 +00:00
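The realloc() point in the commit above can be illustrated with a minimal sketch. It assumes a hypothetical growable buffer (struct growbuf and growbuf_append() are illustrative names, not code from sort(1)) that tracks its fill level as an offset, so nothing depends on where realloc() places the block:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; not a structure from sort(1). */
struct growbuf {
	unsigned char *base;	/* start of the allocation */
	size_t used;		/* bytes in use, kept as an offset */
	size_t size;		/* bytes allocated */
};

/*
 * Append len bytes, growing the buffer if needed.
 * Returns 0 on success, -1 if realloc() fails.
 */
static int
growbuf_append(struct growbuf *gb, const void *src, size_t len)
{
	if (gb->size - gb->used < len) {
		size_t nsize = gb->size ? gb->size : 64;
		unsigned char *nbase;

		while (nsize - gb->used < len)
			nsize *= 2;
		nbase = realloc(gb->base, nsize);
		if (nbase == NULL)
			return -1;
		/* The block may have moved; only base needs updating. */
		gb->base = nbase;
		gb->size = nsize;
	}
	/* The write position is recomputed from the current base. */
	memcpy(gb->base + gb->used, src, len);
	gb->used += len;
	assert(gb->used <= gb->size);
	return 0;
}
```

Because no pointer to the old end of the buffer is kept across the call, a moved allocation cannot leave a stale address behind.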
dsl
1310aa04b4 Save length of key instead of relying on the weight of the record sep.
This frees a byte value to use for 'end of key' (to correctly sort
short keys) while still having a weight assigned to the field sep.
(Unless -t is given, the field sep is in the field data.)
Do reverse sorts by writing the output file in reverse order (rather
than reversing the sort - apart from merges).
All key compares are now unweighted.
For 'sort -u' mark duplicate keys during the sort and don't write them
to the output.
Use -S to mean a posix sort - where equal keys are sorted using the
raw record (rather than being kept in the original order).
For 'sort -f' (no keys) generate a key of the folded data (as for -n
-i and -d), simplifies the code and allows a 'posix' sort.
2009-09-10 22:02:40 +00:00
dsl
2abdfb3907 Now that we have our own radix_sort(), change the interface so that we pass
an array of 'RECHEADER *' and remove all the crappy stuff that backed up
by REC_DATA_OFFSET (etc).
Also change radix_sort() to return the number of elements, soon to be used
to drop duplicate keys (for sort -u).
2009-09-05 12:00:25 +00:00
dsl
4611f32c1c Include a local copy of the sradixsort() code from libc.
Currently unchanged apart from the deletion of the 'unstable' version and
other unneeded code.
Use fldtab[0]. not fldtab-> when we are referring to the global info
in the 0th entry to emphasise that this entry is different.
fldtab[0].weights is only needed in the SINGL_FLD case - so set it there.
Re-indent a big 'if' in setfield() so that the line breaks match the
logic - which looks dubious now!
2009-09-05 09:16:18 +00:00
dsl
7b4a02befd Rework the way sort generates sort keys:
- If we generate a key, it is always sortable using memcmp()
- If we are sorting the whole record, then a weight-table must be used
  during compares.
- Major surgery to encoding of numbers to ensure unique keys for equal
  numeric values.  Reverse numerics are handled by inverting the sign.
- Case folding (-f) is handled when the sort keys are generated. No other
  code has to care at all.
- Key uniqueness (-u) is done during merge for large datasets. It only
  has to be done when writing the output file for small files.
  Since the file is in key order this is simple!
Probably fixes all of: PR/27257 PR/25551 PR/22182 PR/31095 PR/30504
PR/36816 PR/37860 PR/39308
Also PR/18614 should no longer die, but a little more work needs to be
done on the merging for very large files.
2009-08-22 10:53:28 +00:00
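The idea of a generated key that is always sortable using memcmp() can be sketched for the simplest numeric case. This is not the encoding used by sort(1), which the log does not show; it is a common stand-in technique for fixed-width integers (flip the sign bit, store big-endian), and encode_i32_key(), compare_i32_keys() and key_order_selfcheck() are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Encode a signed 32-bit value as 4 bytes whose memcmp() order matches
 * numeric order: flipping the sign bit maps INT32_MIN..INT32_MAX onto
 * 0..UINT32_MAX, and big-endian storage puts the most significant byte first.
 */
static void
encode_i32_key(int32_t v, unsigned char key[4])
{
	uint32_t u = (uint32_t)v ^ UINT32_C(0x80000000);

	key[0] = (unsigned char)(u >> 24);
	key[1] = (unsigned char)(u >> 16);
	key[2] = (unsigned char)(u >> 8);
	key[3] = (unsigned char)u;
}

/* Compare two values purely through their encoded keys. */
static int
compare_i32_keys(int32_t a, int32_t b)
{
	unsigned char ka[4], kb[4];

	encode_i32_key(a, ka);
	encode_i32_key(b, kb);
	return memcmp(ka, kb, sizeof(ka));
}

/* Small usage check: byte order of the keys follows numeric order. */
static void
key_order_selfcheck(void)
{
	assert(compare_i32_keys(-5, 3) < 0);
	assert(compare_i32_keys(3, -5) > 0);
	assert(compare_i32_keys(7, 7) == 0);
}
```

With keys of this shape the comparison code needs no knowledge of the field type, which is the property the commit message describes.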
dsl
bf80c84843 Delete more unwanted/unused cruft.
Simplify logic for reading input records.
Do a merge sort whenever we have 16 partial sorted blocks.
The patient is breathing, but still carrying a lot of extra weight.
2009-08-20 06:36:25 +00:00
dsl
f155f3b8b9 The code that attempted to sort large files by sorting each chunk by the
first key byte and writing to a temp file, then sorting the records from
each temp file that had the same first key byte (and repeating for up to
4 key bytes) was a nice idea, but completely doomed to failure.
E.g. PR/9308, where a 70MB file has all but one record the same and short keys.
Not only does the code not work, it is rather guaranteed to be slow.
Instead always use a merge sort for fully sorted chunks of records (each
temporary file contains one lot of sorted records).
The -H option already did this, so just rip out all the code and variables
that can't be used when -H was specified.
Further cleanup to come ...
2009-08-18 18:00:28 +00:00
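The merge step described in the commit above combines chunks that are each already fully sorted. A minimal sketch, assuming two in-memory runs of strings rather than the temporary files the commit refers to (merge_runs() is a hypothetical name, not a function from sort(1)):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Merge two sorted runs a[0..na) and b[0..nb) into out, which must have
 * room for na + nb entries. On equal keys the entry from a is taken first,
 * so the merge is stable. Returns the number of entries written.
 */
static size_t
merge_runs(const char *const *a, size_t na,
    const char *const *b, size_t nb, const char **out)
{
	size_t i = 0, j = 0, k = 0;

	assert(a != NULL && b != NULL && out != NULL);
	while (i < na && j < nb) {
		if (strcmp(a[i], b[j]) <= 0)
			out[k++] = a[i++];
		else
			out[k++] = b[j++];
	}
	/* One run is exhausted; copy whatever remains of the other. */
	while (i < na)
		out[k++] = a[i++];
	while (j < nb)
		out[k++] = b[j++];
	return k;
}
```

Each record is examined once per merge pass regardless of how the keys are distributed, so an input such as the one in PR/9308, where nearly every record is identical, costs no more than any other.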
dsl
9ab8b68075 Replace all uses of sizeof(TRECHEADER) with REC_DATA_OFFSET - which
is defined as offsetof(RECHEADER, data).  Delete TRECHEADER.
2009-08-16 19:53:43 +00:00
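The offsetof() definition mentioned above can be sketched as follows. The struct layout here is an assumption made for illustration (the log does not show the real RECHEADER fields), and rec_from_data() is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout for illustration only; not the real sort(1) header. */
typedef struct recheader {
	size_t length;		/* total record length */
	size_t offset;		/* offset of the key within data */
	unsigned char data[1];	/* record bytes start here */
} RECHEADER;

/* As stated in the commit message. */
#define REC_DATA_OFFSET offsetof(RECHEADER, data)

/* Recover the header from a pointer to its data member. */
static RECHEADER *
rec_from_data(unsigned char *data)
{
	assert(data != NULL);
	return (RECHEADER *)(void *)(data - REC_DATA_OFFSET);
}
```

Using offsetof() on the one real structure removes the need for a second, parallel header type whose size has to be kept in step with it.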
dsl
9987745061 Remove reference to db.h by using separate ptr+len fields for the only
structure that used it.
Pass end of keybuf area, not size to enterkey() - largely to remove a
variable whose use isn't obvious from the name!
The structure of this code sucks.
2009-08-15 18:40:01 +00:00
lukem
64d3192b1d Fix WARNS=4 issues (-Wcast-qual -Wsign-compare) 2009-04-13 11:07:59 +00:00
martin
ce099b4099 Remove clause 3 and 4 from TNF licenses 2008-04-28 20:22:51 +00:00
jdolecek
9f77432368 remove compile-time limit on number of -k options, allocate necessary
structures as-needed
2004-02-15 14:22:55 +00:00
jdolecek
f84513a754 add TNF copyright 2003-08-07 11:32:34 +00:00
agc
89aaa1bb64 Move UCB-licensed code from 4-clause to 3-clause licence.
Patches provided by Joel Baker in PR 22365, verified by myself.
2003-08-07 11:13:06 +00:00
jdolecek
63ae9a5e5f make function merge() static in msort.c
cosmetic change to how local variable is incremented (moved to for(;;))
2002-12-25 21:19:15 +00:00
jdolecek
fed8f4c4a6 put contents of extern.h directly to sort.h, and g/c extern.h
de-__P()
2002-12-24 15:02:46 +00:00
jdolecek
9208bb6e3a add extern definition for ncols and clist[] to sort.h, eliminate extra
definitions in init.c and field.c
g/c MAXMERGE
2002-12-24 13:20:25 +00:00
jdolecek
7f547730fd cosmetic changes - make keylist[] static and remove extern definition
in fsort.h, move macro SALIGN() from sort.h to fsort.c
2001-02-19 19:31:29 +00:00
jdolecek
75067b134f adjust indentation 2001-01-19 10:14:31 +00:00
shin
1d9514fbe4 - fix alignment problem. 2001-01-16 12:06:19 +00:00
jdolecek
c0f11cbc8f alltable[], itable[], dtable[] were moved to init.c, g/c from sort.[ch]
put extern declaration for gweights[] to sort.h
2001-01-12 19:41:13 +00:00
jdolecek
1c216f18ea general cleanup of file list passing:
* get rid of union f_handle, replace by passing explicit int parameter
  and (new) struct filelist
* add new typedefs gen_func_t and put_func_t and use where appropriate
2001-01-11 14:05:24 +00:00
jdolecek
af3472c08c constify a bit, small cleanups 2001-01-08 18:35:49 +00:00
jdolecek
e4de90b20d by default, use stable sort
add -S flag to switch to non-stable sort; for GNU sort compatibility,
provide -s flag too
2001-01-08 18:00:31 +00:00
jdolecek
ab259a291a enlarge line buffer as necessary, so that it's possible
to process lines longer than 65522 characters
constify, rename MAXLLEN to DEFLLEN
2000-10-16 21:53:19 +00:00
simonb
f6518b2053 Include <string.h> to get prototype for memcpy(). Fixed compile problems
on alpha (and other LP64 archs?).

XXX: Can't gcc be fixed so that it doesn't auto-prototype mem*()??
2000-10-07 22:15:29 +00:00
bjh21
e5218d1719 Two classes of changes from the initial OpenBSD commit of this sort(1):
FILE * variables are called "fp" rather than "fd".
Better (safer) temporary-file handling.
2000-10-07 20:37:06 +00:00
bjh21
6029888a3a Hit sort(1) with a hammer till it compiles.
Also add RCSIDs.
2000-10-07 18:37:09 +00:00
bjh21
1d5d9b5b60 4.4BSD-Lite2 contrib/sort 2000-10-07 16:39:34 +00:00