Commit Graph

60 Commits

Author SHA1 Message Date
christos
7e6e5c1f48 Add an 'l' style for sorting that sorts by the string length of the field. 2010-12-18 23:09:48 +00:00
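The commit does not show how the 'l' style is wired into sort's key generation. As a hedged illustration only, a qsort() comparator that orders fields by strlen() could look like the sketch below. The function names are hypothetical, and the strcmp() tie-break for equal lengths is an assumption, not necessarily what sort(1) does.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical comparator: order strings by length, shorter first.
 * Equal-length strings fall back to strcmp() (an assumed tie-break). */
static int cmp_by_length(const void *a, const void *b)
{
    const char *sa = *(const char *const *)a;
    const char *sb = *(const char *const *)b;
    size_t la = strlen(sa), lb = strlen(sb);

    if (la != lb)
        return la < lb ? -1 : 1;
    return strcmp(sa, sb);
}

/* Sort an array of C strings in place by their length. */
static void sort_by_length(const char **fields, size_t n)
{
    qsort(fields, n, sizeof(fields[0]), cmp_by_length);
}
```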
dholland
8696c1b71e fixit() needs to know the getopt options list to do its thing correctly. 2010-06-05 17:44:51 +00:00
enami
47e571f2ea Don't touch past the end of the allocated region. It results in a segmentation
violation.
2010-02-05 21:58:41 +00:00
joerg
2b8a053617 Retire __SCCSID. It has only archeological value now. Also retire lint
conditional around __RCSID, lint can handle that fine.
2009-11-06 18:34:22 +00:00
dsl
6458ae9cdf Move all the fopen() calls out of the record read routines into the callers.
Split the merge sort so that fsort() can pass the 'FILE *' of the temporary
files to be merged into the merge code.
Don't rely on realloc() not moving the end address of a buffer!
Rework merge sort so that it sorts pointers to 'struct mfile' and only
copies about sort record descriptors.
No functional change intended.
2009-09-26 21:16:55 +00:00
dsl
1310aa04b4 Save length of key instead of relying on the weight of the record sep.
This frees a byte value to use for 'end of key' (to correctly sort
short keys) while still having a weight assigned to the field sep.
(Unless -t is given, the field sep is in the field data.)
Do reverse sorts by writing the output file in reverse order (rather
than reversing the sort - apart from merges).
All key compares are now unweighted.
For 'sort -u' mark duplicate keys during the sort and don't write
to the output.
Use -S to mean a posix sort - where equal keys are sorted using the
raw record (rather than being kept in the original order).
For 'sort -f' (no keys) generate a key of the folded data (as for -n,
-i and -d); this simplifies the code and allows a 'posix' sort.
2009-09-10 22:02:40 +00:00
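Two ideas in this commit are easy to show in isolation: a reverse sort can be produced by walking an ascending result backwards when writing it out, and 'sort -u' can flag duplicate keys once the records are sorted and skip them on output. The sketch below is a simplification under stated assumptions: struct rec and both functions are hypothetical, the key is the whole record, and output goes to an array rather than a file.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct rec {
    const char *key;   /* hypothetical: the key is the whole record */
    bool dup;          /* set when the key equals the previous key */
};

/* After an ascending sort, flag records whose key repeats (-u). */
static void mark_duplicates(struct rec *r, size_t n)
{
    if (n > 0)
        r[0].dup = false;
    for (size_t i = 1; i < n; i++)
        r[i].dup = strcmp(r[i].key, r[i - 1].key) == 0;
}

/* Emit the sorted records, walking backwards for a reverse sort
 * instead of re-sorting with an inverted comparison. Records marked
 * as duplicates are skipped. Returns the number written to out. */
static size_t emit(const struct rec *r, size_t n, bool reverse,
                   const char **out)
{
    size_t m = 0;

    for (size_t k = 0; k < n; k++) {
        size_t i = reverse ? n - 1 - k : k;
        if (!r[i].dup)
            out[m++] = r[i].key;
    }
    return m;
}
```

Because the duplicate flag is set on the later records of each run, this sketch always keeps the first record of a run in ascending order, which only matters when records with equal keys differ elsewhere.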
dsl
4611f32c1c Include a local copy of the sradixsort() code from libc.
Currently unchanged apart from the deletion of the 'unstable' version and
other unneeded code.
Use fldtab[0]. not fldtab-> when we are referring to the global info
in the 0th entry to emphasise that this entry is different.
fldtab[0].weights is only needed in the SINGL_FLD case - so set it there.
Re-indent a big 'if' in setfield() so that the line breaks match the
logic - which looks dubious now!
2009-09-05 09:16:18 +00:00
dsl
e0846c3698 Put radixsort() and sradixsort() the correct way around. 2009-08-22 21:43:53 +00:00
dsl
7b4a02befd Rework the way sort generates sort keys:
- If we generate a key, it is always sortable using memcmp()
- If we are sorting the whole record, then a weight-table must be used
  during compares.
- Major surgery to encoding of numbers to ensure unique keys for equal
  numeric values.  Reverse numerics are handled by inverting the sign.
- Case folding (-f) is handled when the sort keys are generated. No other
  code has to care at all.
- Key uniqueness (-u) is done during merge for large datasets. It only
  has to be done when writing the output file for small files.
  Since the file is in key order this is simple!
Probably fixes all of: PR/27257 PR/25551 PR/22182 PR/31095 PR/30504
PR/36816 PR/37860 PR/39308
Also PR/18614 should no longer die, but a little more work needs to be
done on the merging for very large files.
2009-08-22 10:53:28 +00:00
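The core idea here is that a generated key should compare correctly with a plain memcmp(). A minimal sketch for fixed-width 32-bit integers is shown below: flip the sign bit so negatives sort below positives, store the bytes big-endian, and complement every bit to reverse the order. This is an assumed, simplified encoding (the function names are hypothetical); sort(1) has to encode arbitrary decimal numbers from text, and its own encoding differs.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-width encoding: map a signed 32-bit value to
 * 4 bytes whose memcmp() order matches numeric order. Flipping the
 * sign bit moves negatives below positives; big-endian byte order
 * makes the most significant byte compare first. */
static void encode_i32(int32_t v, int reverse, unsigned char out[4])
{
    uint32_t u = (uint32_t)v ^ UINT32_C(0x80000000);

    if (reverse)
        u = ~u;  /* complementing every bit reverses the order */
    out[0] = (unsigned char)(u >> 24);
    out[1] = (unsigned char)(u >> 16);
    out[2] = (unsigned char)(u >> 8);
    out[3] = (unsigned char)u;
}

/* Compare two encoded keys with a plain byte comparison. */
static int cmp_keys(const unsigned char a[4], const unsigned char b[4])
{
    return memcmp(a, b, 4);
}
```

Equal numbers always produce identical bytes, which is the property the commit relies on for -u.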
dsl
bf80c84843 Delete more unwanted/unused cruft.
Simplify logic for reading input records.
Do a merge sort whenever we have 16 partial sorted blocks.
The patient is breathing, but still carrying a lot of extra weight.
2009-08-20 06:36:25 +00:00
dsl
f155f3b8b9 The code that attempted to sort large files by sorting each chunk by the
first key byte and writing to a temp file, then sorting the records from
each temp file that had the same first key byte (and repeating for up to
4 key bytes) was a nice idea, but completely doomed to failure.
E.g. PR/9308 where a 70MB file has all but one record the same and short keys.
Not only does the code not work, it is rather guaranteed to be slow.
Instead always use a merge sort for each fully sorted chunk of records (each
temporary file contains one lot of sorted records).
The -H option already did this, so just rip out all the code and variables
that can't be used when -H was specified.
Further cleanup to come ...
2009-08-18 18:00:28 +00:00
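Merging several already-sorted chunks is a k-way merge: repeatedly take the smallest head element among all runs. The sketch below assumes in-memory int arrays instead of temporary files, and a linear scan over run heads instead of a heap; struct run and merge_runs() are hypothetical names.

```c
#include <stddef.h>

/* One sorted run; in sort(1) each run would live in a temporary
 * file, here it is an in-memory array (a simplifying assumption). */
struct run {
    const int *v;
    size_t len;
    size_t pos;   /* index of the next unread element */
};

/* Merge k sorted runs into out, which must hold the sum of all
 * run lengths. Each step scans the run heads for the minimum;
 * with a small fan-in this is cheap, and a heap is the usual
 * choice for larger k. Returns the number of elements written. */
static size_t merge_runs(struct run *runs, size_t k, int *out)
{
    size_t n = 0;

    for (;;) {
        size_t best = k;
        for (size_t i = 0; i < k; i++) {
            if (runs[i].pos == runs[i].len)
                continue;
            if (best == k ||
                runs[i].v[runs[i].pos] < runs[best].v[runs[best].pos])
                best = i;
        }
        if (best == k)
            return n;          /* every run is exhausted */
        out[n++] = runs[best].v[runs[best].pos++];
    }
}
```

Using a strict `<` means ties are taken from the lowest-numbered run first, so records that compare equal keep their run order.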
dsl
2a0ab276a2 Ansify.
I'm looking at fixing the 'sort -n' fubars, but this code is an
impenetrable mess - which needs some fixing first!
2009-08-15 09:48:46 +00:00
lukem
64d3192b1d Fix WARNS=4 issues (-Wcast-qual -Wsign-compare) 2009-04-13 11:07:59 +00:00
christos
079a9a0235 Make -R accept numeric arguments so one can say -R '\0' to be used in
pipelines like find . -print0 | sort -R '\0'. From Anon Ymous
2008-11-08 17:11:56 +00:00
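With a NUL record separator, a file name containing a newline stays a single record. As an illustration of reading such a stream (not sort's own record reader), POSIX.1-2008 getdelim() can split input on any delimiter byte; count_records() is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Count the records in fp that are terminated by delim, e.g. '\0'
 * for the output of find -print0. getdelim() keeps embedded
 * newlines inside a record; a final unterminated record is still
 * returned and counted. */
static size_t count_records(FILE *fp, int delim)
{
    char *buf = NULL;
    size_t cap = 0, n = 0;

    while (getdelim(&buf, &cap, delim, fp) != -1)
        n++;
    free(buf);
    return n;
}
```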
lukem
98e5374ccb Remove the \n and tabs from the __COPYRIGHT() strings.
Tweak to use a consistent format.
2008-07-21 14:19:20 +00:00
martin
ce099b4099 Remove clause 3 and 4 from TNF licenses 2008-04-28 20:22:51 +00:00
jdolecek
a2e8970e19 when using -o into file which already exists, copy the permissions
of the original file to the new (sorted) file

addresses PR bin/26860 by Michael van Elst
2006-10-23 19:53:25 +00:00
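Copying the permissions of an existing output file onto the newly written one can be done with stat() on the old path and fchmod() on the new descriptor. The helper below is a hypothetical sketch; the commit does not show how sort(1) names or opens its new output file. fchmod() is not affected by the umask, so the permission bits are copied exactly.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open new_path for writing; if orig_path exists, give the new file
 * the same permission bits. Returns the descriptor or -1. */
static int open_output_like(const char *orig_path, const char *new_path)
{
    struct stat st;
    int fd = open(new_path, O_WRONLY | O_CREAT | O_TRUNC, 0666);

    if (fd == -1)
        return -1;
    if (stat(orig_path, &st) == 0 && fchmod(fd, st.st_mode & 07777) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```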
jdolecek
bfa086e40a replace access(2) + /dev/ prefix check with lstat(2) and S_ISCHR()/S_ISBLK()
part of PR bin/26860 by Michael van Elst

while here, put output file fopen() inside the code block of the
only code path where it's actually needed, to make the logic more obvious;
and in the "stdout" case, initialize toutpath to empty string rather
than /dev/stdout, to make it clear /dev/stdout is not actually used
2006-10-23 19:39:54 +00:00
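Checking the file type is more reliable than checking for a "/dev/" prefix: a device node can live elsewhere, and not every path under /dev is a device (on Linux, for example, /dev/stdout is a symlink). A minimal sketch of the lstat(2) test follows; is_device() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/stat.h>

/* Return true if path names a character or block device. lstat()
 * does not follow a final symlink, so a link that points at a device
 * is reported as a link, not as a device. A missing path is not a
 * device. */
static bool is_device(const char *path)
{
    struct stat st;

    if (lstat(path, &st) == -1)
        return false;
    return S_ISCHR(st.st_mode) || S_ISBLK(st.st_mode);
}
```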
jdolecek
165c6691e1 use F_OK instead of 0 for second parameter of access(2)
part of PR bin/26860 by Michael van Elst
2006-10-23 19:11:46 +00:00
wiz
1919d4e949 Sync usage with man page. From Kouichirou Hiratsuka in PR 26278. 2004-07-23 13:26:11 +00:00
heas
c68b80b9a5 remove double initialisation of SINGL_FLD & SEP_FLAG 2004-03-14 21:09:30 +00:00
jdolecek
3b6344c769 ftpos pointer was not updated when fldtab was reallocated; drop completely
in favour of an index counter
fixes bin/24449 by Jun-ichiro itojun
2004-02-17 18:59:13 +00:00
jdolecek
d8c927fdbf fldtab[] needs to have one extra element - this marks end of array
addresses part of PR bin/24449 by Jun-ichiro itojun
2004-02-17 13:52:56 +00:00
itojun
aa7ee5b5c7 use safer realloc idiom
memset new region got by realloc
2004-02-17 02:38:47 +00:00
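The "safer realloc idiom" is to store realloc()'s result in a temporary so that a failure does not overwrite, and leak, the only pointer to the old block; the grown tail is then cleared because realloc() leaves it uninitialised. Any pointer into the old block (such as the ftpos pointer dropped in 3b6344c769) becomes invalid if realloc() moves it, which is why an index is safer. A generic sketch with a hypothetical helper name:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Grow *arr from old_n to new_n elements of size elsize. The result
 * of realloc() goes into a temporary first, so on failure the caller
 * still owns the original block; the newly added tail is zeroed.
 * Returns 0 on success, -1 on failure (with *arr unchanged). */
static int grow_zeroed(void **arr, size_t old_n, size_t new_n, size_t elsize)
{
    void *p;

    if (new_n <= old_n || (elsize != 0 && new_n > SIZE_MAX / elsize))
        return -1;
    p = realloc(*arr, new_n * elsize);
    if (p == NULL)
        return -1;
    memset((char *)p + old_n * elsize, 0, (new_n - old_n) * elsize);
    *arr = p;
    return 0;
}
```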
itojun
0795537158 initialize fldtab 2004-02-17 02:28:29 +00:00
jdolecek
9f77432368 remove compile-time limit on number of -k options, allocate necessary
structures as-needed
2004-02-15 14:22:55 +00:00
jdolecek
f84513a754 add TNF copyright 2003-08-07 11:32:34 +00:00
agc
89aaa1bb64 Move UCB-licensed code from 4-clause to 3-clause licence.
Patches provided by Joel Baker in PR 22365, verified by myself.
2003-08-07 11:13:06 +00:00
jdolecek
8852da41eb g/c many_files(), too 2002-12-24 14:58:57 +00:00
jdolecek
e296a59c79 bump 'soft' limit for number of files to hard limit on startup; we
want to be able to open as many temporary files as possible
2002-12-24 14:55:46 +00:00
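Raising the soft RLIMIT_NOFILE limit to the hard limit uses getrlimit(2) and setrlimit(2). The sketch below is a hypothetical helper, not the code from this commit. The call can still fail on systems that reject an unlimited soft limit for RLIMIT_NOFILE, so callers should treat failure as non-fatal.

```c
#include <sys/resource.h>

/* Raise the soft limit on open files to the hard limit so that as
 * many temporary files as possible can be open at once. Returns 0
 * on success or -1 if getrlimit()/setrlimit() fails. */
static int raise_nofile_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) == -1)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_NOFILE, &rl);
}
```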
jdolecek
77d4ae97df move fltab outside main and make it static, eliminate two memset()s
g/c superfluous extern definition for clist[] and ncols
make toutpath[] static
2002-12-24 13:09:38 +00:00
tron
21f56aa969 Remove the statically initialized "sigaction" structure completely because
such usage is broken. Problem pointed out by Klaus Klein on
"sources-changes@netbsd.org".
2002-11-27 16:47:13 +00:00
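A static, positional initializer for struct sigaction depends on the member layout, which is implementation-defined (POSIX allows the storage of sa_handler and sa_sigaction to overlap). Filling the structure in at run time avoids that. A minimal sketch with hypothetical names:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t got_signal;

static void on_signal(int sig)
{
    got_signal = sig;
}

/* Set up struct sigaction field by field at run time instead of
 * relying on the order of its members in a static initializer. */
static int install_handler(int sig)
{
    struct sigaction act;

    memset(&act, 0, sizeof act);
    act.sa_handler = on_signal;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    return sigaction(sig, &act, NULL);
}
```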
tron
f58cb59ba6 Add braces in a statically initialized "sigaction" structure to fix a
build problem after siginfo(2) has been added.
2002-11-27 14:44:46 +00:00
jdolecek
b4f19b2d56 disable the code which maxes nofiles limit, it should not be normally
needed now
2001-05-14 21:52:21 +00:00
ross
1959d24b79 XXX
For some reason this program wants to open _hundreds_ of temporary files.
Make it setrlimit(RLIMIT_NOFILE, ...), so this rather dubious strategy at
least works well enough to ctag(1) our own kernel.
XXX
2001-04-30 00:25:09 +00:00
christos
e56e039c8c - use MAXPATHLEN (1024) instead of _POSIX_PATH_MAX (255) for the temporary
path buffer
- provide better error messages about why the temp file creation is failing
- explicitly compare syscall return to -1 instead of < 0 and fdopen return
  to NULL instead of 0.
2001-02-22 22:45:49 +00:00
christos
faf9e3e459 Fix problem when using sort >> foo
If no output file was specified sort fopened("/dev/stdout", "w").
This is *wrong* because "/dev/stdout" will truncate the output file,
thus undoing the append effect the shell had set up. The simple fix
here is to just arrange for outfp = stdout and don't play with /dev/stdout.

While I am here:
	- KNF
	- make pattern for mkstemp have 6 X's.
2001-02-21 19:24:30 +00:00
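The fix amounts to using the stdout stream inherited from the shell when no output file is named, instead of reopening "/dev/stdout" with mode "w". A one-function sketch (open_output() is a hypothetical name):

```c
#include <stdio.h>

/* Pick the output stream. Without an output file, write to the
 * inherited stdout rather than fopen("/dev/stdout", "w"), which
 * can truncate a file the shell opened for appending with >>. */
static FILE *open_output(const char *path)
{
    return path == NULL ? stdout : fopen(path, "w");
}
```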
jdolecek
aa9a452a75 full -T support 2001-02-19 15:53:07 +00:00
jdolecek
5347005ed0 resurrect old ftmp() - it supports alternative directory for temporary
file, which is needed for -T support
2001-02-19 15:45:45 +00:00
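A temporary-file helper that honours an alternative directory (as -T needs) can fall back to tmpfile() when no directory is given, and otherwise build a mkstemp() template in that directory. The sketch below is hypothetical: the name ftmp_in_dir(), the "sort.XXXXXX" template and the 1024-byte buffer (matching the MAXPATHLEN value quoted in e56e039c8c) are assumptions. The name is unlinked at once so the file disappears when it is closed.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Create an anonymous temporary file, in dir if one is given.
 * mkstemp() requires the template to end in six X's. */
static FILE *ftmp_in_dir(const char *dir)
{
    char path[1024];
    FILE *fp;
    int fd, len;

    if (dir == NULL)
        return tmpfile();
    len = snprintf(path, sizeof path, "%s/sort.XXXXXX", dir);
    if (len < 0 || (size_t)len >= sizeof path)
        return NULL;
    if ((fd = mkstemp(path)) == -1)
        return NULL;
    (void)unlink(path);
    if ((fp = fdopen(fd, "w+")) == NULL)
        (void)close(fd);
    return fp;
}
```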
jdolecek
f6b0d130db use -R instead of -w, since that's what OpenBSD is using and there is no reason
to be different
2001-02-07 20:58:09 +00:00
jdolecek
14b38a0855 Since -T is used to select directory for temporary files in other sort
implementations, we should avoid using it for something else.
Use (new) flag -w for setting record delimiter, make -T noop.
2001-02-07 20:31:44 +00:00
jdolecek
44f2c62649 use errx(), not err() within section for '-t' flag 2001-02-07 19:47:44 +00:00
soren
ec09544572 And make usage() test for NULL explicitly. 2001-01-13 20:21:56 +00:00
soren
7b5f324dcc usage() expects a NULL when there is no specific error message. 2001-01-13 20:20:47 +00:00
jdolecek
769f751499 save couple of cycles and bytes by static initialization of sigaction act
and sigtable[]
2001-01-13 11:19:41 +00:00
jdolecek
341955c93c alltable[], itable[], dtable[] were moved to init.c, g/c from sort.[ch]
put extern declaration for gweights[] to sort.h
add -s/-S to usage(), couple of formatting nits
2001-01-12 19:41:12 +00:00
jdolecek
4a22141e02 the g/c in rev 1.12 was too aggressive - put back code
to change file '-' to '/dev/stdin'
2001-01-11 15:10:46 +00:00
jdolecek
1c216f18ea general cleanup of file list passing:
* get rid of union f_handle, replace by passing explicit int parameter
  and (new) struct filelist
* add new typedefs gen_func_t and put_func_t and use where appropriate
2001-01-11 14:05:24 +00:00
jdolecek
d3a4171066 make ftmp() wrapper around tmpfile(), there is no need to reimplement it
move ftmp() from tmp.c to files.c
g/c no longer needed stuff
2001-01-08 19:16:49 +00:00
jdolecek
09bc2d58e8 call setlocale() on startup
reformat the switch contents in main() a little, sort flags by alphabet
where possible
2001-01-08 18:58:56 +00:00