NetBSD

Commit Graph

Author	SHA1	Message	Date
joerg	2b8a053617	Retire __SCCSID. It has only archeological value now. Also retire lint conditional around __RCSID, lint can handle that fine.	2009-11-06 18:34:22 +00:00
dsl	43682b02ee	If anyone is stupid enough to feed records longer than 8MB into sort, don't sit in an infinite loop, instead eat memory until we have read 8 records.	2009-10-09 20:32:57 +00:00
dsl	41b3ada21c	When we need to merge more than 16 files, do them in a hierarchy. Reduces the amount of data written to temporary files. The 3-level stack has to do a simple reduce after 4352 input files, for a normal file sort this is 35GB of data or about 500 million records. This needs about 50 open fd's - which should be ok. Clearly the merge sort could process more input files in one go - speeding up the sort, but at some point the number of input files would exceed whatever limit was applied.	2009-10-09 20:29:43 +00:00
dsl	768e6fa973	Don't give merge an empty file when we detect EOF with nothing in our buffer.	2009-10-09 20:23:19 +00:00
dsl	8b6ec7b129	long align records written to temporary files.	2009-10-07 21:03:29 +00:00
dsl	5aa782f502	When encoding numbers, we can use all 8 bits for exponent values.	2009-10-07 21:02:57 +00:00
dsl	eab2f96cf5	Fix borked fix for sort relying on realloc() changing the buffer end. Sorts of more than 8MB data now probably work again.	2009-09-28 20:30:01 +00:00
dsl	6458ae9cdf	Move all the fopen() calls out of the record read routines into the callers. Split the merge sort so that fsort() can pass the 'FILE *' of the temporary files to be merged into the merge code. Don't rely on realloc() not moving the end address of a buffer! Rework merge sort so that it sorts pointers to 'struct mfile' and only copies about sort record descriptors. No functional change intended.	2009-09-26 21:16:55 +00:00
dsl	800732bfdc	Fix sort -u, PR/42094	2009-09-19 16:18:00 +00:00
dsl	fe52672374	Minor tweaks to the key generation for numeric fields. Use 1's complement for -ve numbers to avoid conditionals.	2009-09-16 20:56:38 +00:00
dsl	1310aa04b4	Save length of key instead of relying on the weight of the record sep. This frees a byte value to use for 'end of key' (to correctly sort short keys) while still having a weight assigned to the field sep. (Unless -t is given, the field sep is in the field data.) Do reverse sorts by writing the output file in reverse order (rather than reversing the sort - apart from merges). All key compares are now unweighted. For 'sort -u' mark duplicate keys during the sort and don't write to the output. Use -S to mean a posix sort - where equal keys are sorted using the raw record (rather than being kept in the original order). For 'sort -f' (no keys) generate a key of the folded data (as for -n -i and -d), simplifies the code and allows a 'posix' sort.	2009-09-10 22:02:40 +00:00
dsl	2abdfb3907	Now we have our own radix_sort() change the interface so that we pass an array of 'RECHEADER *' and remove all the crappy stuff that backed up by REC_DATA_OFFSET (etc). Also change radix_sort() to return the number of elements, soon to be used to drop duplicate keys (for sort -u).	2009-09-05 12:00:25 +00:00
dsl	4611f32c1c	Include a local copy of the sradixsort() code from libc. Currently unchanged apart from the deletion of the 'unstable' version and other unneeded code. Use fldtab[0]. not fldtab-> when we are referring to the global info in the 0th entry to emphasise that this entry is different. fldtab[0].weights is only needed in the SINGL_FLD case - so set it there. Re-indent a big 'if' in setfield() so that the line breaks match the logic - which looks dubious now!	2009-09-05 09:16:18 +00:00
wiz	ea72fa6ee9	Fix pasto.	2009-08-23 15:45:08 +00:00
dsl	5166e91c70	Bring nearer to reality. Note that -H is now ignored. Move -S and -s (and -H) to the first list of options since they are global ones, not ones that override the ordering rules.	2009-08-22 21:55:08 +00:00
dsl	5c6e557c4b	<space> and <tab> at the start of key fields are supposed to be sorted as if part of the data. This is a bit fubar since we need a value that sorts before any byte value as a key field separator - so need 257 byte values (since radixsort() doesn't take a length for each record). For now map '\t' to 0x01 and hope no one will notice!	2009-08-22 21:50:32 +00:00
dsl	e0846c3698	Put radixsort() and sradixsort() the correct way around.	2009-08-22 21:43:53 +00:00
dsl	f58fe5e68a	Fix generation of unmasked alpha keys.	2009-08-22 21:28:55 +00:00
dsl	b36440a064	Only process each number digit once.	2009-08-22 21:19:40 +00:00
dsl	609b8532b4	Add some comments and clarifications to this impenetrable code. When merging ensure we accurately sort records with identical keys by file-number, otherwise a 'stable' sort won't be!	2009-08-22 15:16:50 +00:00
dsl	7b4a02befd	Rework the way sort generates sort keys: - If we generate a key, it is always sortable using memcmp() - If we are sorting the whole record, then a weight-table must be used during compares. - Major surgery to encoding of numbers to ensure unique keys for equal numeric values. Reverse numerics are handled by inverting the sign. - Case folding (-f) is handled when the sort keys are generated. No other code has to care at all. - Key uniqueness (-u) is done during merge for large datasets. It only has to be done when writing the output file for small files. Since the file is in key order this is simple! Probably fixes all of: PR/27257 PR/25551 PR/22182 PR/31095 PR/30504 PR/36816 PR/37860 PR/39308 Also PR/18614 should no longer die, but a little more work needs to be done on the merging for very large files.	2009-08-22 10:53:28 +00:00
dsl	bf80c84843	Delete more unwanted/unused cruft. Simplify logic for reading input records. Do a merge sort whenever we have 16 partial sorted blocks. The patient is breathing, but still carrying a lot of extra weight.	2009-08-20 06:36:25 +00:00
dsl	f155f3b8b9	The code that attempted to sort large files by sorting each chunk by the first key byte and writing to a temp file, then sorting the records from each temp file that had the same first key byte (and repeating for up to 4 key bytes) was a nice idea, but completely doomed to failure. Eg PR/9308 where a 70MB file has all but one record the same and short keys. Not only does the code not work, it is rather guaranteed to be slow. Instead always use a merge sort for fully sorted chunks of records (each temporary file contains one lot of sorted records). The -H option already did this, so just rip out all the code and variables that can't be used when -H was specified. Further cleanup to come ...	2009-08-18 18:00:28 +00:00
dsl	fa81e78b3d	'depth' is used for the number of bytes into the key that the pointers reference, when we want to find the record header put the larger value into 'hdr_off' to avoid any confusion that the code might be changing 'depth'! There is now no need to save the original value as 'odepth' in append.c. All in a vague attempt to make this code slightly readable.	2009-08-16 20:02:04 +00:00
dsl	9ab8b68075	Replace all uses of sizeof(TRECHEADER) with REC_DATA_OFFSET - which is defined as offsetof(RECHEADER, data). Delete TRECHEADER.	2009-08-16 19:53:43 +00:00
dsl	59ede5ae24	Always add an REC_D char (usually \n) as the last sort key char - we almost always need one. But do ADD it, instead of overwriting the last byte of the last key since that may be requesting the other end of the sort order. There is no need to check for space for the line after adding the key, but we might as well check before - just to optimise that case. This might fix some of the sort bugs - but not the one I'm looking at!	2009-08-15 21:26:32 +00:00
dsl	9987745061	Remove reference to db.h by using separate ptr+len fields for the only structure that used it. Pass end of keybuf area, not size to enterkey() - largely to remove a variable whose use isn't obvious from the name! The structure of this code sucks.	2009-08-15 18:40:01 +00:00
dsl	477a33f936	linebuf and linebuf_size are only used inside seq() - which also not only has its own static variable, but will also extend the buffer. Remove linebuf/size and change seq() to use a private, locally managed buffer.	2009-08-15 16:50:29 +00:00
dsl	5e8c7b5dbd	Remove the unused 'DBT *key' parameter from seq().	2009-08-15 16:10:40 +00:00
dsl	a3b5c4400f	In makeline() change 'pos' from 'char *' to 'u_char *' and remove all the casts associated with its use. None of the uses can possibly care about the signedness of the pointer.	2009-08-15 14:31:48 +00:00
dsl	2a0ab276a2	Ansify. I'm looking at fixing the 'sort -n' fubars, but this code is an impenetrable mess - which needs some fixing first!	2009-08-15 09:48:46 +00:00
lukem	c1ceae17f0	Enable WARNS=4 by default for usr.bin, except for: awk bdes checknr compile_et error gss hxtool kgetcred kinit klist ldd less lex locale login m4 man menuc mk_cmds mklocale msgc openssl rpcgen rpcinfo sdiff spell ssh string2key telnet tn3270 verify_krb5_conf xlint	2009-04-14 22:15:16 +00:00
lukem	64d3192b1d	Fix WARNS=4 issues (-Wcast-qual -Wsign-compare)	2009-04-13 11:07:59 +00:00
joerg	8929e0dce4	Don't workaround ancient macro argument limit with .Xo/.Xc.	2009-03-11 13:58:29 +00:00
christos	079a9a0235	Make -R accept numeric arguments so one can say -R '\0' to be used in pipelines like find . -print0 | sort -R '\0'. From Anon Ymous	2008-11-08 17:11:56 +00:00
lukem	98e5374ccb	Remove the \n and tabs from the __COPYRIGHT() strings. Tweak to use a consistent format.	2008-07-21 14:19:20 +00:00
martin	cd22f25e6f	Move TNF licenses to 2 clause form	2008-05-02 18:11:04 +00:00
martin	ce099b4099	Remove clause 3 and 4 from TNF licenses	2008-04-28 20:22:51 +00:00
hubertf	f2799c52e5	<ctype.h> is unused. What's still needed is <sys/cdefs.h> (which is usually included at that place anyways). From Slava Semushin <slava.semushin@gmail.com>.	2007-02-21 20:15:17 +00:00
jdolecek	d1de60425b	fix check for field order to allow .0 form in "-k 1.2,1.0" fix provided in PR bin/25572 by Ross Patterson	2006-10-23 20:36:17 +00:00
jdolecek	a2e8970e19	when using -o into file which already exists, copy the permissions of the original file to the new (sorted) file addresses PR bin/26860 by Michael van Elst	2006-10-23 19:53:25 +00:00
jdolecek	bfa086e40a	replace access(2) + /dev/ prefix check with lstat(2) and S_ISCHR()/S_ISBLK() part of PR bin/26860 by Michael van Elst while here, put output file fopen() inside the code block of the only code path where it's actually needed, to make the logic more obvious; and in the "stdout" case, initialize toutpath to empty string rather than /dev/stdout, to make it clear /dev/stdout is not actually used	2006-10-23 19:39:54 +00:00
jdolecek	165c6691e1	use F_OK instead of 0 for second parameter of access(2) part of PR bin/26860 by Michael van Elst	2006-10-23 19:11:46 +00:00
mrg	f066626ffb	char -> u_char in a couple of places to match other variables.	2006-05-11 19:16:42 +00:00
jmc	e71965e518	Init some variables the compiler is complaining about and mark w. XXGCC as it affects only m68k compilers.	2005-06-10 16:07:45 +00:00
he	90d4762740	Initialize a local variable to appease -Wuninitialized. Marked with XXXGCC for pmppc (found while compiling for it). Reviewed by lukem.	2005-06-07 09:51:34 +00:00
dsl	e219d781d7	Add (unsigned char) cast to ctype functions	2004-11-03 20:10:08 +00:00
wiz	1919d4e949	Sync usage with man page. From Kouichirou Hiratsuka in PR 26278.	2004-07-23 13:26:11 +00:00
wiz	2389facffb	Sort options in SYNOPSIS. From Kouichirou Hiratsuka in PR 26278.	2004-07-23 13:20:36 +00:00
heas	2a3d05aa4e	Do not step over the edge of the buffer (check for '\0'). This just happens to not lose on i386 because another buffer appears immediately following. Regress tests all passed.	2004-03-14 21:12:14 +00:00
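
Several of the messages above (7b4a02befd, fe52672374) describe generating sort keys that compare correctly byte by byte, with 1's complement applied to negative numbers. The sketch below illustrates that general idea only; it is not the encoding used in sort(1). The function name `encode_int_key`, the 5-byte layout, and the restriction to integers whose magnitude fits in 32 bits are all assumptions made for the example. Python's `bytes` comparison is unsigned and lexicographic, so it stands in for memcmp() here.

```python
def encode_int_key(value: int) -> bytes:
    """Hypothetical 5-byte key whose byte-wise order matches numeric order.

    Layout (an assumption for illustration, not sort(1)'s format):
      byte 0     sign marker: 0x80 for value >= 0, 0x7f for value < 0
      bytes 1-4  big-endian magnitude, bit-inverted for negative values
    """
    # 0xFF for negatives, 0x00 otherwise. XOR-ing every byte with the mask
    # applies one's complement to negatives and leaves other values alone,
    # so the per-byte work needs no separate negative/positive code path.
    mask = 0xFF if value < 0 else 0x00
    magnitude = abs(value).to_bytes(4, "big")  # raises OverflowError if too large
    return bytes([0x80 ^ mask]) + bytes(b ^ mask for b in magnitude)


values = [3, -1, 0, -70000, 42, -2]
# Sorting by the encoded key gives the same order as sorting numerically.
by_key = sorted(values, key=encode_int_key)
```

Inverting the magnitude bytes of a negative number makes larger magnitudes produce smaller byte strings, which is why -70000 sorts ahead of -2 without any special handling in the comparison itself.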


180 Commits