Commit Graph

10539 Commits

Author SHA1 Message Date
dsl
5c6e557c4b <space> and <tab> at the start of key fields are supposed to be sorted
as if part of the data.
This is a bit fubar since we need a value that sorts before any byte value
as a key field separator - so we need 257 byte values (since radixsort() doesn't
take a length for each record).
For now map '\t' to 0x01 and hope no one will notice!
2009-08-22 21:50:32 +00:00
dsl
e0846c3698 Put radixsort() and sradixsort() the correct way around. 2009-08-22 21:43:53 +00:00
dsl
f58fe5e68a Fix generation of unmasked alpha keys. 2009-08-22 21:28:55 +00:00
dsl
b36440a064 Only process each number digit once. 2009-08-22 21:19:40 +00:00
joerg
976b948d1c GCC's propolice complains about dynamic stack arrays, so bite the bullet
and introduce a compile constant that limits the number of hash results.
Verify that the chosen hash function is not beyond that limit and just
use the upper limit as the static size in the graph tree functions.
2009-08-22 17:52:17 +00:00
joerg
44b11f39e8 Add support for -c, make the output of -l/-v more similar to infozip. 2009-08-22 17:19:11 +00:00
dsl
609b8532b4 Add some comments and clarifications to this impenetrable code.
When merging ensure we accurately sort records with identical keys by
file-number, otherwise a 'stable' sort won't be!
2009-08-22 15:16:50 +00:00
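
The tie-break described in the commit above can be sketched as follows. This is a minimal illustration, not the code from sort(1): `struct merge_rec`, its fields, and `merge_compare()` are hypothetical names, and the sketch assumes that a lower file number means the records appeared earlier in the input.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one record waiting at the head of a temporary file. */
struct merge_rec {
	const unsigned char *key;	/* generated sort key */
	size_t key_len;			/* length of the key in bytes */
	int file_no;			/* index of the temporary file it came from */
};

/*
 * Compare two records for merging.  Keys are compared bytewise; records
 * with identical keys are ordered by file number so that the merge keeps
 * them in their original relative order (assuming files were written in
 * input order).
 */
static int
merge_compare(const struct merge_rec *a, const struct merge_rec *b)
{
	size_t n = a->key_len < b->key_len ? a->key_len : b->key_len;
	int c = memcmp(a->key, b->key, n);

	if (c != 0)
		return c;
	if (a->key_len != b->key_len)
		return a->key_len < b->key_len ? -1 : 1;
	/* Identical keys: fall back to the file number. */
	return (a->file_no > b->file_no) - (a->file_no < b->file_no);
}
```

Without the final comparison, two equal keys from different files compare as 0 and the merge is free to emit them in either order, which is what breaks stability.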
dsl
7b4a02befd Rework the way sort generates sort keys:
- If we generate a key, it is always sortable using memcmp()
- If we are sorting the whole record, then a weight-table must be used
  during compares.
- Major surgery to encoding of numbers to ensure unique keys for equal
  numeric values.  Reverse numerics are handled by inverting the sign.
- Case folding (-f) is handled when the sort keys are generated. No other
  code has to care at all.
- Key uniqueness (-u) is done during merge for large datasets. It only
  has to be done when writing the output file for small files.
  Since the file is in key order this is simple!
Probably fixes all of: PR/27257 PR/25551 PR/22182 PR/31095 PR/30504
PR/36816 PR/37860 PR/39308
Also PR/18614 should no longer die, but a little more work needs to be
done on the merging for very large files.
2009-08-22 10:53:28 +00:00
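
The idea of a key that is "always sortable using memcmp()" can be illustrated with a fixed-width integer. This is only a sketch of the general technique, not the number encoding sort(1) uses: `encode_num_key()`, `compare_num_keys()` and `NUM_KEY_LEN` are hypothetical, and reversal is done here by complementing the bits rather than by inverting the sign as the commit describes.

```c
#include <stdint.h>
#include <string.h>

#define NUM_KEY_LEN 4

/*
 * Encode a signed 32-bit value as NUM_KEY_LEN bytes whose memcmp() order
 * matches numeric order (or the opposite order when 'reverse' is set).
 */
static void
encode_num_key(int32_t value, unsigned char key[NUM_KEY_LEN], int reverse)
{
	/* Flipping the sign bit maps INT32_MIN..INT32_MAX onto 0..UINT32_MAX. */
	uint32_t biased = (uint32_t)value ^ UINT32_C(0x80000000);

	/* Illustration only: complementing every bit reverses the order. */
	if (reverse)
		biased = ~biased;

	/* Store big-endian so the most significant byte is compared first. */
	key[0] = (unsigned char)(biased >> 24);
	key[1] = (unsigned char)(biased >> 16);
	key[2] = (unsigned char)(biased >> 8);
	key[3] = (unsigned char)biased;
}

/* Once keys are encoded, comparing them needs no knowledge of the options. */
static int
compare_num_keys(const unsigned char *a, const unsigned char *b)
{
	return memcmp(a, b, NUM_KEY_LEN);
}
```

All option handling happens when the key is generated, so the compare and merge code can stay option-agnostic, which is the property the commit message emphasises.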
mrg
a9208fb155 kill ldd_aout. it didn't work anyway...not since i don't know when. 2009-08-22 06:52:15 +00:00
joerg
5e2ef53f3e Add -p and -q support. 2009-08-22 02:19:42 +00:00
joerg
de2e5c0dbd Fix markup. 2009-08-22 00:23:02 +00:00
manu
3beea69ccd Reset ziptype on each line. Failure to do this caused any log file to
be compressed if it was listed after a line using Z or J flag. For
instance, we compressed log2 with the config file below:
/var/log/log1                        600  5    100  *    Z
/var/log/log2                        600  7    100  *    -
2009-08-21 08:20:19 +00:00
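
The class of bug fixed above, per-line state leaking into the next line, can be sketched like this. The names `enum zip_type`, `parse_flag_fields()` and the mapping of Z and J to particular compressors are assumptions made for illustration; this is not the newsyslog parser.

```c
#include <stddef.h>

enum zip_type { ZIP_NONE, ZIP_GZIP, ZIP_BZIP2 };

/*
 * Walk the flags field of each config line and record the compression
 * type it requests.  'flags' holds one flags string per line.
 */
static void
parse_flag_fields(const char *const flags[], size_t nlines,
    enum zip_type out[])
{
	enum zip_type ziptype;
	size_t i;
	const char *p;

	for (i = 0; i < nlines; i++) {
		/* The fix: start every line from a clean state. */
		ziptype = ZIP_NONE;

		for (p = flags[i]; *p != '\0'; p++) {
			if (*p == 'Z')
				ziptype = ZIP_GZIP;	/* assumed meaning */
			else if (*p == 'J')
				ziptype = ZIP_BZIP2;	/* assumed meaning */
		}
		out[i] = ziptype;
	}
}
```

If the assignment at the top of the loop is moved before the loop, the second line in the example config inherits the Z setting of the first, which matches the symptom described in the commit.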
he
3b9a95def3 Um, the test for mips should use MACHINE_CPU, not MACHINE_ARCH. 2009-08-20 21:07:47 +00:00
he
c93d22967f Don't try to call the (no longer defined) aout_ldd() function
if we're building for mips.
2009-08-20 21:06:17 +00:00
he
b233b36efa Don't include <a.out.h> unless it's needed, and don't build
the aout subdir if on mips.  Fixes build for mips ports.
2009-08-20 19:17:19 +00:00
he
d1c69ed983 Move the include of <a.out.h> and <sys/exec_aout.h> until after
we have determined that the current machine actually supports a.out.
2009-08-20 17:40:26 +00:00
he
cff5554191 Remove the include of <a.out.h>, since these files do not appear
to need it.  Fixes build for mips ports.
Also fix a comment: crunchide walks "the symbol table", not
just "the a.out symbol table".
2009-08-20 17:39:51 +00:00
plunky
46834460fc add SupportedRepositories attribute for Phonebook Access profile
providing PhonebookAccessServer service
2009-08-20 11:07:42 +00:00
dsl
bf80c84843 Delete more unwanted/unused cruft.
Simplify logic for reading input records.
Do a merge sort whenever we have 16 partial sorted blocks.
The patient is breathing, but still carrying a lot of extra weight.
2009-08-20 06:36:25 +00:00
joerg
d1a45441e8 Fix-up syntax after wizd. 2009-08-19 15:26:59 +00:00
joerg
dd0c11eb26 Don't use .Xo/.Xc. Fix markup of alias command. 2009-08-19 15:17:05 +00:00
joerg
d965d85297 Nesting displays is not valid groff syntax. 2009-08-19 14:54:35 +00:00
dsl
f155f3b8b9 The code that attempted to sort large files by sorting each chunk by the
first key byte and writing to a temp file, then sorting the records from
each temp file that had the same first key byte (and repeating for up to
4 key bytes) was a nice idea, but completely doomed to failure.
E.g. PR/9308 where a 70MB file has all but one record the same and short keys.
Not only does the code not work, it is rather guaranteed to be slow.
Instead always use a merge sort of fully sorted chunks of records (each
temporary file contains one lot of sorted records).
The -H option already did this, so just rip out all the code and variables
that can't be used when -H was specified.
Further cleanup to come ...
2009-08-18 18:00:28 +00:00
joerg
81e49626c4 GCC doesn't trace switch (foo & 7) completely, so add a default: abort()
to avoid warnings about unused variables.
Consistently use \t for the output function.
2009-08-17 14:15:07 +00:00
christos
3104786862 back out previous; luke says:
'@' is a reserved URI char per RFC 3986, use %40
2009-08-17 09:08:16 +00:00
dsl
fa81e78b3d 'depth' is used for the number of bytes into the key that the pointers
reference, when we want to find the record header put the larger value
into 'hdr_off' to avoid any confusion that the code might be changing
'depth'!
There is now no need to save the original value as 'odepth' in append.c.
All in a vague attempt to make this code slightly readable.
2009-08-16 20:02:04 +00:00
dsl
9ab8b68075 Replace all uses of sizeof(TRECHEADER) with REC_DATA_OFFSET - which
is defined as offsetof(RECHEADER, data).  Delete TRECHEADER.
2009-08-16 19:53:43 +00:00
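
A short sketch of the offsetof() idiom named above. The layout of `struct rec_header` is hypothetical and is not the real RECHEADER; only the use of offsetof() on the `data` member comes from the commit message.

```c
#include <stddef.h>

/* Hypothetical record header followed by variable-length data. */
struct rec_header {
	size_t length;		/* total record length */
	size_t offset;		/* offset of the line within data */
	unsigned char data[1];	/* first byte of the payload */
};

/* Number of header bytes that precede the payload. */
#define REC_DATA_OFFSET offsetof(struct rec_header, data)

/* Return a pointer to the payload of a record. */
static unsigned char *
rec_data(struct rec_header *rec)
{
	return (unsigned char *)rec + REC_DATA_OFFSET;
}
```

One general reason to prefer offsetof() here is that sizeof on a struct includes any trailing padding, so it can be larger than the true distance to the payload; with this hypothetical layout `sizeof(struct rec_header)` exceeds `REC_DATA_OFFSET`.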
martin
f842f249ac More missing <sys/exec_aout.h> 2009-08-16 18:43:08 +00:00
martin
73946a9cc9 More missing <sys/exec_aout.h> includes 2009-08-16 18:15:28 +00:00
christos
3ac3892a2a use strrchr to find the last @ because we might want the username to contain
user@domain.
2009-08-16 02:49:23 +00:00
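
Why strrchr() matters here can be shown with a small helper. `split_user_host()` is a hypothetical function written for illustration and is not taken from the changed program.

```c
#include <stddef.h>
#include <string.h>

/*
 * Split "user@host" at the LAST '@'.  Returns a pointer to the host part
 * and stores the length of the user part in *user_len, or returns NULL
 * if the string contains no '@'.
 */
static const char *
split_user_host(const char *spec, size_t *user_len)
{
	const char *at = strrchr(spec, '@');

	if (at == NULL)
		return NULL;
	*user_len = (size_t)(at - spec);
	return at + 1;
}
```

With the input `user@domain@ftp.example.org`, splitting at the last '@' leaves `user@domain` as the user name; strchr() would instead stop at the first '@' and treat `domain@ftp.example.org` as the host.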
dsl
59ede5ae24 Always add an REC_D char (usually \n) as the last sort key char - we
almost always need one.
But do ADD it, instead of overwriting the last byte of the last key since
that may be requesting the other end of the sort order.
There is no need to check for space for the line after adding the key,
but we might as well check before - just to optimise that case.
This might fix some of the sort bugs - but not the one I'm looking at!
2009-08-15 21:26:32 +00:00
wiz
e671c9cf08 Fix typo. 2009-08-15 20:44:56 +00:00
christos
bcbc23bd8f add -p <tmpdir> option to override $TMPDIR from the command line like linux
has.
2009-08-15 20:02:28 +00:00
dsl
9987745061 Remove reference to db.h by using separate ptr+len fields for the only
structure that used it.
Pass end of keybuf area, not size to enterkey() - largely to remove a
variable whose use isn't obvious from the name!
The structure of this code sucks.
2009-08-15 18:40:01 +00:00
dsl
477a33f936 linebuf and linebuf_size are only used inside seq() - which not
only has its own static variable, but will also extend the buffer.
Remove linebuf/size and change seq() to use a private, locally managed
buffer.
2009-08-15 16:50:29 +00:00
joerg
03c8ba1c27 Add nbperf(1), a minimal perfect hash function generator.
Implemented are the 3-graph BDZ algorithm as well as the
2-graph and 3-graph CHM algorithms. All algorithms have expected
linear run time and the smallest functions need around 2.85 bit/key.
2009-08-15 16:21:04 +00:00
dsl
5e8c7b5dbd Remove the unused 'DBT *key' parameter from seq(). 2009-08-15 16:10:40 +00:00
dsl
a3b5c4400f In makeline() change 'pos' from 'char *' to 'u_char *' and remove all
the casts associated with its use.
None of the uses can possibly care about the signedness of the pointer.
2009-08-15 14:31:48 +00:00
dsl
2a0ab276a2 Ansify.
I'm looking at fixing the 'sort -n' fubars, but this code is an
impenetrable mess - which needs some fixing first!
2009-08-15 09:48:46 +00:00
drochner
c2819fbfcf avoid NULL dereference in log output if the command line parser
failed to extract a port number from the URL
2009-08-13 17:55:18 +00:00
dholland
69c3e9d213 Pass WARNS=4, not without some gross preprocessor hackery.
XXX: does this program actually do anything useful these days?
2009-08-13 06:59:37 +00:00
dholland
e63a3e7105 Assorted minor cleanup:
- use stdbool.h (partly)
- move extern declarations of data to header files
- use right types for calloc() wrapper
- remove bogus casts on return values
- remove excessive Pascal-style parentheses in conditionals
- a couple const fixes
- fix some typos in comments
2009-08-13 05:53:58 +00:00
dholland
7d59a3fee1 pass -Wshadow 2009-08-13 04:09:53 +00:00
dholland
72efe4fb6f Sprinkle const. 2009-08-13 03:50:02 +00:00
dholland
6c23c8ddec woops (doh!) 2009-08-13 03:10:03 +00:00
dholland
f42113e362 Whitespace. 2009-08-13 03:07:49 +00:00
dholland
b74af21b24 sprinkle static 2009-08-13 02:10:50 +00:00
matt
5f89c92891 Don't build for MIPS anymore 2009-08-12 23:39:13 +00:00
wiz
67bd9cb78f Remove superfluous parenthesis in #ifdef DEBUG.
From Henning Petersen in PR 41844.
2009-08-07 14:05:58 +00:00
wiz
ba544bf010 Add missing parenthesis in commented out code.
From Henning Petersen in PR 41838.
2009-08-07 13:53:54 +00:00