Reduces the amount of data written to temporary files.
The 3-level stack has to do a simple reduce after 4352 input files; for
a normal file sort this is 35GB of data, or about 500 million records.
This needs about 50 open fds - which should be ok.
Clearly the merge sort could process more input files in one go - speeding
up the sort, but at some point the number of input files would exceed
whatever limit was applied.
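As a rough sketch of the bookkeeping (illustrative only - the 16-file
merge trigger is taken from later in this log, the rest is assumed and
is not sort(1)'s actual code), each level holds up to 16 sorted temp
files and collapses into one file on the next level when full. Three
levels therefore absorb about 16^3 = 4096 sorted chunks, in the same
ballpark as the 4352 quoted above, and only one level's worth of files
needs to be open during any one merge - in line with the ~50 fds:

#include <stdio.h>

#define MERGE_N 16      /* merge whenever 16 sorted blocks exist */
#define LEVELS  3

static int level_count[LEVELS];

static void
add_chunk(int level)
{
    if (level >= LEVELS)
        return;                     /* left for the final reduce */
    if (++level_count[level] == MERGE_N) {
        level_count[level] = 0;     /* merge this level's files... */
        add_chunk(level + 1);       /* ...into one chunk a level up */
    }
}

int
main(void)
{
    for (int i = 0; i < 4096; i++)
        add_chunk(0);
    printf("%d %d %d\n", level_count[0], level_count[1],
        level_count[2]);
    return 0;
}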
for substitution patterns. This (perhaps coupled with the
new handling of .for variables in ${:U<value>...}) caused interesting
results for lines like:
.for file in ${LIST}
for-subst: ${file:S;^;${here}/;g}
Add a unit-test to keep an eye on this.
Split the merge sort so that fsort() can pass the 'FILE *' of the temporary
files to be merged into the merge code.
Don't rely on realloc() not moving the end address of a buffer!
Rework merge sort so that it sorts pointers to 'struct mfile' and only
copies sort record descriptors about.
No functional change intended.
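For illustration, a sketch of the pointer-sorting idea (the names and
layout here are assumptions, not the actual sort(1) structures): a swap
moves one pointer, however big the record behind it is:

#include <stdlib.h>
#include <string.h>

struct mfile_rec {                  /* stand-in record descriptor */
    size_t keylen;
    unsigned char key[64];
};

static int
cmp_rec(const void *a, const void *b)
{
    const struct mfile_rec *ra = *(const struct mfile_rec *const *)a;
    const struct mfile_rec *rb = *(const struct mfile_rec *const *)b;
    size_t n = ra->keylen < rb->keylen ? ra->keylen : rb->keylen;
    int c = memcmp(ra->key, rb->key, n);

    if (c != 0)
        return c;
    return (ra->keylen > rb->keylen) - (ra->keylen < rb->keylen);
}

/* The records themselves never move; only the pointer array does. */
static void
sort_recs(struct mfile_rec **recs, size_t nrecs)
{
    qsort(recs, nrecs, sizeof(*recs), cmp_rec);
}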
The pfsync(4) interface exposes state changes in pf(4) over a
pseudo-interface, and can be used to synchronise the state tables of
different pf firewalls.
This work was part of my 2009 GSoC.
No objection on tech-net@
This commit mostly adds code written by Claudio Jeker for OpenBSD to
support sysctl in the interface printing parts (-i, -I, -w). The code
has been ported to NetBSD with tiny adjustments -- of course all bugs
etc. are mine.
Also add and document a -X flag to force sysctl usage. The documentation
notes this flag may be removed at any time and its presence should not be
relied on.
Some misc. comment/#ifdef changes and code snippet moves as well.
Please note that no functionality should change as the routing and
interface printing code is still not fully supported.
Mailing list reference:
http://mail-index.netbsd.org/tech-userlevel/2009/09/09/msg002604.html
This frees a byte value to use for 'end of key' (to correctly sort
short keys) while still having a weight assigned to the field sep.
(Unless -t is given, the field sep is in the field data.)
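A tiny demonstration of why the freed byte value matters (0x01 here is
an assumed value for the example, not the real table): with an explicit
end-of-key byte below every data weight, plain memcmp() puts a short
key ahead of any longer key it is a prefix of:

#include <stdio.h>
#include <string.h>

#define END_OF_KEY 0x01   /* assumed freed value, below all weights */

int
main(void)
{
    unsigned char k1[] = { 'a', 'b', END_OF_KEY };
    unsigned char k2[] = { 'a', 'b', 'c', END_OF_KEY };

    /* negative: "ab" sorts before "abc" */
    printf("%d\n", memcmp(k1, k2, sizeof(k1)));
    return 0;
}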
Do reverse sorts by writing the output file in reverse order (rather
than reversing the sort - apart from merges).
All key compares are now unweighted.
For 'sort -u' mark duplicate keys during the sort and don't write them
to the output.
Use -S to mean a posix sort - where equal keys are sorted using the
raw record (rather than being kept in the original order).
For 'sort -f' (no keys) generate a key of the folded data (as for -n,
-i and -d); this simplifies the code and allows a 'posix' sort.
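What generating the folded key looks like, as a sketch (illustrative,
not the actual code): once the key bytes are folded up front, the
compare path needs no -f special case at all:

#include <ctype.h>
#include <stddef.h>

static void
make_fold_key(unsigned char *key, const unsigned char *rec, size_t len)
{
    for (size_t i = 0; i < len; i++)
        key[i] = (unsigned char)tolower(rec[i]);
}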
Use .MAKE.LEVEL to track recursion.
The first instance of make will have .MAKE.LEVEL 0, which
can be handy for excluding rules which should not apply
in a sub-make.
gmake and freebsd's make have a similar mechanism, but each
uses a different variable to track it. Since we cannot be
compatible with both, we allow the makefiles to cope if they want
by handling the export of .MAKE.LEVEL+1 in Var_Set().
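The general mechanism, sketched (the variable name is made up for the
example; make itself does this with .MAKE.LEVEL, exporting level+1 via
Var_Set() as noted above): read the level from the environment and
export level+1 for any children:

#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
    const char *s = getenv("EXAMPLE_LEVEL");
    int level = s ? atoi(s) : 0;      /* first instance sees 0 */
    char buf[16];

    snprintf(buf, sizeof(buf), "%d", level + 1);
    setenv("EXAMPLE_LEVEL", buf, 1);  /* children see level + 1 */
    /* ... spawn sub-processes here ... */
    printf("running at level %d\n", level);
    return 0;
}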
an array of 'RECHEADER *' and remove all the crappy stuff that was
backed up by REC_DATA_OFFSET (etc).
Also change radix_sort() to return the number of elements, soon to be used
to drop duplicate keys (for sort -u).
Currently unchanged apart from the deletion of the 'unstable' version and
other unneeded code.
Use fldtab[0]., not fldtab->, when referring to the global info in the
0th entry, to emphasise that this entry is different.
fldtab[0].weights is only needed in the SINGL_FLD case - so set it there.
Re-indent a big 'if' in setfield() so that the line breaks match the
logic - which looks dubious now!
Note that -H is now ignored.
Move -S and -s (and -H) to the first list of options since they are
global ones, not ones that override the ordering rules.
Sort the key field separator as if it were part of the data.
This is a bit fubar since we need a value that sorts before any byte value
as a key field separator - so need 257 byte values (since radixsort() doesn't
take a length for each record).
For now map '\t' to 0x01 and hope no one will notice!
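That hack as a sketch (simplified; the real weight tables do more than
remap a single byte):

static unsigned char
sep_weight(unsigned char c)
{
    /* Pull the separator's weight below every printable byte.  Data
     * bytes 0x00/0x01 would now collide with it - the "hope no one
     * will notice" part. */
    return c == '\t' ? 0x01 : c;
}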
and introduce a compile-time constant that limits the number of hash
results.
Verify that the chosen hash function is not beyond that limit and just
use the upper limit as static size in the graph tree functions.
- If we generate a key, it is always sortable using memcmp()
- If we are sorting the whole record, then a weight-table must be used
during compares.
- Major surgery to encoding of numbers to ensure unique keys for equal
numeric values. Reverse numerics are handled by inverting the sign
(see the sketch after this list).
- Case folding (-f) is handled when the sort keys are generated. No other
code has to care at all.
- Key uniqueness (-u) is done during merge for large datasets. It only
has to be done when writing the output file for small files.
Since the file is in key order this is simple!
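A sketch of the kind of numeric encoding described above (an
illustration of the technique, not the actual format; INT64_MIN
overflow is ignored for brevity): bias the sign bit and store
big-endian, so memcmp() order matches numeric order and a reverse
numeric sort only has to negate first:

#include <stdint.h>

static void
encode_num(unsigned char out[8], int64_t v, int reverse)
{
    uint64_t u;

    if (reverse)
        v = -v;                      /* reverse: invert the sign */
    u = (uint64_t)v ^ (1ULL << 63);  /* negatives now sort lowest */
    for (int i = 7; i >= 0; i--) {   /* big-endian byte order */
        out[i] = (unsigned char)(u & 0xff);
        u >>= 8;
    }
}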
Probably fixes all of: PR/27257 PR/25551 PR/22182 PR/31095 PR/30504
PR/36816 PR/37860 PR/39308
Also PR/18614 should no longer die, but a little more work needs to be
done on the merging for very large files.
A log file which should not be compressed could be compressed if it
was listed after a line using the Z or J flag. For instance, we
compressed log2 with the config file below:
/var/log/log1 600 5 100 * Z
/var/log/log2 600 7 100 * -
Simplify logic for reading input records.
Do a merge sort whenever we have 16 partial sorted blocks.
The patient is breathing, but still carrying a lot of extra weight.
Sorting by splitting the input on the first key byte and writing to a
temp file, then sorting the records from each temp file that had the
same first key byte (and repeating for up to 4 key bytes), was a nice
idea, but completely doomed to failure.
Eg PR/39308 where a 70MB file has all but one record the same and short keys.
Not only does the code not work, it is rather guaranteed to be slow.
Instead always use a merge sort for fully sorted chunks of records
(each temporary file contains one lot of sorted records).
The -H option already did this, so just rip out all the code and
variables that couldn't be used when -H was specified.
Further cleanup to come ...
reference; when we want to find the record header, put the larger value
into 'hdr_off' to avoid any confusion that the code might be changing
'depth'!
There is now no need to save the original value as 'odepth' in append.c.
All in a vague attempt to make this code slightly readable.
almost always need one.
But do ADD it, instead of overwriting the last byte of the last key since
that may be requesting the other end of the sort order.
There is no need to check for space for the line after adding the key,
but we might as well check before - just to optimise that case.
This might fix some of the sort bugs - but not the one I'm looking at!
structure that used it.
Pass the end of the keybuf area, not its size, to enterkey() - largely
to remove a variable whose use isn't obvious from the name!
The structure of this code sucks.
Implemented are the 3-graph BDZ algorithm as well as the
2-graph and 3-graph CHM algorithms. All algorithms have expected
linear run time and the smallest functions need around 2.85 bits/key.
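The lookup such functions produce shares a common shape, sketched here
(illustrative, not nbperf's generated code): a table g[] over the
vertices of a random 2- or 3-graph, with a key's hash being the sum of
g[] at the vertices selected by its component hashes:

#include <stdint.h>

static uint32_t
chm2_lookup(const uint32_t *g, uint32_t nverts, uint32_t nkeys,
    uint32_t h1, uint32_t h2)
{
    /* h1 and h2 are two independent hashes of the key */
    return (g[h1 % nverts] + g[h2 % nverts]) % nkeys;
}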
- use stdbool.h (partly)
- move extern declarations of data to header files
- use right types for calloc() wrapper
- remove bogus casts on return values
- remove excessive Pascal-style parentheses in conditionals
- a couple const fixes
- fix some typos in comments
Base the name of the temporary file on the name of the target file,
not just the target directory, to
ensure uniqueness when multiple concurrent invocations of install(1)
simultaneously install files in the same directory. Fixes bin/41512.
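The shape of the fix, sketched (the template format is assumed, not
install(1)'s actual one): deriving the temporary name from the target
file name means concurrent installs of different files into one
directory can never pick the same temp file:

#include <stdio.h>
#include <stdlib.h>

static int
make_temp_for(const char *dir, const char *target)
{
    char tmpl[1024];

    snprintf(tmpl, sizeof(tmpl), "%s/.%s.INS.XXXXXX", dir, target);
    return mkstemp(tmpl);   /* unique per target, even concurrently */
}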
Make "com" volatile so that its value is preserved across
setjmp/longjmp.
Inspired by PR 41255 from Kurt Lidl, but this change makes "com" a
volatile pointer to const non-volatile data, whereas the PR made it a
non-volatile pointer to const volatile data.
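The distinction as a minimal sketch (an example, not the actual code):
it is the pointer that changes between setjmp() and longjmp(), so it is
the pointer object that must be volatile:

#include <setjmp.h>
#include <stddef.h>

struct command { int code; };

static jmp_buf jb;

void
example(const struct command *tab)
{
    /* volatile pointer to const data: value guaranteed after longjmp() */
    const struct command * volatile com = NULL;
    /* "const volatile struct command *com" would instead leave the
     * pointer object non-volatile, hence indeterminate after longjmp() */

    if (setjmp(jb) == 0) {
        com = &tab[1];          /* modified after setjmp() */
        /* ... code that may longjmp(jb, 1) ... */
    } else if (com != NULL) {
        /* safe: com keeps the value assigned above */
        ;
    }
}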