Import nawk as of 2003/07/29
Changes: * internationalization improvements * [:digit:] addition * some bugfixes
parent 9d1ca6d8d9
commit 4ea2a427d1
136 dist/nawk/FIXES vendored
@@ -25,6 +25,142 @@ THIS SOFTWARE.
This file lists all bug fixes, changes, etc., made since the AWK book
was sent to the printers in August, 1987.

Jul 29, 2003:
fixed (i think) the long-standing botch that included the beginning of
line state ^ for RE's in the set of valid characters; this led to a
variety of odd problems, including failure to properly match certain
regular expressions in non-US locales. thanks to ruslan for keeping
at this one.

Jul 28, 2003:
n-th try at getting internationalization right, with thanks to volker
kiefel, arnold robbins and ruslan ermilov for advice, though they
should not be blamed for the outcome. according to posix, "." is the
radix character in programs and command line arguments regardless of
the locale; otherwise, the locale should prevail for input and output
of numbers. so it's intended to work that way.

i have rescinded the attempt to use strcoll in expanding shorthands in
regular expressions (cclenter). its properties are much too
surprising; for example [a-c] matches aAbBc in locale en_US but abBcC
in locale fr_CA. i can see how this might arise by implementation
but i cannot explain it to a human user. (this behavior can be seen
in gawk as well; we're leaning on the same library.)

the issue appears to be that strcoll is meant for sorting, where
merging upper and lower case may make sense (though note that unix
sort does not do this by default either). it is not appropriate
for regular expressions, where the goal is to match specific
patterns of characters. in any case, the notations [:lower:], etc.,
are available in awk, and they are more likely to work correctly in
most locales.

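The difference between the two ways of deciding whether a character falls inside a range such as [a-c] can be sketched in C. This is only an illustration of the idea in the note above, not the code in b.c; the function names in_range_bytes and in_range_coll are made up for this example.

```c
#include <string.h>

/* Byte-order test: is c within lo..hi by raw character code?
 * The result is the same in every locale. */
int in_range_bytes(unsigned char c, unsigned char lo, unsigned char hi)
{
	return lo <= c && c <= hi;
}

/* Collation-order test: is c within lo..hi as ordered by strcoll()?
 * The result depends on the LC_COLLATE locale currently in effect. */
int in_range_coll(char c, char lo, char hi)
{
	char cs[2] = { c, '\0' };
	char los[2] = { lo, '\0' };
	char his[2] = { hi, '\0' };

	return strcoll(los, cs) <= 0 && strcoll(cs, his) <= 0;
}
```

In the default "C" locale both functions reject 'B' for the range a..c. After a call such as setlocale(LC_COLLATE, "") under a locale like en_US, the strcoll version may accept some upper-case letters as well, which is the kind of surprise described above.
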
a moratorium is hereby declared on internationalization changes.
i apologize to friends and colleagues in other parts of the world.
i would truly like to get this "right", but i don't know what
that is, and i do not want to keep making changes until it's clear.

Jul 4, 2003:
fixed bug that permitted non-terminated RE, as in "awk /x".

Jun 1, 2003:
subtle change to split: if source is empty, number of elems
is always 0 and the array is not set.

Mar 21, 2003:
added some parens to isblank, in another attempt to make things
internationally portable.

Mar 14, 2003:
the internationalization changes, somewhat modified, are now
reinstated. in theory awk will now do character comparisons
and case conversions in national language, but "." will always
be the decimal point separator on input and output regardless
of national language. isblank(){} has an #ifndef.

this no longer compiles on windows: LC_MESSAGES isn't defined
in vc6++.

fixed subtle behavior in field and record splitting: if FS is
a single character and RS is not empty, \n is NOT a separator.
this tortuous reading is found in the awk book; behavior now
matches gawk and mawk.

Dec 13, 2002:
for the moment, the internationalization changes of nov 29 are
rolled back -- programs like x = 1.2 don't work in some locales,
because the parser is expecting x = 1,2. until i understand this
better, this will have to wait.

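The x = 1.2 problem arises because strtod() honors the locale's decimal point. One common workaround, sketched here in C, is to substitute the locale's radix character for "." before converting. This is a hedged illustration and not necessarily what awk does; the helper name parse_dot_number and the 64-byte buffer are assumptions made for the example.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Convert a numeric literal that always uses "." as its radix
 * character, whatever LC_NUMERIC locale is in effect.
 * (parse_dot_number is a hypothetical helper for this sketch.) */
double parse_dot_number(const char *s)
{
	char buf[64];
	const char *dp = localeconv()->decimal_point;
	size_t n = strlen(s);
	char *dot;

	/* Fall back to plain strtod() if the literal is too long for
	 * the buffer or the locale's radix is not a single byte. */
	if (n >= sizeof buf || strlen(dp) != 1)
		return strtod(s, NULL);

	memcpy(buf, s, n + 1);		/* copy including the '\0' */
	dot = strchr(buf, '.');
	if (dot != NULL)
		*dot = dp[0];		/* e.g. "1.2" becomes "1,2" */
	return strtod(buf, NULL);
}
```

With this approach a program literal such as 1.2 converts the same way in the "C" locale and in a locale whose radix character is a comma, while input data can still be read with the locale's own conventions.
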
Nov 29, 2002:
modified b.c (with tiny changes in main and run) to support
locales, using strcoll and iswhatever tests for posix character
classes. thanks to ruslan ermilov (ru@freebsd.org) for code.
the function isblank doesn't seem to have propagated to any
header file near me, so it's there explicitly. not properly
tested on non-ascii character sets by me.

Jun 28, 2002:
modified run/format() and tran/getsval() to do a slightly better
job on using OFMT for output from print and CONVFMT for other
number->string conversions, as promised by posix and done by
gawk and mawk. there are still places where it doesn't work
right if CONVFMT is changed; by then the STR attribute of the
variable has been irrevocably set. thanks to arnold robbins for
code and examples.

fixed subtle bug in format that could get core dump. thanks to
Jaromir Dolecek <jdolecek@NetBSD.org> for finding and fixing.
minor cleanup in run.c / format() at the same time.

added some tests for null pointers to debugging printf's, which
were never intended for external consumption. thanks to dave
kerns (dkerns@lucent.com) for pointing this out.

GNU compatibility: an empty regexp matches anything (thanks to
dag-erling smorgrav, des@ofug.org). subject to reversion if
this does more harm than good.

pervasive small changes to make things more const-correct, as
reported by gcc's -Wwrite-strings. as it says in the gcc manual,
this may be more nuisance than useful. provoked by a suggestion
and code from arnaud desitter, arnaud@nimbus.geog.ox.ac.uk

minor documentation changes to note that this now compiles out
of the box on Mac OS X.

Feb 10, 2002:
changed types in posix chars structure to quiet solaris cc.

Jan 1, 2002:
fflush() or fflush("") flushes all files and pipes.

length(arrayname) returns number of elements; thanks to
arnold robbins for suggestion.

added a makefile.win to make it easier to build on windows.
based on dan allen's buildwin.bat.

Nov 16, 2001:
added support for posix character class names like [:digit:],
which are not exactly shorter than [0-9] and perhaps no more
portable. thanks to dag-erling smorgrav for code.

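Support for names like [:digit:] is often built around a small table that maps each class name to a <ctype.h> predicate. The C sketch below shows that general idea only; the struct layout, the table contents and the function class_matches are assumptions for illustration and are not taken from b.c.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* One entry per POSIX class name; only a few classes are listed. */
struct charclass {
	const char *name;
	int (*test)(int);
};

static const struct charclass classes[] = {
	{ "alpha", isalpha },
	{ "digit", isdigit },
	{ "lower", islower },
	{ "space", isspace },
	{ "upper", isupper },
};

/* Return 1 if c belongs to the named class, 0 if it does not,
 * and -1 if the class name is not in the table. */
int class_matches(const char *name, int c)
{
	size_t i;

	for (i = 0; i < sizeof classes / sizeof classes[0]; i++)
		if (strcmp(classes[i].name, name) == 0)
			return classes[i].test((unsigned char)c) != 0;
	return -1;
}
```

For example, class_matches("digit", '7') returns 1 and class_matches("digit", 'x') returns 0. Because the <ctype.h> predicates consult the current locale, the same table can classify non-ASCII letters once setlocale() has been called.
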
Feb 16, 2001:
removed -m option; no longer needed, and it was actually
broken (noted thanks to volker kiefel).

Feb 10, 2001:
fixed an appalling bug in gettok: any sequence of digits, +,-, E, e,
and period was accepted as a valid number if it started with a period.
this would never have happened with the lex version.

other 1-character botches, now fixed, include a bare $ and a
bare " at the end of the input.

Feb 7, 2001:
more (const char *) casts in b.c and tran.c to silence warnings.

Nov 15, 2000:
fixed a bug introduced in august 1997 that caused expressions
like $f[1] to be syntax errors. thanks to arnold robbins for