Szabolcs Nagy ec1aed0a14 rewrite the regex pattern parser in regcomp
The new code is a bit simpler and the generated code is about 1KB
smaller (on i386). The basic design was kept including internal
interfaces, TNFA generation was not touched.

The old tre parser had various issues:

[^aa-z]
negated overlapping ranges in a bracket expression were handled
incorrectly (e.g. [^aa-z] was handled as [^a] instead of [^a-z])
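
A minimal sketch of how the [^aa-z] case can be checked against a given libc through the POSIX regcomp/regexec interface. The helper name re_matches is hypothetical, and the expected results assume the default "C" locale, where a-z covers exactly the ASCII lowercase letters.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of musl): returns 1 if `text` matches
 * `pattern`, 0 if it does not, and -1 if the pattern fails to compile. */
static int re_matches(const char *pattern, const char *text, int cflags)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);

    /* REG_NOSUB: only a yes/no answer is needed, no submatch offsets. */
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With a parser that handles the overlap correctly, re_matches("[^aa-z]", "q", REG_EXTENDED) is 0, since q lies inside the negated range a-z. A parser that reduces the set to [^a] would report a match for "q".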

a{,2}
a missing lower bound in a counted repetition should be an error,
but it was accepted with broken semantics: a{,2} was treated as
a{0,3}; the new parser rejects it
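
Since a{,2} is rejected, a pattern that means "at most two" can spell out the lower bound as a{0,2}. The sketch below shows one way to confirm the bounds of the explicit form; the helper names ere_compiles and ere_matches are hypothetical. How a particular libc treats a{,2} itself varies, so the sketch only reports it and does not assume a result.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns regcomp's status for an ERE (0 on success). */
static int ere_compiles(const char *pattern)
{
    regex_t re;
    int rc;

    assert(pattern != NULL);
    rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
    if (rc == 0)
        regfree(&re);
    return rc;
}

/* Hypothetical helper: 1 on match, 0 on no match, -1 on compile error. */
static int ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Anchoring the pattern makes the upper bound visible: ere_matches("^a{0,2}$", "aa") is 1 and ere_matches("^a{0,2}$", "aaa") is 0. ere_compiles("a{,2}") returns nonzero on a libc that rejects the missing lower bound and 0 on one that accepts it as an extension.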

a{999,}
a large min count was not rejected (a{5000,} failed with REG_ESPACE
due to reaching a stack limit); the new parser enforces the
RE_DUP_MAX limit
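
The RE_DUP_MAX limit comes from <limits.h> and its value differs between C libraries, so a check is best written against the macro rather than a fixed number. A minimal sketch, with the hypothetical helper compile_min_count, that builds a{N,} for a given N and returns what regcomp reports:

```c
/* Ask for the POSIX.1-2008 interfaces so that RE_DUP_MAX is visible. */
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <limits.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: compile the ERE "a{<min>,}" and return regcomp's
 * status (0 on success, a REG_* error code otherwise). */
static int compile_min_count(long min)
{
    char pattern[64];
    regex_t re;
    int rc;

    assert(min >= 0);
    snprintf(pattern, sizeof pattern, "a{%ld,}", min);

    rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
    if (rc == 0)
        regfree(&re);
    return rc;
}
```

A count of (long)RE_DUP_MAX + 1 is expected to fail on an implementation that enforces the limit, while a small count such as 2 compiles. The exact error code is not assumed here.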

\xff
regcomp used to accept a pattern with illegal byte sequences in it
(it treated them as an empty expression, so p\xffq matched pq); the
new parser rejects such patterns with REG_BADPAT or REG_ERANGE

[^b-fD-H] with REG_ICASE
the old parser turned this into [^b-fB-F] because of the negated
overlapping range issue (see above); the new parser treats it
as [^b-hB-H]. POSIX seems to require [^d-fD-F], but practical
implementations do case-folding first and negate the character
set later instead of the other way around. (supporting the POSIX
way efficiently would require significant changes, so it was left
as is; it is unclear if any application actually expects the
POSIX behaviour. this issue is raised on the austingroup tracker:
http://austingroupbugs.net/view.php?id=872 ).
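
The three readings of [^b-fD-H] under REG_ICASE can be told apart with a single probe character. A minimal sketch, assuming the default "C" locale and using the hypothetical helper icase_matches: the letter g is accepted by [^b-fB-F] and by [^d-fD-F] but rejected by [^b-hB-H], whereas e is rejected and z is accepted under all three.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: match `text` against the ERE `pattern` with
 * REG_ICASE set. Returns 1 on match, 0 on no match, -1 if the pattern
 * does not compile. */
static int icase_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

icase_matches("[^b-fD-H]", "g") returning 0 indicates the fold-then-negate behaviour described above. A result of 1 indicates one of the other two readings.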

another case-insensitive matching issue is that unicode case
folding rules can group more than two characters together, while
towupper and towlower can only work for a pair of upper and
lower case characters; this is a limitation of POSIX so it is
not fixed.

invalid bracket and brace expressions may return different error
codes now (REG_ERANGE instead of REG_EBRACK or REG_BADBR instead
of REG_EBRACE); otherwise the new parser should be compatible with
the old one.
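
Because the specific code for a malformed bracket or brace expression can differ between parsers, callers that only need to report the failure can treat any nonzero result as an error and let regerror produce the message. A minimal sketch, with the hypothetical helper describe_compile_error:

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile `pattern` as an ERE. On failure, store the
 * libc's error text in `msg` and return the nonzero REG_* code. On
 * success, store an empty string and return 0. */
static int describe_compile_error(const char *pattern, char *msg, size_t msg_size)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && msg != NULL && msg_size > 0);

    rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc != 0) {
        /* regerror truncates and NUL-terminates to fit msg_size. */
        regerror(rc, &re, msg, msg_size);
        return rc;
    }

    regfree(&re);
    msg[0] = '\0';
    return 0;
}
```

A reversed range such as [z-a] is one input that fails to compile. The code only checks that the result is nonzero, not whether it is REG_ERANGE or REG_EBRACK, so it keeps working across the change in error codes.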

regcomp should be able to handle arbitrary pattern input if the
pattern length is limited; the only exception is the use of large
repetition counts (e.g. (a{255}){255}), which require an exponential
amount of memory, and there is no easy workaround.
2014-09-13 00:20:55 +02:00
2014-07-31 19:10:31 -04:00

    musl libc

musl, pronounced like the word "mussel", is an MIT-licensed
implementation of the standard C library targeting the Linux syscall
API, suitable for use in a wide range of deployment environments. musl
offers efficient static and dynamic linking support, lightweight code
and low runtime overhead, strong fail-safe guarantees under correct
usage, and correctness in the sense of standards conformance and
safety. musl is built on the principle that these goals are best
achieved through simple code that is easy to understand and maintain.

The 1.1 release series for musl features coverage for all interfaces
defined in ISO C99 and POSIX 2008 base, along with a number of
non-standardized interfaces for compatibility with Linux, BSD, and
glibc functionality.

For basic installation instructions, see the included INSTALL file.
Information on full musl-targeted compiler toolchains, system
bootstrapping, and Linux distributions built on musl can be found on
the project website:

    http://www.musl-libc.org/