Each leading tab is converted to two tabs, and each run of four leading
spaces is converted to one tab. The intended tab size (for keeping most
lines within 80 columns) is now four.
Instead of always stepping back four bytes and then tentatively
moving forward again (which is wasteful when most codes are just
one or two bytes long), inspect the preceding bytes one by one
and begin the move forward at the first valid starter byte.
This reduces the backwards searching time by close to 40 percent.
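The probing described above could look roughly like the sketch below.
The names step_left_sketch(), seq_length() and is_starter() are made up
for illustration and are not taken from the source; the sketch assumes
UTF-8 and does not fully validate sequences.

```c
#include <stddef.h>

/* Hypothetical helper: the length of the UTF-8 sequence announced by a
 * lead byte.  Any byte that is not a valid lead byte counts as one. */
static size_t seq_length(unsigned char lead)
{
    if (lead < 0x80)
        return 1;
    if (lead >= 0xC2 && lead <= 0xDF)
        return 2;
    if (lead >= 0xE0 && lead <= 0xEF)
        return 3;
    if (lead >= 0xF0 && lead <= 0xF4)
        return 4;
    return 1;
}

/* A byte can start a character when it is not a continuation byte
 * (that is, when it does not have the bit pattern 10xxxxxx). */
static int is_starter(unsigned char byte)
{
    return (byte & 0xC0) != 0x80;
}

/* Return the index of the character that precedes position pos. */
size_t step_left_sketch(const char *buf, size_t pos)
{
    size_t before, back, charlen = 0;

    if (pos == 0)
        return 0;

    /* Inspect the preceding bytes one by one (at most four of them)
     * and stop at the first possible starter byte. */
    before = pos - 1;
    for (back = 1; back <= 4 && back <= pos; back++) {
        if (is_starter((unsigned char)buf[pos - back])) {
            before = pos - back;
            break;
        }
    }

    /* Move forward again from there until reaching pos, so that the
     * length of the last character before pos is known. */
    while (before < pos) {
        charlen = seq_length((unsigned char)buf[before]);
        before += charlen;
    }

    return before - charlen;
}
```

For plain ASCII text the loop stops at the very first byte it looks at,
which is where the saving over always stepping back four bytes would
come from.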
If the haystack is shorter than the needle, then the tail is shorter
than the needle too (because the pointer is always greater than or
equal to haystack). So the pointer gets readjusted to one needle length
before the end of the haystack, which puts it /before/ the start of the
haystack: thus the while loop will never run.
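A minimal sketch of this reasoning follows. It uses signed offsets
instead of raw pointers, so that "before the haystack" is simply a
negative number. The name rev_search_sketch() and its signature are
assumptions for illustration, not the routine from the source, and
start must not exceed the length of haystack.

```c
#include <stddef.h>
#include <string.h>

/* Search backwards in haystack for needle, beginning at offset start.
 * Return the offset of the match, or -1 when there is none. */
ptrdiff_t rev_search_sketch(const char *haystack, const char *needle,
                            size_t start)
{
    ptrdiff_t needle_len = (ptrdiff_t)strlen(needle);
    ptrdiff_t tail_len = (ptrdiff_t)strlen(haystack + start);
    ptrdiff_t pos = (ptrdiff_t)start;

    /* When the tail is too short to hold the needle, back up to one
     * needle length before the end of the haystack. */
    if (tail_len < needle_len)
        pos += tail_len - needle_len;

    /* If the haystack is shorter than the needle, pos is now negative
     * and this loop never runs. */
    while (pos >= 0) {
        if (strncmp(haystack + pos, needle, (size_t)needle_len) == 0)
            return pos;
        pos--;
    }

    return -1;
}
```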
On average, this saves some 200 nanoseconds per line.
The interval 2013-2017 for the Free Software Foundation is valid
because in those years there were releases with changes by either
Chris or David, and the GNU maintainers guide advises to mention
a new year in all files of a package, not just in the ones that
actually changed, and be done with it for the rest of the year.
The platform's default char type might be signed, which could cause
problems in 8-bit locales.
This addresses https://savannah.gnu.org/bugs/?50289.
Reported-by: Hans-Bernhard Broeker <HBBroeker@T-Online.de>
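As a general illustration of the signedness concern (not the actual
change made here), the sketch below shows the usual precaution of
converting a char to unsigned char before testing or classifying it.
Both helper names are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>

/* Is the byte in the upper half (0x80..0xFF) of an 8-bit character set? */
bool is_high_byte(char c)
{
    /* Where char is signed, (c >= 0x80) would never be true, because
     * such bytes are then seen as negative values. */
    return (unsigned char)c >= 0x80;
}

/* Is the byte a printable character in the current locale? */
bool is_printable_byte(char c)
{
    /* Passing a negative value other than EOF to isprint() is
     * undefined, so convert to unsigned char first. */
    return isprint((unsigned char)c) != 0;
}
```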
In path names and file names, 0x0A means an embedded newline and
should be shown as ^J, but in anything related to the file's data,
0x0A is an encoded NUL and should be displayed as ^@.
So... switch mode at the two main entry points into the "file system"
(reading in a file, and writing out a file), and also when drawing the
titlebar. Switch back to the default mode in the main loop.
This fixes https://savannah.gnu.org/bugs/?49893.
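The two display modes can be sketched as below. The function name and
the newline_is_nul parameter are assumptions for illustration (the
source switches a mode instead of passing a flag), and the sketch
expects c to be a control code.

```c
#include <stdbool.h>

/* Return the character to show after the caret for control code c.
 * The hypothetical flag newline_is_nul is true while handling file
 * data, and false while handling path names and file names. */
char caret_letter_sketch(unsigned char c, bool newline_is_nul)
{
    /* In file data, 0x0A is an encoded NUL; elsewhere it is a newline. */
    if (c == 0x0A)
        return newline_is_nul ? '@' : 'J';

    /* DEL is conventionally shown as ^?. */
    if (c == 0x7F)
        return '?';

    /* Other control codes map onto '@', 'A', 'B', and so on. */
    return (char)(c + 64);
}
```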
The byte 0x0A means 0x00 *only* when it is found in nano's internal
representation of a file's data, not when it occurs in a file name.
This fixes the second part of https://savannah.gnu.org/bugs/?49867.
That is: elide a second test from the most travelled path: a valid
character. This adds a second call of mblen() when parse_mbchar()
is called on a terminating zero, but that should never happen.
It is quicker to do a handful of superfluous compares at the end of
each line than it is to compute and keep track of and compare the
remaining line length the whole time.
The typical line is some sixty characters long, the typical search
string ten characters -- with a shorter search string the speedup is
even higher: some fifteen percent. Only when the string is longer
than half the average line length does searching become slower with
this new method.
All this for a UTF-8 locale. For a C locale it makes no difference.
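The idea of not tracking the remaining line length can be sketched on
plain bytes as follows. This is a simplified, case-sensitive stand-in
with a hypothetical name, not the multibyte routine from the source,
and it assumes a non-empty needle.

```c
#include <stddef.h>
#include <string.h>

/* Find the first occurrence of needle in haystack, or return NULL. */
const char *find_sketch(const char *haystack, const char *needle)
{
    size_t needle_len = strlen(needle);

    /* Try every position up to the end of the line.  Near the end the
     * comparisons are superfluous, but each of them fails quickly upon
     * reaching the terminating zero, so no length needs to be computed
     * or kept up to date. */
    while (*haystack != '\0') {
        if (strncmp(haystack, needle, needle_len) == 0)
            return haystack;
        haystack++;
    }

    return NULL;
}
```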
Now that mbstrncasecmp() does the right thing, there is no need any
more to verify that only a valid multibyte sequence was matched.
(See https://savannah.gnu.org/bugs/?45579 for a test case.)
Also, this will make it possible to search for invalid sequences.
(Currently it isn't possible to enter a search string with invalid
characters, but... a user might edit the search history file. And
if pasting at the prompt is implemented, it will be trivial to enter
invalid sequences if you have a file that contains them.)
Persisting might lead to count 'n' reaching zero, which would mean that
the needle has matched, which is wrong when one of the strings contains
an invalid or incomplete multibyte sequence.
That is: don't run towlower() on the two differing bytes when having
reached the end of one of the strings.
This fixes https://savannah.gnu.org/bugs/?48700.
In the bargain, don't do the conversion to lowercase twice.
Furthermore, persist when encountering invalid byte sequences --
until finding bytes that differ.
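Taken together, the comparison rules described above might be sketched
like this. It is an illustration under assumptions, not the code from
the source: the name mb_ncasecmp_sketch() is made up, and the behavior
for non-ASCII input depends on the locale that is in effect.

```c
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/* Compare at most n characters of s1 and s2, ignoring case. */
int mb_ncasecmp_sketch(const char *s1, const char *s2, size_t n)
{
    while (*s1 != '\0' && *s2 != '\0' && n > 0) {
        wchar_t wc1, wc2;
        int len1 = mbtowc(&wc1, s1, MB_CUR_MAX);
        int len2 = mbtowc(&wc2, s2, MB_CUR_MAX);

        if (len1 < 0 || len2 < 0) {
            /* Forget any partial conversion state. */
            mbtowc(NULL, NULL, 0);

            /* Invalid sequence: compare the raw bytes. */
            if (*s1 != *s2)
                return (unsigned char)*s1 - (unsigned char)*s2;

            /* Same byte, but valid in only one of the strings:
             * this must not count as a match. */
            if ((len1 < 0) != (len2 < 0))
                return (len1 < 0) ? 1 : -1;

            /* Both invalid and equal: persist, one byte at a time. */
            len1 = len2 = 1;
        } else {
            /* Convert each character to lowercase just once. */
            wint_t low1 = towlower((wint_t)wc1);
            wint_t low2 = towlower((wint_t)wc2);

            if (low1 != low2)
                return (low1 < low2) ? -1 : 1;
        }

        s1 += len1;
        s2 += len2;
        n--;
    }

    /* At the end of either string, compare the raw bytes directly,
     * without running towlower() on them. */
    return (n > 0) ? (unsigned char)*s1 - (unsigned char)*s2 : 0;
}
```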
The needle is never part of the hay -- it is always a separate string.
(And even if needle and haystack were identical, the routine would work
fine; that case does not need special treatment.)