In many places a carriage return is not valid whitespace and should
thus not be colored as such. In some of these places a vertical tab
or form feed is maybe valid whitespace, but it would be ugly or even
wrong to color them because they are not part of the subsequent
comment or keyword.
This fixes https://savannah.gnu.org/bugs/?60456.
Changes:
1. There may be zero spaces between 'include' and '<...>'.
2. Blanks and '=' may occur inside '<...>' but '>' may not.
3. There must be at least one character inside '<...>'.
References:
Change 1:
C: www.open-std.org/jtc1/sc22/wg14/www/docs/n2310.pdf#subsection.6.10.2
C++: www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4659.pdf#section.19.2
Changes 2 and 3:
C: www.open-std.org/jtc1/sc22/wg14/www/docs/n2310.pdf#subsection.6.4.7
C++: www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4659.pdf#section.5.8
Signed-off-by: Hussam al-Homsi <sawuare@gmail.com>
Since commits b0209374 and 1c010d8e from a month ago, nano does not use
mblen() and mbtowc() any more, so there is no need to check for their
presence.
Instead, add a check for iswalpha(), which we do use.
This reduces startup time by seven percent (when using the standard set
of syntaxes) when opening just one file that doesn't match any syntax,
and more than ten percent when opening multiple files. It takes some
extra memory, but... not wasting CPU cycles is more important.
This addresses https://savannah.gnu.org/bugs/?56433.
When undoing several actions, it is possible for the line at the top
of the screen to be removed, leaving 'edittop' pointing to a structure
that has been freed. Soon after, 'edittop' is referenced to determine
whether the cursor is offscreen... Prevent this invalid reference by
stepping 'edittop' one line back in that special case. This changes
the normal centering behavior of Undo when the cursor goes offscreen,
but... so be it.
When a single node is deleted, it is always possible to step one line
back, because a buffer contains always at least one line (even though
maybe empty), so if the current line could be deleted, there must be
one before it (when at the top of the screen).
This fixes https://savannah.gnu.org/bugs/?60436.
Bug existed since version 2.3.3, commit 60815461,
since undoing does not always center the cursor.
Since two commits ago, the position of the indicator shows the position
of the viewport relative to the full buffer in terms of actual lines,
not of visual chunks (to avoid excessive computation). But the size of
the indicator stayed constant, as if it always covered as many lines as
the edit window has rows. But the latter will not be the case when
softwrapping occurs. Therefore, when softwrapping, compute how many
actual lines are visible in the viewport, and adjust the size of the
indicator accordingly.
Whenever softwrap was toggled on or line numbers were toggled on/off or
the window was resized, the extra rows per line needed to be recomputed
for ALL the lines. For large files with many long lines this was too
costly.
(This change causes the indicator to have an incorrect size when there
are many softwrapped chunks onscreen, but that will be addressed later.)
This fixes https://savannah.gnu.org/bugs/?60429.
Problem existed since version 5.0, since the indicator was introduced.
This makes the handling of plain ASCII a tiny bit slower, but it
affects only the users of --constantshow without --minibar, so...
All other uses of mbstrlen() and collect_char() are not in speed-
critical code paths.
Since the previous commit, mbwidth() is used only to determine whether
a character is either double width or zero width. There is no need to
return the actual width of the character; a simple yes or no is enough.
Transforming mbwidth() into is_doublewidth() also allows streamlining
it and is_zerowidth() a bit, so that they become slightly faster.
The number of bytes in the character were determined twice: first in
mbwidth() and then in char_length(). Do it just once, in mbtowide().
Also, avoid calling is_cntrl_char(), because it does unneeded checks
when we already know that the high bit is set.
This duplicates some code, but advance_over() is called a lot, so it
is important that it is as fast as possible.
This shouldn't slow down plain ASCII, as the extra checks (use_utf8
and *string < 0xA0) are done only for non-ASCII (apart from DEL).
The 'start_index' was in index in the given text, while 'index' is an
index in the displayable string. Having both of them using 'index' in
their name was somewhat confusing.
For a normal file (without overlong lines) the strlen() wasn't much
of a problem. But when there are very long lines, it wasted time
counting stuff that wouldn't be displayed on the current row anyway,
and reserved *far* too much memory for the displayable string.
Problem existed since commit cf0eed6c from five years ago that traded
a continuous comparison (of the used space with the reserved space)
against a one-time big reservation up front involving a strlen().
In retrospect that was not a good trade-off when softwrapping.
The extra check (charwidth == 0) is incurred only by characters that
have their high bit set, so the average file (with only ASCII) is not
affected by this -- it just loses an unneeded call of strlen().
In UTF-8 valid multibyte characters are at most four bytes long,
and now that we no longer make use of mblen() and mbtowc() from
the underlying system, we won't get five- or six-byte sequences
mistakenly reported as valid (by glibc). So it is always enough
to reserve space for just four bytes per character.
Calling wctomb() with NULL as the first parameter returns zero in a
UTF-8 locale, meaning that there is no state, so there is no point
in resetting it either.
The two calls of draw_row() are each immediately preceded by a call to
display_string(), which has already determined from which x position
and until which x position in the relevant line the current row will
be drawn -- doing this again in draw_row() is a waste of time. Even
though it is ugly, pass the two data points from one function to the
other via global variables.
For normal files (without overlong lines), this saves on average some
fifty calls of advance_over() per row. When softwrapping a file with
overlong lines, the savings for each softwrapped chunk are much higher.
(Well, it now checks that ^G is still the first shortcut that is bound
to 'do_help', but that is good enough: if the user did any rebinding,
they probably do not need any reminder about how to invoke 'Help'.)
This fixes https://savannah.gnu.org/bugs/?60315.
Reported-by: Robert Goulding <goulding.2@nd.edu>
This saves a function call, and the passing and checking of the
MAXCHARLEN parameter, and the checking whether wc is maybe NULL
(which for nano is never the case), and who knows what other
overheads mbtowc() has, and our workaround for glibc.
Code was written after looking at gnulib/lib/mbrtowc-impl-utf8.h.
Most implementations of mblen() do a call to mbtowc(), which is
a waste of time when all we want to know is the number of bytes
(and when we already know that we're using UTF-8 and that the
first byte is at least 0xC2).
(This also avoids burdening correct implementations with the
workaround that was needed only for glibc.)
Code was written after looking at gnulib/lib/mbrtowc-impl-utf8.h.
The mblen() and mbtowc() functions will happily return 4 or 5 or 6
for byte sequences that start with 0xF4 0x90 or higher. But those
sequences encode for U+110000 or higher, which are not valid Unicode
code points. The libc of FreeBSD and OpenBSD and Alpine correctly
return -1 for such sequences. Make nano behave correctly also when
linked against glibc, so that invalid sequences are always presented
as a series of invalid bytes and never as a single invalid code.
This fixes https://savannah.gnu.org/bugs/?60262.
Bug existed since before version 2.0.0.
The call of this function in make_mbchar() does not add anything,
because wctomb() already returns -1 for codes U+D800 to U+DFFF,
and parse_verbatim_kbinput() already rejects anything that starts
with U+11.... or higher, so make_mbchar() is never called for codes
beyond U+10FFFF.
And the call in display_string() just needs to check for wc <= 0x10FFFF
because mbtowc() already returns -1 for codes U+D800 to U+DFFF.
That is, accept U+FDD0 to U+FDEF, and accept U+xxFFFE and U+xxFFFF
for xx from 00 to 10 hex, being the 66 reserved "non-characters".
It may not be wise of the user to input these "things" (by typing
their code after M-V), but the codes are valid Unicode code points
and should not be rejected.
See https://www.unicode.org/faq/private_use.html#nonchar8 et al.
This fixes https://savannah.gnu.org/bugs/?60263.
Bug existed since before version 2.0.0.