When content of a large directory is being sorted by file names, a
significant amount of CPU time is spent in str_utf8_normalize() that is
called from str_utf8_create_key_gen().
For example, /usr/bin/ contains 5437 files on my Archlinux box. Running
mc /usr/bin/ /usr/bin/ takes approx. 75 000 000 CPU instructions to sort
file names, or 25% of total program run time. From these 75 000 000
instructions, 42 500 000 instruction are spent in str_utf8_normalize().
str_utf8_normalize() uses g_utf8_normalize() to do the work.
g_utf8_normalize() is a heavyweight function, that converts UTF-8 into
UCS-4, does the normalization and then converts UCS-4 back into UTF-8.
Since file names are composed of ASCII characters in most cases, we can
speed up str_utf8_normalize() by checking if the heavyweight Unicode
normalization is actually needed. Normalization of ASCII string is
no-op, so it is effectively "normalized" by just strdup().
With this patch, running mc /usr/bin/ /usr/bin/ requires just 37 000 000
instructions to sort the file names (down from 75 000 000) and 4 500 000
instuctions to do str_utf8_normalize() (down from 42 500 000).
Signed-off-by: Andrew Borodin <aborodin@vmail.ru>
lib/strutil/xstrtol.c: prohibit monstrosities like "1bB".
Problem reported by Young Mo Kang in: http://bugs.gnu.org/23388.
(xstrtoumax): Allow trailing second suffixes like "B" only if the first
suffix needs a base.
Signed-off-by: Andrew Borodin <aborodin@vmail.ru>
Introduce -Wswitch-default check.
Some minor cosmetics.
Thanks Andreas Mohr <and at gmx dot li> for original patch.
Signed-off-by: Andrew Borodin <aborodin@vmail.ru>
...to avoid conflict with global names.
On HP-UX, inttypes.h includes ctype.h through other dependencies, ctype.h
defines macros for various functions and these macros clash with entries
of "struct str_class".
Signed-off-by: Andrew Borodin <aborodin@vmail.ru>
Set defines via CPPFLAGS variable not via CFLAGS one.
Use AM_CPPFLAGS and AM_CFLAGS variables instead of per-target ones.
Signed-off-by: Andrew Borodin <aborodin@vmail.ru>
On Mac OS X, in the iTerm2, when the LANG variable is set to en_US.utf-8
mcedit specifically does not correctly accept input (every character press
is interpreted as a '.'). However when LANG is set to en_US.UTF-8 mcedit
works correctly (see also http://code.google.com/p/iterm2/issues/detail?id=204).
On Linux, nl_langinfo(CODESET) returns upper case UTF-8 whether the LANG is set
to utf-8 or UTF-8.
On Mac OS X, it returns the same case as the LANG input.
So let tranform result of nl_langinfo(CODESET) to upper case unconditionally.
Signed-off-by: Andrew Borodin <aborodin@vmail.ru>