mc/lib/strutil
devzero 7f9b333861 Ticket #3616: speed up of utf-8 normalization.
When content of a large directory is being sorted by file names, a
significant amount of CPU time is spent in str_utf8_normalize() that is
called from str_utf8_create_key_gen().

For example, /usr/bin/ contains 5437 files on my Arch Linux box. Running
mc /usr/bin/ /usr/bin/ takes approx. 75 000 000 CPU instructions to sort
file names, or 25% of total program run time. Of these 75 000 000
instructions, 42 500 000 are spent in str_utf8_normalize().

str_utf8_normalize() uses g_utf8_normalize() to do the work.
g_utf8_normalize() is a heavyweight function that converts UTF-8 into
UCS-4, does the normalization and then converts UCS-4 back into UTF-8.

Since file names are composed of ASCII characters in most cases, we can
speed up str_utf8_normalize() by checking if the heavyweight Unicode
normalization is actually needed. Normalizing an ASCII string is a
no-op, so such a string is effectively "normalized" by a plain strdup().
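
The idea can be sketched in standalone C as follows. This is only an
illustration of the ASCII fast path, not the code from the patch: the
names is_ascii_only() and try_fast_normalize() are hypothetical, and the
sketch returns NULL to tell the caller to fall back to the heavyweight
path (g_utf8_normalize() in the real code) instead of calling GLib itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return 1 if every byte of s is 7-bit ASCII. */
static int
is_ascii_only (const char *s)
{
    const unsigned char *p;

    assert (s != NULL);

    for (p = (const unsigned char *) s; *p != '\0'; p++)
        if (*p >= 0x80)
            return 0;           /* a byte of a multi-byte UTF-8 sequence */

    return 1;
}

/* Hypothetical fast path: an ASCII-only string is already normalized,
 * so a plain copy is enough. Returns a newly allocated copy that the
 * caller must free(), or NULL if the string contains non-ASCII bytes
 * (or if allocation fails); in that case the caller is expected to use
 * the heavyweight Unicode normalization instead. */
static char *
try_fast_normalize (const char *s)
{
    size_t size;
    char *copy;

    if (!is_ascii_only (s))
        return NULL;

    size = strlen (s) + 1;      /* include the terminating NUL */
    copy = malloc (size);
    if (copy != NULL)
        memcpy (copy, s, size);

    return copy;
}
```

The check costs a single pass over the bytes of the string, which is
cheap compared to the round trip through UCS-4 described above.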

With this patch, running mc /usr/bin/ /usr/bin/ requires just 37 000 000
instructions to sort the file names (down from 75 000 000) and 4 500 000
instructions to do str_utf8_normalize() (down from 42 500 000).

Signed-off-by: Andrew Borodin <aborodin@vmail.ru>
2017-07-29 10:23:09 +03:00
..
Makefile.am Add functions to transform string to unsigned integer: 2013-07-05 09:09:03 +04:00
replace.c Update copyright years. 2017-01-22 19:12:55 +03:00
strescape.c Update copyright years. 2017-01-22 19:12:55 +03:00
strutil8bit.c Update copyright years. 2017-01-22 19:12:55 +03:00
strutil.c Update copyright years. 2017-01-22 19:12:55 +03:00
strutilascii.c Update copyright years. 2017-01-22 19:12:55 +03:00
strutilutf8.c Ticket #3616: speed up of utf-8 normalization. 2017-07-29 10:23:09 +03:00
strverscmp.c Update copyright years. 2017-01-22 19:12:55 +03:00
xstrtol.c Update copyright years. 2017-01-22 19:12:55 +03:00