Improve Unicode / UTF-8 documentation
parent f3724f7488
commit 30a868dc0f
@@ -2,12 +2,12 @@
|
|||||||
|
|
||||||
\page unicode Unicode and UTF-8 Support
|
\page unicode Unicode and UTF-8 Support
|
||||||
|
|
||||||
This chapter explains how FLTK handles international
|
This chapter explains how FLTK handles international
|
||||||
text via Unicode and UTF-8.
|
text via Unicode and UTF-8.
|
||||||
|
|
||||||
Unicode support was only recently added to FLTK and is
|
Unicode support was added to FLTK starting with version 1.3.0 and is
|
||||||
still incomplete. This chapter is Work in Progress, reflecting
|
still incomplete but mostly functional. This chapter is Work in Progress,
|
||||||
the current state of Unicode support.
|
reflecting the current state of Unicode support.
|
||||||
|
|
||||||
\section unicode_about About Unicode, ISO 10646 and UTF-8
|
\section unicode_about About Unicode, ISO 10646 and UTF-8
|
||||||
|
|
||||||
@@ -16,11 +16,11 @@ deliberately brief and provides just enough information for
|
|||||||
the rest of this chapter.
|
the rest of this chapter.
|
||||||
|
|
||||||
For further information, please see:
|
For further information, please see:
|
||||||
- http://www.unicode.org
|
- https://unicode.org
|
||||||
- http://www.iso.org
|
- https://iso.org
|
||||||
- http://en.wikipedia.org/wiki/Unicode
|
- https://en.wikipedia.org/wiki/Unicode
|
||||||
- http://www.cl.cam.ac.uk/~mgk25/unicode.html
|
- https://www.cl.cam.ac.uk/~mgk25/unicode.html
|
||||||
- http://www.apps.ietf.org/rfc/rfc3629.html
|
- https://tools.ietf.org/html/rfc3629
|
||||||
|
|
||||||
|
|
||||||
\par The Unicode Standard
|
\par The Unicode Standard
|
||||||
@@ -33,7 +33,7 @@ and is supported by most of the major computing companies in the world.
|
|||||||
Before Unicode, many different systems, on different platforms,
|
Before Unicode, many different systems, on different platforms,
|
||||||
had been developed for encoding characters for different languages,
|
had been developed for encoding characters for different languages,
|
||||||
but no single encoding could satisfy all languages.
|
but no single encoding could satisfy all languages.
|
||||||
Unicode provides access to over 100,000 characters
|
Unicode provides access to over 130,000 characters
|
||||||
used in all the major languages written today,
|
used in all the major languages written today,
|
||||||
and is independent of platform and language.
|
and is independent of platform and language.
|
||||||
|
|
||||||
@@ -78,7 +78,10 @@ U+10FFFF. The complete character set is sub-divided into \e planes.
|
|||||||
used characters from previous encoding standards. Other planes
|
used characters from previous encoding standards. Other planes
|
||||||
contain characters for specialist applications.
|
contain characters for specialist applications.
|
||||||
|
|
||||||
\todo Do we need this info about planes?
|
\todo FLTK 1.3 and later supports the full Unicode range (21 bits), but
|
||||||
|
there are a few exceptions, for instance binary shortcut values in menus
|
||||||
|
(\ref Fl_Shortcut) can only be used with characters from the BMP (16 bits).
|
||||||
|
This may be extended in a future FLTK version.
|
||||||
|
|
||||||
The UCS also defines various methods of encoding characters as
|
The UCS also defines various methods of encoding characters as
|
||||||
a sequence of bytes.
|
a sequence of bytes.
|
||||||
@@ -95,8 +98,8 @@ UTF-16 and UTF-32 are based on units of two and four bytes.
|
|||||||
UCS characters requiring more than 16 bits are encoded using
|
UCS characters requiring more than 16 bits are encoded using
|
||||||
"surrogate pairs" in UTF-16.
|
"surrogate pairs" in UTF-16.
|
||||||
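The surrogate-pair rule mentioned above can be sketched in a few lines. This is only an illustration of the UTF-16 arithmetic, not FLTK code: the helper name `to_surrogates` is hypothetical, and the sketch assumes the caller passes a code point in the range U+10000 to U+10FFFF.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical helper (not part of FLTK): split a code point above U+FFFF
// into a UTF-16 surrogate pair {high, low}.
// Precondition (assumed, not checked): 0x10000 <= cp <= 0x10FFFF.
std::pair<std::uint16_t, std::uint16_t> to_surrogates(std::uint32_t cp) {
    // Subtracting 0x10000 leaves a 20-bit value.
    const std::uint32_t v = cp - 0x10000;
    // The upper 10 bits go into the high surrogate (0xD800..0xDBFF).
    const std::uint16_t high = static_cast<std::uint16_t>(0xD800 + (v >> 10));
    // The lower 10 bits go into the low surrogate (0xDC00..0xDFFF).
    const std::uint16_t low = static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF));
    return {high, low};
}
```

For example, U+1F600 lies outside the BMP and is split into the pair 0xD83D, 0xDE00.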
|
|
||||||
UTF-8 encodes all Unicode characters into variable length
|
UTF-8 encodes all Unicode characters into variable length
|
||||||
sequences of bytes. Unicode characters in the 7-bit ASCII
|
sequences of bytes. Unicode characters in the 7-bit ASCII
|
||||||
range map to the same value and are represented as a single byte,
|
range map to the same value and are represented as a single byte,
|
||||||
making the transformation to Unicode quick and easy.
|
making the transformation to Unicode quick and easy.
|
||||||
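As a rough illustration of the variable-length scheme described above, the following sketch encodes a single code point as UTF-8 following RFC 3629. It does not use FLTK's own conversion functions: the name `encode_utf8` is hypothetical, and returning an empty string for surrogates and for values above U+10FFFF is an assumption made for this example only.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not an FLTK function): encode one Unicode code point
// as a UTF-8 byte sequence. Returns an empty string for UTF-16 surrogates
// (U+D800..U+DFFF) and for values above U+10FFFF (an assumption of this sketch).
std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {
        // 7-bit ASCII maps to the same value in a single byte.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // Two bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;  // surrogates are not characters
        // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

With this sketch, `encode_utf8('A')` yields the single byte 0x41, while U+20AC (the euro sign) yields the three bytes 0xE2 0x82 0xAC.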
|
|
||||||
@@ -139,6 +142,11 @@ some level of synchronisation and error detection.
|
|||||||
</tr>
|
</tr>
|
||||||
</table>
|
</table>
|
||||||
|
|
||||||
|
\note This table contains theoretical values outside the valid Unicode
|
||||||
|
range (<tt>U+000000 - U+10FFFF</tt>). Such values can only be returned by
|
||||||
|
conversion functions for illegal input values (see \ref unicode_illegals).
|
||||||
|
|
||||||
|
|
||||||
\par
|
\par
|
||||||
|
|
||||||
Moving from ASCII encoding to Unicode will allow all new FLTK
|
Moving from ASCII encoding to Unicode will allow all new FLTK
|
||||||
@@ -175,7 +183,7 @@ the following limitations:
|
|||||||
are LIMITED to 24 bit Unicode values, but also says that only 16 bits
|
are LIMITED to 24 bit Unicode values, but also says that only 16 bits
|
||||||
are really used under linux and win32.
|
are really used under linux and win32.
|
||||||
<b>[Can we verify this?]</b>
|
<b>[Can we verify this?]</b>
|
||||||
|
|
||||||
- The [<b>fltk2</b>] %fl_utf8encode() and %fl_utf8decode() functions are
|
- The [<b>fltk2</b>] %fl_utf8encode() and %fl_utf8decode() functions are
|
||||||
designed to handle Unicode characters in the range U+000000 to U+10FFFF
|
designed to handle Unicode characters in the range U+000000 to U+10FFFF
|
||||||
inclusive, which covers all UTF-16 characters, as specified in RFC 3629.
|
inclusive, which covers all UTF-16 characters, as specified in RFC 3629.
|
||||||
@@ -189,7 +197,7 @@ the following limitations:
|
|||||||
and not on a general Unicode character basis.
|
and not on a general Unicode character basis.
|
||||||
|
|
||||||
- FLTK will not handle right-to-left or bi-directional text.
|
- FLTK will not handle right-to-left or bi-directional text.
|
||||||
|
|
||||||
\todo
|
\todo
|
||||||
Verify 16/24 bit Unicode limit for different character sets?
|
Verify 16/24 bit Unicode limit for different character sets?
|
||||||
OksiD's code appears limited to 16-bit whereas the FLTK2 code
|
OksiD's code appears limited to 16-bit whereas the FLTK2 code
|
||||||
@@ -249,7 +257,7 @@ about error handling and return values.
|
|||||||
|
|
||||||
\section unicode_fltk_calls FLTK Unicode and UTF-8 Functions
|
\section unicode_fltk_calls FLTK Unicode and UTF-8 Functions
|
||||||
|
|
||||||
This section currently provides a brief overview of the functions.
|
This section provides a brief overview of the functions.
|
||||||
For more details, consult the main text for each function via its link.
|
For more details, consult the main text for each function via its link.
|
||||||
|
|
||||||
int fl_utf8locale()
|
int fl_utf8locale()
|
||||||
|