Programming languages offer either no Unicode support or only basic support. Libraries are required to get full Unicode support on all platforms.
13.1. Qt library
Qt is a big C++ library covering different topics, but it is typically used to create graphical interfaces. It is distributed under the GNU LGPL license (version 2.1), and is also available under a commercial license.
13.1.1. Character and string classes
QChar is a Unicode character that can only store BMP characters. It is implemented as a 16-bit unsigned number. Interesting QChar methods:
- isSpace(): true if the character category is separator (Zl, Zp or Zs)
- toUpper(): convert to upper case
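To see why a 16-bit type is limited to the BMP, it can help to count how many 16-bit code units a character needs in UTF-16. The sketch below uses Python purely for illustration; it is not Qt code, and the helper names `utf16_units` and `is_separator` are made up for this example.

```python
import unicodedata


def utf16_units(ch):
    """Return the list of 16-bit code units needed to store `ch` in UTF-16."""
    data = ch.encode("utf-16-le")  # little endian, no BOM
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def is_separator(ch):
    """True if the general category of `ch` is Zl, Zp or Zs."""
    return unicodedata.category(ch) in ("Zl", "Zp", "Zs")


# A BMP character fits in one 16-bit unit.
print([hex(unit) for unit in utf16_units("\u00e9")])
# A non-BMP character needs two units (a surrogate pair).
print([hex(unit) for unit in utf16_units("\U0001F600")])
# Category-based separator test, as described for isSpace().
print(is_separator(" "), is_separator("\u2028"), is_separator("a"))
```

A character outside the BMP needs two 16-bit units, so it cannot be held by a single 16-bit value.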
Qt decodes literal byte strings from ISO 8859-1 using the QLatin1String class, a thin wrapper around char*. QLatin1String is a character string storing each character as a single byte. This is possible because it only supports characters in the U+0000 to U+00FF range. QLatin1String cannot be used to manipulate text: it has a smaller API than QString. For example, it is not possible to concatenate two QLatin1String strings.
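The one-byte-per-character storage works because ISO 8859-1 maps each code point in U+0000 to U+00FF to the byte with the same value. A minimal Python sketch of that idea follows; it only illustrates the encoding, not the Qt class, and `latin1_bytes` is a hypothetical helper name.

```python
def latin1_bytes(text):
    """Store `text` with one byte per character, or raise ValueError."""
    for ch in text:
        if ord(ch) > 0xFF:
            raise ValueError("U+%04X is outside U+0000..U+00FF" % ord(ch))
    # Each code point becomes the byte with the same numeric value.
    return bytes(ord(ch) for ch in text)


encoded = latin1_bytes("caf\u00e9")
# Same result as Python's own ISO 8859-1 codec, one byte per character.
print(len(encoded), encoded == "caf\u00e9".encode("iso-8859-1"))

try:
    latin1_bytes("\u20ac")  # EURO SIGN, outside the supported range
except ValueError as exc:
    print(exc)
```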
QTextCodec.codecForLocale() gets the locale encoding codec:
QFile.decodeName() is the reverse operation of QFile.encodeName(): it decodes a filename from its byte string form into a Unicode string.
Qt has two implementations of its
- Windows: use Windows native API
- UNIX: use POSIX API. Examples:
13.2. The glib library
13.2.1. Character strings
The gunichar type represents a character. It can store any Unicode 6.0 character.
13.2.2. Codec functions
g_convert(): decode from an encoding and encode to another encoding with the iconv library. Use g_convert_with_fallback() to choose how to handle undecodable bytes and unencodable characters.
g_locale_from_utf8() / g_locale_to_utf8(): encode to / decode from the current locale encoding.
g_get_charset(): get the locale encoding
- Windows: current ANSI code page
- OS/2: current code page (call
- other: try
g_utf8_get_char(): get the first character of a UTF-8 string as a gunichar
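A conversion of the kind g_convert() performs can be pictured as a decode step followed by an encode step, with an optional fallback string for characters the target encoding cannot represent. The Python sketch below illustrates that idea only; it does not call glib, the `convert` function and its parameter order are assumptions made for this example, and it assumes a stateless target codec without a byte order mark.

```python
from typing import Optional


def convert(data: bytes, to_codec: str, from_codec: str,
            fallback: Optional[str] = None) -> bytes:
    """Decode `data` from `from_codec`, then encode it to `to_codec`.

    Without `fallback`, an unencodable character raises UnicodeEncodeError.
    With `fallback`, each unencodable character is replaced by that string.
    """
    text = data.decode(from_codec)  # strict: undecodable bytes raise
    if fallback is None:
        return text.encode(to_codec)
    pieces = []
    for ch in text:
        try:
            pieces.append(ch.encode(to_codec))
        except UnicodeEncodeError:
            pieces.append(fallback.encode(to_codec))
    return b"".join(pieces)


# ISO 8859-1 to UTF-8: every character is encodable.
print(convert(b"caf\xe9", "utf-8", "iso-8859-1"))
# UTF-8 to ISO 8859-1 with a fallback for the euro sign.
print(convert("x\u20ac".encode("utf-8"), "iso-8859-1", "utf-8", fallback="?"))
```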
13.2.3. Filename functions
g_filename_to_utf8(): decode a filename from the filename encoding to UTF-8; g_filename_from_utf8() performs the reverse conversion.
g_filename_display_name(): human-readable version of a filename. Try to decode the filename from each encoding in the g_get_filename_charsets() list. If every decoding fails, decode the filename from UTF-8 and replace undecodable bytes with � (U+FFFD).
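The fallback strategy described for g_filename_display_name() can be sketched in a few lines. This is a Python illustration of the algorithm as stated in the text, not the glib implementation; `display_name` and its `charsets` parameter are hypothetical names.

```python
from typing import Sequence


def display_name(filename: bytes, charsets: Sequence[str]) -> str:
    """Return a printable version of `filename`, never raising on bad bytes."""
    # Try each candidate encoding in order and keep the first that works.
    for charset in charsets:
        try:
            return filename.decode(charset)
        except (UnicodeDecodeError, LookupError):
            continue
    # Last resort: UTF-8, with undecodable bytes replaced by U+FFFD.
    return filename.decode("utf-8", errors="replace")


print(display_name(b"caf\xc3\xa9", ["utf-8"]))
print(display_name(b"caf\xe9", ["utf-8", "iso-8859-1"]))
print(ascii(display_name(b"caf\xe9", ["utf-8"])))
```

Because the last step never fails, the result is always usable for display, but it cannot be reliably converted back to the original bytes.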
g_get_filename_charsets(): get the list of charsets used to decode and encode filenames. g_filename_display_name() tries each encoding of this list; other functions just use the first encoding. UTF-8 is used on Windows. On other operating systems, use:
13.3. iconv library
By default, libiconv is strict: an unencodable character raises an error. You can ignore these characters by adding the //IGNORE suffix to the encoding name. There is also the //TRANSLIT suffix to replace unencodable characters with similar-looking characters.
PHP has a built-in binding for iconv.
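The error-handling modes of iconv (strict by default, //IGNORE, and transliteration with //TRANSLIT) have rough analogues in Python's codec error handlers. The sketch below is only an approximation written for illustration: it does not call iconv, the function names are hypothetical, and the transliteration step uses Unicode decomposition (NFKD), which is an assumption and will not match iconv's own transliteration tables.

```python
import unicodedata


def encode_strict(text, encoding):
    """Like the default mode: an unencodable character raises an error."""
    return text.encode(encoding)


def encode_ignore(text, encoding):
    """Like //IGNORE: unencodable characters are silently dropped."""
    return text.encode(encoding, "ignore")


def encode_translit(text, encoding):
    """Rough stand-in for //TRANSLIT: fall back to a decomposed form."""
    pieces = []
    for ch in text:
        try:
            pieces.append(ch.encode(encoding))
        except UnicodeEncodeError:
            # Decompose (for example, e with acute becomes e + combining
            # accent) and keep only the parts the target encoding can hold.
            approx = unicodedata.normalize("NFKD", ch)
            pieces.append(approx.encode(encoding, "ignore"))
    return b"".join(pieces)


try:
    encode_strict("caf\u00e9", "ascii")
except UnicodeEncodeError as exc:
    print("strict:", exc.reason)
print("ignore:", encode_ignore("caf\u00e9", "ascii"))
print("translit:", encode_translit("caf\u00e9", "ascii"))
```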
13.4. ICU libraries
International Components for Unicode (ICU) is a mature, widely used set of C, C++ and Java libraries providing Unicode and Globalization support for software applications. ICU is an open source project distributed under the MIT license.