
UTF-8 Character Encoding (cont)

UTF-8 examples

ch  unicode  bits  simple binary               UTF-8 binary
$   U+0024   7     010 0100                    00100100
¢   U+00A2   11    000 1010 0010               11000010 10100010
€   U+20AC   16    0010 0000 1010 1100         11100010 10000010 10101100
𐍈   U+10348  21    0 0001 0000 0011 0100 1000  11110000 10010000 10001101 10001000
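
The rows above can be reproduced with a few shifts and masks. The following is a minimal sketch, not a library routine: the function name utf8_encode and its interface are made up for illustration, and it assumes the caller supplies a buffer of at least 4 bytes.

```c
#include <stdint.h>

// Hypothetical helper: encode one Unicode code point as UTF-8.
// Writes 1..4 bytes into out[] and returns how many were written.
// Returns 0 for values that are not valid code points
// (above U+10FFFF, or in the surrogate range U+D800..U+DFFF).
int utf8_encode(uint32_t cp, unsigned char out[4]) {
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return 0;                                // surrogates are not encodable
    }
    if (cp < 0x80) {                             // up to 7 bits: 0xxxxxxx
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                            // up to 11 bits: 110xxxxx 10xxxxxx
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                          // up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {                         // up to 21 bits: 11110xxx + 3 continuation bytes
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                                    // beyond U+10FFFF
}
```

For example, calling utf8_encode(0x20AC, buf) should return 3 and leave the bytes 0xE2 0x82 0xAC in buf, matching the third row of the table.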

Unicode strings can be manipulated in C (e.g. "")

Like other C strings, they are terminated by a 0 byte (i.e. '\0')

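Warning: Functions like strlen may not work as expected. strlen counts bytes, not characters, so a string containing non-ASCII characters has a strlen larger than its number of characters.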
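
To see the difference, code points can be counted by skipping continuation bytes (those of the form 10xxxxxx). The following is a sketch only: the name utf8_codepoints is hypothetical, and it assumes the input is valid UTF-8, so it does no error checking.

```c
#include <stddef.h>
#include <string.h>   // strlen, for comparison with the byte count

// Hypothetical helper: count the code points in a valid,
// 0-terminated UTF-8 string. Every byte that is NOT a
// continuation byte (10xxxxxx) starts a new code point.
size_t utf8_codepoints(const char *s) {
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            n++;
        }
    }
    return n;
}
```

For the four example characters stored one after another (1 + 2 + 3 + 4 bytes), strlen reports 10 while utf8_codepoints reports 4. Note that a code point is still not always what a user sees as one character, since combining marks are separate code points.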