UTF-8 Character Encoding (cont)
UTF-8 examples
ch | unicode | bits | simple binary | UTF-8 binary |
$ | U+0024 | 7 | 010 0100 | 00100100 |
¢ | U+00A2 | 11 | 000 1010 0010 | 11000010 10100010 |
€ | U+20AC | 16 | 0010 0000 1010 1100 | 11100010 10000010 10101100 |
𐍈 | U+10348 | 21 | 0 0001 0000 0011 0100 1000 | 11110000 10010000 10001101 10001000 |
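The rows above can be reproduced mechanically. The following is a minimal sketch, not a library routine: utf8_encode is a hypothetical helper name, and it assumes the caller supplies a buffer of at least 4 bytes. It picks the sequence length from the size of the code point, then spreads the code point's bits across the lead byte and the 10xxxxxx continuation bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * Writes 1 to 4 bytes into out and returns the number of bytes
 * written, or 0 if cp is a surrogate or lies above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp <= 0x7F) {                       /* up to 7 bits: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                      /* up to 11 bits: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFF) {                     /* up to 16 bits: 1110xxxx + 2 continuation bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)   /* surrogates are not valid code points to encode */
            return 0;
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* up to 21 bits: 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                               /* outside the Unicode range */
}
```

For example, calling utf8_encode(0x20AC, buf) writes the three bytes 0xE2 0x82 0xAC, which is the bit pattern shown for € in the table.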
Unicode strings can be manipulated in C (e.g. "")
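Like other C strings, they are terminated by a 0 byte (i.e. '\0')
Warning: functions like strlen count bytes, not characters, so they return a larger value than the number of characters whenever a string contains multi-byte (non-ASCII) characters.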
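To illustrate the strlen caveat, here is a small sketch of one way to count characters (code points) rather than bytes. The name utf8_length is hypothetical, and the function assumes its input is already valid UTF-8; it does no validation. It relies on the fact that every continuation byte has the form 10xxxxxx, so counting the bytes that are not continuation bytes gives the number of code points.

```c
#include <stddef.h>
#include <string.h>   /* strlen, for comparison with utf8_length */

/* Hypothetical helper: count code points in a NUL-terminated UTF-8
 * string. Assumes the input is valid UTF-8 (no validation is done).
 *
 * Example: for "\xE2\x82\xAC" (the UTF-8 bytes of the euro sign),
 * strlen returns 3 but utf8_length returns 1. */
size_t utf8_length(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++) {
        /* Skip continuation bytes (10xxxxxx); count everything else. */
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

Note that this counts code points, which is still not always the same as what a reader perceives as one character (for instance when combining marks are used).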