Week 02 (43)

UTF-8 Character Encoding (cont)

UTF-8 examples

char	code point	bits	code point (binary)	UTF-8 (binary)
$	U+0024	7	`010 0100`	`00100100`
¢	U+00A2	11	`000 1010 0010`	`11000010 10100010`
€	U+20AC	16	`0010 0000 1010 1100`	`11100010 10000010 10101100`
𐍈	U+10348	21	`0 0001 0000 0011 0100 1000`	`11110000 10010000 10001101 10001000`
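
The rows above can be reproduced with a short encoder. This is a minimal sketch, not a library routine: the name `utf8_encode` and its interface are assumptions made for illustration, and it does not reject surrogate code points (U+D800 to U+DFFF) as a strict encoder would.

```c
#include <stdint.h>

// Hypothetical helper (not a standard function): writes the UTF-8 encoding
// of code point cp into out and returns the number of bytes written,
// or 0 if cp is above U+10FFFF. Surrogate code points are not rejected.
int utf8_encode(uint32_t cp, unsigned char out[4]) {
    if (cp <= 0x7F) {
        // up to 7 bits: 0xxxxxxx
        out[0] = (unsigned char)cp;
        return 1;
    } else if (cp <= 0x7FF) {
        // up to 11 bits: 110xxxxx 10xxxxxx
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    } else if (cp <= 0xFFFF) {
        // up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    } else if (cp <= 0x10FFFF) {
        // up to 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0; // not a valid Unicode code point
}
```

For example, `utf8_encode(0x20AC, buf)` returns 3 and fills `buf` with the bytes `E2 82 AC`, which are the three binary groups shown in the € row.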

Unicode strings can be manipulated in C (e.g. "")

Like other C strings, they are terminated by a 0 byte (i.e. '\0')

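Warning: Functions like strlen count bytes, not characters, so they may not work as expected. For example, strlen returns 3 for a string containing only € because its UTF-8 encoding is 3 bytes long.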
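
To see the difference, here is a minimal sketch of a character counter. The name `utf8_length` is a hypothetical helper, not part of the C library, and it assumes its input is valid UTF-8. It relies on the fact that every continuation byte has the form `10xxxxxx`, so counting the bytes that do not match that pattern counts the code points.

```c
#include <stddef.h>

// Hypothetical helper (not a standard function): counts the code points in
// a 0-terminated string, assuming the string is valid UTF-8.
size_t utf8_length(const char *s) {
    size_t count = 0;
    for (; *s != '\0'; s++) {
        // Continuation bytes look like 10xxxxxx; every other byte
        // starts a new code point.
        if (((unsigned char)*s & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

With the string `"\xE2\x82\xAC"` (the UTF-8 bytes of €), `strlen` returns 3 while `utf8_length` returns 1. Note that this counts code points, which is not always the same as the number of characters a user sees on screen.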