
Character Data

Character data has several possible representations (encodings)

The two most common:

  • ASCII (ISO 646)
    • 7-bit values, stored in the lower 7 bits of a byte (top bit always zero)
    • can encode the Roman alphabet, digits, punctuation, control chars
  • UTF-8 (Unicode)
    • 8-bit values; each character is encoded in 1 to 4 bytes (ASCII characters use 1 byte)
    • can encode all human languages plus other symbols

      (e.g.  √   ∑   ∀   ∃)
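
The difference between the two encodings shows up in the top bits of each byte. The following is a minimal sketch in C, not a full UTF-8 validator; the function names utf8_sequence_length and utf8_count are hypothetical, chosen here for illustration. It assumes the input is well-formed UTF-8.

```c
#include <stddef.h>

// Number of bytes in the UTF-8 sequence whose first byte is `first`.
// Returns 0 if `first` is a continuation byte (10xxxxxx) or not a
// valid lead byte.
int utf8_sequence_length(unsigned char first) {
    if ((first & 0x80) == 0x00) return 1; // 0xxxxxxx: plain ASCII
    if ((first & 0xE0) == 0xC0) return 2; // 110xxxxx: 2-byte sequence
    if ((first & 0xF0) == 0xE0) return 3; // 1110xxxx: 3-byte sequence
    if ((first & 0xF8) == 0xF0) return 4; // 11110xxx: 4-byte sequence
    return 0;
}

// Count the characters (code points) in a NUL-terminated UTF-8 string
// by counting every byte that is not a continuation byte.
size_t utf8_count(const char *s) {
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            n++;
        }
    }
    return n;
}
```

For example, the string "A\xe2\x88\x9a" holds 'A' (one byte, 0x41) followed by √ (three bytes, 0xE2 0x88 0x9A), so it occupies 4 bytes but utf8_count reports 2 characters.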