Byte vs. character

A common misconception about encodings is that "1 character takes 1 byte". That holds only for single-byte encodings such as latin1 (ISO-8859-1):

use feature 'say';
use Encode qw(encode);
say length encode('latin1', '$'); # says 1
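One byte per character also means a single-byte encoding can hold at most 256 characters. Here is a minimal sketch using Encode's encode() with its FB_CROAK and LEAVE_SRC check flags. The sample characters (U+00E9 and U+20AC) are illustrative and not from the text above. It shows that a character inside Latin-1 encodes to 1 byte, while one outside it cannot be encoded at all.

```perl
use strict;
use warnings;
use feature 'say';
use Encode qw(encode FB_CROAK LEAVE_SRC);

# Code points are written as \x{...} escapes so the source stays ASCII.
my $e_acute = "\x{E9}";    # U+00E9, one of Latin-1's 256 characters
my $euro    = "\x{20AC}";  # U+20AC, not in Latin-1

say length encode('latin1', $e_acute);   # says 1

# With the default CHECK, encode() substitutes a replacement
# character for anything it cannot map; for latin1 that is '?'.
say encode('latin1', $euro);             # says ?

# With FB_CROAK it dies instead; LEAVE_SRC keeps $euro unmodified.
my $ok = eval { encode('latin1', $euro, FB_CROAK | LEAVE_SRC); 1 };
say $ok ? 'encoded' : 'U+20AC cannot be encoded in latin1';
```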

But Unicode's code space spans more than a million code points (1,114,112 to be precise, U+0000 through U+10FFFF), so one byte, with only 256 possible bit patterns, cannot possibly hold them all. That's where the UTF encodings step in, and UTF-8 is one of the most interesting of them. Depending on the code point, a UTF-8 encoded character can take from 1 byte

use feature 'say';
use Encode qw(encode);
say length encode('utf8', '$'); # says 1

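up to 4 bytes for a single code point! (The original UTF-8 design allowed sequences of up to 6 bytes, but RFC 3629 restricts UTF-8 to U+0000 through U+10FFFF, and no code point in that range needs more than 4.) Once again, note the difference between a character and the byte sequence that represents it: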

use utf8;                       # the source itself contains a literal '€'
use feature 'say';
use Encode qw(encode);
say length '€';                 # says 1
say length encode('utf8', '€'); # says 3
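To see where each byte count comes from, here is a minimal sketch that encodes one illustrative character from each UTF-8 length class and prints its character count, byte count and hex bytes. It uses Encode's strict 'UTF-8' name rather than the lax 'utf8'. For valid code points like these, both produce the same bytes.

```perl
use strict;
use warnings;
use Encode qw(encode);

# One sample character per UTF-8 length class, written as \x{...}
# escapes so the source stays ASCII.
my @samples = (
    [ 'U+0024 DOLLAR SIGN',    "\x{24}"    ],  # U+0000..U+007F    -> 1 byte
    [ 'U+00E9 E WITH ACUTE',   "\x{E9}"    ],  # U+0080..U+07FF    -> 2 bytes
    [ 'U+20AC EURO SIGN',      "\x{20AC}"  ],  # U+0800..U+FFFF    -> 3 bytes
    [ 'U+1F600 GRINNING FACE', "\x{1F600}" ],  # U+10000..U+10FFFF -> 4 bytes
);

for my $s (@samples) {
    my ($name, $char) = @$s;
    my $bytes = encode('UTF-8', $char);
    # length() counts characters in $char but bytes in the encoded $bytes.
    printf "%-22s chars=%d bytes=%d hex=%s\n",
        $name, length $char, length $bytes, unpack('H*', $bytes);
}
```

The euro line reports chars=1 bytes=3 hex=e282ac, the same 3 bytes counted above. The last line, at 4 bytes, is the longest sequence any valid code point needs.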