Decode-process-encode

Our previous examples had Unicode symbols in the source code itself. In real-world applications this is usually not the case. Most of the time you'll be processing data that came from an external source, be it a database, the World Wide Web, or something else.

Outside of your program, data exists in the form of bytes. A set of rules used to convert written symbols into a sequence of bytes is called an encoding. The Encode module is the tool for doing encoding conversions in Perl.
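
To make the difference between characters and bytes concrete, here is a minimal sketch. The sample string and variable names are only illustrative; the calls are the standard encode and decode functions exported by Encode.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# "\x{e9}" is the single character U+00E9 (e with acute accent),
# so $text holds four characters.
my $text  = "caf\x{e9}";
my $chars = length $text;

# The same text becomes different byte sequences under different encodings.
my $utf8   = encode('UTF-8',      $text);   # U+00E9 takes two bytes here
my $latin1 = encode('ISO-8859-1', $text);   # U+00E9 takes one byte here

# length() counts bytes when applied to an encoded string.
my $utf8_bytes   = length $utf8;
my $latin1_bytes = length $latin1;

# Decoding with the matching encoding restores the original characters.
my $round_trip = decode('UTF-8', $utf8);
```

The character count stays the same whichever encoding is chosen; only the byte representation changes.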

So a very typical workflow for a script would be the following:

  • Decode your input data using the Encode module.
  • Do some stuff with your textual data.
  • Encode your text into a suitable encoding and pass it outside.
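
The three steps above can be sketched roughly as follows. The hard-coded byte string is an assumption standing in for data read from an external source, and upper-casing is just a placeholder for whatever processing your script really does.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: these bytes stand in for external input. They are the
# UTF-8 encoding of a six-letter Russian word meaning "hello".
my $input_bytes = "\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82";

# Step 1: decode bytes into a string of characters.
my $text = decode('UTF-8', $input_bytes);

# Step 2: process the text. String functions now see characters,
# so length() reports letters and uc() applies Unicode case rules.
my $letters   = length $text;
my $processed = uc $text;

# Step 3: encode the result back into bytes before passing it outside.
my $output_bytes = encode('UTF-8', $processed);
```

Running uc on $input_bytes directly would leave the Cyrillic letters unchanged, because Perl would see twelve unrelated bytes rather than six letters.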

Note that the last step can include printing to a filehandle, in which case you can, for example, use the binmode function as we did before, instead of the Encode::encode function.
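
A small sketch of that equivalence, assuming an in-memory filehandle as a stand-in for STDOUT or a real file so that the written bytes can be inspected:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text = "caf\x{e9}";   # four characters, already decoded

# Open a filehandle that writes into $buffer instead of a file.
my $buffer = '';
open(my $fh, '>', \$buffer) or die "Cannot open in-memory handle: $!";

# Let the I/O layer do the encoding on output.
binmode($fh, ':encoding(UTF-8)');
print {$fh} $text;
close($fh) or die "Cannot close in-memory handle: $!";

# The bytes written through the layer match an explicit encode() call.
my $explicit = encode('UTF-8', $text);
my $same     = $buffer eq $explicit ? 1 : 0;
```

Use one approach or the other for a given filehandle. Printing already encoded bytes to a handle that has an encoding layer would encode them twice.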

Also, you don't always have to perform the decoding and encoding steps yourself. If you are using an encoding-aware module to fetch or parse data, that module can handle decoding and encoding for you automatically (e.g. JSON, DBI).
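
As an illustration, here is a sketch using JSON::PP, the pure-Perl JSON implementation that ships with Perl. It is used here as an assumption in place of the JSON module named above. Its decode_json function expects UTF-8 bytes and returns decoded character strings, and encode_json returns UTF-8 bytes.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json encode_json);

# Assumption: these UTF-8 bytes stand in for a JSON document
# received from an external source.
my $json_bytes = '{"name":"caf' . "\xc3\xa9" . '"}';

# decode_json parses the JSON and decodes UTF-8 in a single step,
# so no separate Encode::decode call is needed.
my $data        = decode_json($json_bytes);
my $name_length = length $data->{name};   # counts characters, not bytes

# encode_json serializes the data and encodes it to UTF-8 bytes,
# so no separate Encode::encode call is needed either.
my $out_bytes = encode_json({ name => $data->{name} });
```

Other modules make different choices, and some only handle encodings when given an explicit option, which is why their documentation is worth checking.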

The main point here is that you should always be aware of what state your data is in (raw bytes or decoded text) and carefully read the docs for the modules you use. But for simple string juggling, the three steps above should be enough to get you going.