Nicolas Martyanoff, Brain dump

Printing non-ASCII characters in Erlang releases

I have been learning a lot about Erlang recently, and I stumbled upon a strange behaviour: printing strings that contain non-ASCII characters, e.g. "μs", with io:format and the ~ts control sequence gives different results depending on how the program is run:

  • In rebar3 shell, with rebar3 run, or in a release running with a console: "μs".
  • In a release running in the foreground: "\x{3BC}s".

I first checked the shell environment, but os:getenv("LANG") correctly returns en_US.UTF-8 in both cases. I then suspected the Erlang VM was initialized with a different +pc flag in console mode, but adding +pc unicode to vm.args did not change anything.

Finally, I ended up reading the very detailed documentation of the Erlang I/O protocol. Printing the options of the standard output device, obtained with io:getopts(), gives different results depending on the environment: in interactive environments and with rebar3 run, the encoding option is set to unicode, whereas it is set to latin1 in releases running in the foreground.
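The check described above can be sketched as a small module. The module name io_inspect is hypothetical; the sketch only relies on io:getopts/0, which returns the option list of the standard IO device, or {error, Reason} if the device does not answer the request.

```erlang
-module(io_inspect).
-export([stdout_encoding/0, report/0]).

%% Return the encoding option of the standard IO device (latin1 or
%% unicode), or {error, Reason} if the device does not support getopts.
stdout_encoding() ->
    case io:getopts() of
        Opts when is_list(Opts) -> proplists:get_value(encoding, Opts);
        {error, _} = Error -> Error
    end.

%% Print the encoding, e.g. to compare a console run with a foreground run.
report() ->
    io:format("encoding: ~p~n", [stdout_encoding()]).
```

Calling io_inspect:report() from the same release started in console mode and in foreground mode should make the unicode/latin1 difference visible.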

As it turns out, configuring the standard IO device to output UTF-8 encoded strings is as simple as calling io:setopts([{encoding, unicode}]) when the application starts. This is definitely something I will add to all my applications from now on.
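A minimal sketch of that fix, assuming a hypothetical module unicode_stdout whose enable/0 function would be called early, for example from the start/2 callback of the application module. It uses only io:setopts/1, which returns ok or {error, Reason}.

```erlang
-module(unicode_stdout).
-export([enable/0, demo/0]).

%% Switch the standard IO device to Unicode (UTF-8) output.
%% Returns ok, or {error, Reason} if the device refuses the option.
enable() ->
    io:setopts([{encoding, unicode}]).

%% Enable Unicode output, then print a non-ASCII string with ~ts.
%% Crashes with badmatch if enable/0 does not return ok.
demo() ->
    ok = enable(),
    %% With encoding set to unicode, this should print "μs"
    %% instead of "\x{3BC}s".
    io:format("~ts~n", ["μs"]).
```

Matching on ok makes a failure to reconfigure the device visible at startup instead of silently falling back to escaped output.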
