What I did to make unicode work...
The problem
I got an email that was sent from a French email client. When the sender quoted some text, it looked like this:
    Le jeu. 18 déc. 2025, 15:48, This Person <person@place> a écrit :
    > The thing they quoted
…only what it looked like for me was:
    Le jeu. 18 d?c. 2025, 15:48, This Person <person@place> a ?crit :
    > The thing they quoted
That sucks. So I fixed it.
The tech-stack
Getting unicode right in a terminal program can involve a whole stack of programs. In my case:
- I'm reading my email in mutt
- Running in ansi-term
- …which is a terminal emulator in emacs [1]
- I'm connecting to my emacs with emacsclient on my server
- I'm running that client in a ksh shell on my server
- I'm connecting to the server using ssh
- …which I'm running in an xterm on my laptop.
The elements of this stack I care about are:
- Slackware's base locale
- Configuring my xterm
- Configuring my bsd locale
Slackware's base locale
If you run locale, you'll see a bunch of settings describing your unix's
localisation. If the string "UTF" doesn't appear anywhere in there, then
you're probably not going to be taking advantage of any UTF-8 (unicode)
features in your terminal programs.
Fortunately, Slackware has the following in /etc/profile.d/lang.sh:
    export LANG=en_US.UTF-8
…which means I can see this:
    $ locale
    LANG=en_US.UTF-8
    LC_CTYPE="en_US.UTF-8"
    LC_NUMERIC="en_US.UTF-8"
    LC_TIME="en_US.UTF-8"
    LC_COLLATE=C
    LC_MONETARY="en_US.UTF-8"
    LC_MESSAGES="en_US.UTF-8"
    LC_PAPER="en_US.UTF-8"
    LC_NAME="en_US.UTF-8"
    LC_ADDRESS="en_US.UTF-8"
    LC_TELEPHONE="en_US.UTF-8"
    LC_MEASUREMENT="en_US.UTF-8"
    LC_IDENTIFICATION="en_US.UTF-8"
    LC_ALL=
    $
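If you'd rather not read the whole listing, the check can be scripted. This is only a minimal sketch, assuming a POSIX shell: the names effective_ctype and is_utf8_locale are made up for illustration, and the only real knowledge in here is the standard precedence for character handling (LC_ALL, then LC_CTYPE, then LANG).

```shell
# Hypothetical helpers, not part of any system.

# Print the locale that governs character handling.
# Standard precedence: LC_ALL beats LC_CTYPE, which beats LANG.
# An empty variable counts as unset; with nothing set, the default is "C".
effective_ctype() {
    printf '%s\n' "${LC_ALL:-${LC_CTYPE:-${LANG:-C}}}"
}

# Succeed if a locale name mentions a UTF-8 codeset in one of its
# common spellings.
is_utf8_locale() {
    case "$1" in
        *.UTF-8*|*.utf-8*|*.UTF8*|*.utf8*) return 0 ;;
        *) return 1 ;;
    esac
}

current=$(effective_ctype)
if is_utf8_locale "$current"; then
    echo "UTF-8 locale in effect: $current"
else
    echo "no UTF-8 here: $current"
fi
```

This only looks at the variable names, so it can't tell you whether the locale is actually installed on the machine.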
Configuring my xterm
I need to tell xterm to turn on its UTF-8 features, and I need to tell it to use a font that can actually display the characters it tries to display.
I save my xterm configs in ~/.Xresources. As is common, I have a
line in my ~/.xinitrc which runs xrdb -merge $HOME/.Xresources.
I think the trickiest bit here is actually picking a font that looks
good and actually has all the characters I want. I chose Hack, which
comes with a default Slackware install.
    ! Hack seems to be the best slackware font for unicode
    XTerm.vt100.faceName: Hack
    XTerm.vt100.faceSize: 12

    ! Ensure unicode
    XTerm.vt100.locale: false
    XTerm.vt100.utf8: true
I checked that I could display all the characters I wanted by running
emacs -nw and hitting <F1> h (or M-x view-hello-file). This
displays a handy file that contains a "hello" greeting in a
bewildering array of languages. And hence, also, character-sets and
alphabets.
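If emacs isn't handy, a few printf lines make a rough substitute. This is only a sketch: utf8_samples is a made-up name, and the characters are written as octal escapes of their UTF-8 bytes so the script itself stays plain ASCII. On a correctly configured terminal, each line should end in the named character rather than a "?" or an empty box.

```shell
# Hypothetical helper: print a few non-ASCII characters from explicit
# UTF-8 bytes, so the result doesn't depend on how this file is encoded.
utf8_samples() {
    printf 'e-acute   (U+00E9): \303\251\n'
    printf 'lambda    (U+03BB): \316\273\n'
    printf 'euro sign (U+20AC): \342\202\254\n'
}

utf8_samples
```

It covers far fewer scripts than the emacs hello file, so treat it as a smoke test rather than proof that the font has everything you need.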
Incidentally…
While I'm talking about configuring xterm, I might as well also note
these configs, which have nothing to do with unicode:
    ! Select with mouse to clipboard, not primary
    XTerm.vt100.selectToClipboard: true

    ! Make the alt key work like I expect.
    XTerm.vt100.metaSendsEscape: true

    ! tmux can copy to the clipboard if we allow this
    XTerm*disallowedWindowOps: 20,21,SetXprop

Sending my mouse selection to the clipboard (rather than to primary)
means I never have to worry about finding a "copy-to-clipboard"
shortcut that doesn't clash with whatever program I'm running in the
terminal. Whatever I highlight is copied to the proper clipboard. I
can paste it in firefox with C-v, in emacs with C-y and in another
xterm with S-<insert> just as you'd expect.
Making "meta send escape" means I can do things like navigate
word-by-word with M-f and M-b and so on.
And the disallowedWindowOps thing gives tmux permission to set my
clipboard directly by sending some kind of magic terminal escape
sequence.
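As far as I can tell, the escape sequence in question is OSC 52, which lets a program hand the terminal some base64-encoded text to put on the clipboard. Here's a minimal sketch of building one by hand, assuming a base64 command is installed (it isn't on every system); osc52_sequence is a made-up name.

```shell
# Hypothetical helper: build an OSC 52 "set clipboard" sequence for the
# text in $1.
# Layout: ESC ] 52 ; c ; <base64 of the text> BEL
# where "c" names the clipboard selection.
osc52_sequence() {
    # Some base64 implementations wrap long output, so strip newlines.
    encoded=$(printf '%s' "$1" | base64 | tr -d '\n')
    printf '\033]52;c;%s\007' "$encoded"
}

# Usage (commented out so that running this file leaves the clipboard alone):
# osc52_sequence "hello from the terminal"
```

Printing the result straight to an xterm configured as above should put the text on the clipboard. Inside tmux, whether it gets passed through depends on tmux's own clipboard settings.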
Configuring my bsd locale
OpenBSD defaults to the C or POSIX locale (see the handbook
here). This is a perfectly reasonable default, but it's not what I
want for the user that reads my email.
So now my ~/.profile contains:
    LANG="en_GB.UTF-8"
    export LANG
Now my locale looks like this:
    $ locale
    LANG=en_GB.UTF-8
    LC_COLLATE="en_GB.UTF-8"
    LC_CTYPE="en_GB.UTF-8"
    LC_MONETARY="en_GB.UTF-8"
    LC_NUMERIC="en_GB.UTF-8"
    LC_TIME="en_GB.UTF-8"
    LC_MESSAGES="en_GB.UTF-8"
    LC_ALL=
    $
How it all fits together
So now (if I understand this right), when my email program wants to show me a French character like "é", something like this happens:
- mutt checks one or more of my locale environment variables to see if it should even attempt this UTF-8 business. It sees en_GB.UTF-8 and decides to try to print the actual unicode character.
- mutt knows it wants to send a UTF-8 character to the terminal it's running in – but x terminals have been around a lot longer than UTF-8 has, and they all handle it slightly differently. What's the right way to talk to this one?
- mutt checks the $TERM environment variable and sees that it's running in something called eterm-color. It has no idea what this is, but it can look it up.
- Yep, there's a terminfo entry in /usr/share/terminfo/e/eterm-color – that tells mutt how to tell the terminal how to handle non-ASCII characters like "é".
- The terminal is emacs' ansi-term, and it receives the character correctly. Now it has to figure out how to pass it on.
- Emacs checks the locale, and sees that we're doing the unicode thing.
- Emacs checks $TERM and sees that it's running in an xterm. There's a terminfo entry in /usr/share/terminfo/x/xterm that tells emacs how to send me my "é".
- ssh receives the "é" over the network from emacs. This is the first program in the stack that's running on my local laptop, so here everything could be different.
- ssh checks my locale. It's en_US.UTF-8. Not exactly the same as the server, but close enough. At least they both agree on UTF-8.
- ssh checks my $TERM. Yup, it's xterm. Slackware also keeps its terminfo files in /usr/share, so the appropriate file is still /usr/share/terminfo/x/xterm. Just on a different computer from last time.
- xterm receives our character correctly from ssh.
- xterm is configured with vt100.utf8 set to true, so it's definitely going to try to send me the actual character.
- xterm tells X to display a 12 point character in the Hack font at code point "é".
- Since the Hack font actually knows how to draw an "e" with an accent above it, I finally get to see the character my friend (or their mailer) typed in their email \o/
Phew.
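A few of those steps can be poked at from the shell. The sketch below assumes the terminfo layout described above (one directory per first letter under /usr/share/terminfo), which not every system uses. The function names are invented for illustration.

```shell
# Hypothetical helpers for illustration only.

# Where a terminfo entry for terminal type $1 would live, assuming the
# /usr/share/terminfo/<first letter>/<name> layout mentioned above.
terminfo_path() {
    first=$(printf '%s' "$1" | cut -c1)
    printf '/usr/share/terminfo/%s/%s\n' "$first" "$1"
}

# The codeset part of a locale name: en_GB.UTF-8 -> UTF-8, C -> (empty).
# Locale names look like language[_territory][.codeset][@modifier].
locale_codeset() {
    case "$1" in
        *.*) codeset=${1#*.}; printf '%s\n' "${codeset%%@*}" ;;
        *)   printf '\n' ;;
    esac
}

# The bytes of a string as hex, to see what a character looks like
# on the wire.
hex_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
    printf '\n'
}

# Where this terminal's entry should be (falls back to xterm if TERM is unset).
terminfo_path "${TERM:-xterm}"

# Server and laptop use different locales but the same codeset.
locale_codeset "en_GB.UTF-8"
locale_codeset "en_US.UTF-8"

# "é" written as explicit UTF-8 bytes (octal 303 251); prints c3a9
hex_bytes "$(printf '\303\251')"
```

The last line is the heart of it: "é" is two bytes in UTF-8, and every program in the stack has to agree to treat those two bytes as one character.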
Footnotes:
[1] This is probably
less silly than you think. When working remotely it's always a good
idea to use something to persist your session in case your
connection drops. I used to use screen for this, and tmux is
very popular. But it turns out that running emacs in daemon mode and
connecting with emacsclient does the job very well indeed.