Kindness City Blog
18 Dec 2025

What I did to make unicode work...

The problem

I got an email that was sent from a French email client. When the sender quoted some text, it looked like this:

Le jeu. 18 déc. 2025, 15:48, This Person <person@place> a écrit :
> The thing they quoted

…only what it looked like for me was:

Le jeu. 18 d?c. 2025, 15:48, This Person <person@place> a ?crit :
> The thing they quoted

That sucks. So I fixed it.

The tech-stack

Getting unicode right in a terminal program can involve a whole stack of programs. In my case:

  • I'm reading my email in mutt
  • Running in ansi-term
  • …which is a terminal emulator in emacs [1]
  • I'm connecting to my emacs with emacsclient on my server
  • I'm running that client in a ksh shell on my server
  • I'm connecting to the server using ssh
  • …which I'm running in an xterm on my laptop.

The elements of this stack I care about are:

  • Slackware's base locale
  • Configuring my xterm
  • Configuring my bsd locale

Slackware's base locale

If you run locale, you'll see a bunch of stuff about your unix's localisation. If the string "UTF" doesn't appear in there, then you're probably not going to be taking advantage of any UTF-8 (unicode) features in your terminal programs.

Fortunately, slackware has the following in /etc/profile.d/lang.sh :

export LANG=en_US.UTF-8

…which means I can see this:

$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE=C
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
$ 
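If you'd rather not eyeball that whole list, something like the following might do as a quick check. It's only a sketch: charmap_is_utf8 is a name I've made up for illustration, and the usage lines assume your locale command supports the POSIX charmap keyword.

```shell
# charmap_is_utf8 is a hypothetical helper, not part of any standard tool.
# It succeeds if the given charmap name looks like UTF-8 (case-insensitive,
# with or without the hyphen).
charmap_is_utf8() {
    case "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')" in
        UTF-8|UTF8) return 0 ;;
        *)          return 1 ;;
    esac
}

# Usage (asks the current locale for its character encoding, then checks it):
#   cm=$(locale charmap)
#   if charmap_is_utf8 "$cm"; then echo "$cm: good"; else echo "$cm: not UTF-8"; fi
```

The function takes the charmap as an argument rather than calling locale itself, so it's easy to try against values from other machines.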

Configuring my xterm

I need to tell xterm to turn on its UTF-8 features, and I need to tell it to use a font that can actually display the characters it tries to display.

I save my xterm configs in ~/.Xresources. As is common, I have a line in my ~/.xinitrc which runs xrdb -merge $HOME/.Xresources.

I think the trickiest bit here is picking a font that looks good and actually has all the characters I want. I chose Hack, which comes with a default Slackware install.

! Hack seems to be the best slackware font for unicode
XTerm.vt100.faceName: Hack
XTerm.vt100.faceSize: 12

! Ensure unicode
XTerm.vt100.locale: false
XTerm.vt100.utf8: true

I checked that I could display all the characters I wanted by running emacs -nw and hitting <F1> h (or M-x view-hello-file). This displays a handy file that contains a "hello" greeting in a bewildering array of languages. And hence, also, character-sets and alphabets.
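For a quicker check that doesn't need emacs, a sketch like this might help. utf8_sample is a made-up name, and the commands in the usage comments assume xrdb and fontconfig's fc-list are installed.

```shell
# utf8_sample is a hypothetical helper. It writes "déc" followed by a newline,
# spelling the "é" as its two UTF-8 bytes (octal 303 251, hex c3 a9) so that
# the script itself stays plain ASCII.
utf8_sample() {
    printf 'd\303\251c\n'
}

# Usage:
#   utf8_sample                  # in a working UTF-8 terminal this should read: déc
#   utf8_sample | od -An -tx1    # shows the raw bytes being sent
#   xrdb -query | grep -i utf8   # checks the resource was actually loaded
#   fc-list | grep -i hack       # checks fontconfig can see the font
```

If the first command shows two junk characters or a "?" where the "é" should be, the problem is somewhere between the shell and the font, rather than in the mail program.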

Incidentally…

While I'm talking about configuring xterm, I might as well also note these configs, which have nothing to do with unicode:

! Select with mouse to CLIPBOARD, not PRIMARY
XTerm.vt100.selectToClipboard: true

! Make the alt key work like I expect.
XTerm.vt100.metaSendsEscape: true

! tmux can copy to the clipboard if we allow this
XTerm*disallowedWindowOps: 20,21,SetXprop

Sending my mouse selection to CLIPBOARD rather than PRIMARY means I never have to worry about finding a "copy-to-clipboard" shortcut that doesn't clash with whatever program I'm running in the terminal. Whatever I highlight is copied to the proper clipboard. I can paste it in firefox with C-v, in emacs with C-y and in another xterm with S-<insert> just as you'd expect.

Making "meta send escape" means I can do things like navigate word-by-word with M-f and M-b and so on.

And the disallowedWindowOps line leaves clipboard-setting off xterm's list of forbidden operations, which lets tmux write to my clipboard using a terminal escape sequence (OSC 52, as far as I can tell).
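As I understand it, the escape sequence in question is the one usually called OSC 52. Here's a minimal sketch of what one looks like, in case you want to poke at it by hand. osc52_copy is a hypothetical helper, and it assumes a base64 utility is on the PATH.

```shell
# osc52_copy is a hypothetical helper. It wraps its first argument in an
# OSC 52 "set clipboard" escape sequence:
#   ESC ] 52 ; c ; <base64 of the text> BEL
# Assumes a base64 utility is available; tr strips any line wrapping it adds.
osc52_copy() {
    printf '\033]52;c;%s\007' "$(printf '%s' "$1" | base64 | tr -d '\n')"
}

# Usage (in an xterm that allows it, this should put "hello" on the clipboard):
#   osc52_copy hello
```

The text goes over the wire base64-encoded, so it can contain newlines and non-ASCII characters without confusing the terminal.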

Configuring my bsd locale

OpenBSD defaults to the C or POSIX locale (see the handbook). This is a perfectly reasonable default, but it's not what I want for the user I read my email as.

So now my ~/.profile contains:

LANG="en_GB.UTF-8"
export LANG

Now my locale looks like this:

$ locale
LANG=en_GB.UTF-8
LC_COLLATE="en_GB.UTF-8"
LC_CTYPE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_ALL=
$ 
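If the same ~/.profile gets shared between machines, it might be worth only setting LANG when the locale actually exists. A sketch, assuming locale -a prints one locale name per line; locale_listed is a name I've invented. Note that some systems spell the name differently in that list (glibc tends to print en_GB.utf8), so the exact string may need adjusting.

```shell
# locale_listed is a hypothetical helper. It reads a list of locale names on
# stdin (one per line, as printed by "locale -a") and succeeds if the name
# given as $1 appears as an exact, whole-line match.
locale_listed() {
    grep -Fqx -- "$1"
}

# Usage in ~/.profile:
#   if locale -a | locale_listed en_GB.UTF-8; then
#       LANG="en_GB.UTF-8"
#       export LANG
#   fi
```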

How it all fits together

So now (if I understand this right), when my email program wants to show me a French character like "é", I think something like this happens:

  • mutt checks one or more of my locale environment variables to see if it should even attempt this UTF-8 business. It sees en_GB.UTF-8 and decides to try to print the actual unicode character.
  • mutt knows it wants to send a UTF-8 character to the terminal it's running in – but x terminals have been around a lot longer than UTF-8 has, and they all handle it slightly differently. What's the right way to talk to this one?
  • mutt checks the $TERM environment variable and sees that it's running in something called eterm-color. It has no idea what this is, but it can look it up.
  • Yep, there's a terminfo entry in /usr/share/terminfo/e/eterm-color – that tells mutt how to tell the terminal how to handle non-ASCII characters like "é".
  • The terminal is emacs' ansi-term, and it receives the character correctly. Now it has to figure out how to pass it on.
  • Emacs checks the locale, and sees that we're doing the unicode thing.
  • Emacs checks $TERM and sees that it's running in an xterm. There's a terminfo entry in /usr/share/terminfo/x/xterm that tells emacs how to send me my "é".
  • ssh receives the "é" over the network from emacs. This is the first program in the stack that's running on my local laptop, so here everything could be different.
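  • ssh itself doesn't check my locale: as far as I can tell it just passes the bytes along untouched. The locale that matters at this end is the one my xterm was started with, which is en_US.UTF-8. Not exactly the same as the server, but close enough. At least they both agree on UTF-8.
  • ssh doesn't look anything up in terminfo either, but it did pass my local $TERM (xterm) to the server when I logged in, which is how emacs knew what it was talking to. Slackware also keeps its terminfo files in /usr/share, so the matching file on the laptop is still /usr/share/terminfo/x/xterm. Just on a different computer from last time.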
  • xterm receives our character correctly from ssh. xterm is configured with vt100.utf8 set to true, so it's definitely going to try to send me the actual character.
  • xterm tells X to display a 12 point character in the Hack font at code point U+00E9 ("é").
  • Since the Hack font actually knows how to draw an "e" with an accent above it, I finally get to see the character my friend (or their mailer) typed in their email \o/
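The steps above touch the same few settings at every layer, so one way to debug a chain like this is to print them at each hop and compare. A rough sketch follows. Both function names are made up, and it assumes terminfo entries live under /usr/share/terminfo, filed by first letter, as in the paths above.

```shell
# Both functions are hypothetical helpers for inspecting one layer of the stack.

# terminfo_path prints where a terminfo entry would live under
# /usr/share/terminfo, which files entries by the first letter of their name.
terminfo_path() {
    printf '/usr/share/terminfo/%s/%s\n' "$(printf '%s' "$1" | cut -c1)" "$1"
}

# layer_report prints the locale and terminal settings for the current shell.
layer_report() {
    printf 'LANG=%s\n' "${LANG:-unset}"
    printf 'charmap=%s\n' "$(locale charmap 2>/dev/null)"
    printf 'TERM=%s\n' "${TERM:-unset}"
    if [ -n "${TERM:-}" ] && [ -e "$(terminfo_path "$TERM")" ]; then
        printf 'terminfo=%s\n' "$(terminfo_path "$TERM")"
    else
        printf 'terminfo=not found under /usr/share/terminfo\n'
    fi
}
```

Running layer_report in the xterm on the laptop, again after ssh-ing to the server, and again inside ansi-term should show where (if anywhere) UTF-8 drops out of the picture.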

Phew.

Footnotes:

1

This is probably less silly than you think. When working remotely it's always a good idea to use something to persist your session in case your connection drops. I used to use screen for this, and tmux is very popular. But it turns out that running emacs in daemon mode and connecting with emacsclient does the job very well indeed.

Tags: sysadmin unix linux unicode emacs bsd
