UTF-8 codes in Terminal

Updated: Apr 19, 2009

Last week was fantastic for me; I hope you enjoyed yours too. Sometimes I realize that reading the man pages and the HTML pages in /usr/share/doc gives us information we never get from Google. Last Saturday I read the utf8, unicode and console_codes man pages once more to refresh my mind. I also came up with two scripts that quickly convert tagged Unicode code points (such as <U0B85>) into UTF-8 byte sequences and display the result in the terminal. These scripts only work if you have the /usr/share/i18n/charmaps/UTF-8.gz file. Here are the scripts.

unicode2utf8.bash

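#!/bin/bash

UTF8FILE="/usr/share/i18n/charmaps/UTF-8.gz"

# Read standard input, upper-case it (tags look like <U0B85>) and turn
# newlines, tabs and runs of spaces into single spaces.
BUFFER=$(tr '[:lower:]' '[:upper:]' | tr '\n\t' '  ' | tr -s ' ')
# Add a space after every '>' so "<U0B85><U0B86>" splits into two words.
BUFFER=$(echo "${BUFFER}" | sed -e 's/>/> /g')
UTF8BUFFER=""

for UNICODE in ${BUFFER}
do
     # Exact match on the first field; the second field must look like
     # /x.. so that lines from the charmap's WIDTH section are skipped.
     UTF8BUFFER="${UTF8BUFFER}$(gunzip -c "${UTF8FILE}" |
     awk -v u="${UNICODE}" '$1 == u && $2 ~ /^\/x/ {print $2}')"
done

# Turn /xe0/xae/x85 into \xe0\xae\x85 so that echo -e emits raw bytes.
# ESC % G selects UTF-8 on the Linux console; ESC % @ restores the default.
echo -e "\x1b%G${UTF8BUFFER//\//\\}\x1b%@"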

This script takes tagged Unicode code points (such as <U0B85>) on standard input and prints the resolved glyphs on standard output. Here is an example screenshot.

../../_images/unicode2utf8.png
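Both scripts rely on one small trick: the charmap stores each UTF-8 sequence as text like /xe0/xae/x85, and replacing every / with \ gives an escape string that bash's echo -e (or printf %b) turns into raw bytes. The sketch below isolates that step in a hypothetical function, charmap_to_bytes, so you can try it without the charmap file. It does the substitution with tr rather than the ${...//\//\\} expansion used in the scripts.

```shell
#!/bin/bash

# charmap_to_bytes is a hypothetical helper, not taken from the scripts.
# It converts charmap notation such as "/xe0/xae/x85" into the raw bytes.
charmap_to_bytes() {
     local escaped
     # "/xe0/xae/x85" -> "\xe0\xae\x85"
     escaped=$(printf '%s' "$1" | tr '/' '\\')
     # bash's printf %b expands the \xHH escapes into bytes.
     printf '%b' "${escaped}"
}

# Bytes for TAMIL LETTER A (U+0B85), then a newline.
charmap_to_bytes '/xe0/xae/x85'
echo
```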

unicodes.bash

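#!/bin/bash
UTF8FILE="/usr/share/i18n/charmaps/UTF-8.gz"

# Each argument is a pattern for a script name, e.g. "tam" for Tamil.
for LANGUAGE
do
     LANGUAGE=$(echo "${LANGUAGE}" | tr '[:lower:]' '[:upper:]')

     # Keep lines whose third field (the first word of the character
     # name, such as TAMIL) matches the pattern, then print each one as
     # tag, glyph and name.
     gunzip -c "${UTF8FILE}" |
     awk -v pat="${LANGUAGE}" '$3 ~ pat' |
     while read -r UNICODE UTF8CODE DESCRIPTION
     do
          echo -n -e "${UNICODE}\t"
          echo -n -e "\x1b%G${UTF8CODE//\//\\}\x1b%@"
          echo -e "\t${DESCRIPTION}"
     done
done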

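This second script may interest you even more. Give it a pattern matching your language's name, say tam for Tamil, and it will list every character of that language found in the charmap: its Unicode tag, the glyph and the character name. The pattern is an awk regular expression matched against the first word of each character's name. Take a look at the screenshot.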

http://mohan43u.files.wordpress.com/2009/04/unicodes.png
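The filter in unicodes.bash depends on the layout of each charmap line: a tag, a byte sequence, then the character name, so the third field is the first word of the name. The sketch below pulls that filter into a hypothetical list_script function that reads charmap-style lines on standard input, so you can try it on two hand-typed lines instead of the full UTF-8.gz file.

```shell
#!/bin/bash

# list_script is a hypothetical helper, not part of unicodes.bash itself.
# It reads charmap-style lines on stdin and keeps those whose third field
# (the first word of the character name) matches the regex given as $1.
list_script() {
     local pattern
     pattern=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
     awk -v pat="${pattern}" '$3 ~ pat'
}

# Two hand-typed lines in the charmap's layout: tag, bytes, name.
SAMPLE='<U0041>     /x41         LATIN CAPITAL LETTER A
<U0B85>     /xe0/xae/x85 TAMIL LETTER A'

printf '%s\n' "${SAMPLE}" | list_script tam
# prints: <U0B85>     /xe0/xae/x85 TAMIL LETTER A
```

Because the match is not anchored, a short pattern can catch more than one script; passing ^tamil$ (upper-cased to ^TAMIL$) is stricter.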

I actually intended to write my own algorithm to convert Unicode to UTF-8, but I have started learning an art called "don't reinvent the wheel", so I used that file for the conversion instead.
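If you are curious what that algorithm would look like, here is a minimal bash sketch of the standard UTF-8 encoding rules (RFC 3629): code points below 0x80 take one byte, below 0x800 two, below 0x10000 three, and up to 0x10FFFF four, with the leading byte marking the length and each continuation byte carrying 6 bits. The function name utf8_encode is hypothetical; it prints bytes in the charmap's /xNN notation so the output can be compared with the second field of UTF-8.gz.

```shell
#!/bin/bash

# utf8_encode is a hypothetical helper, not the author's code.
# It takes a code point in hex (e.g. 0B85) and prints its UTF-8 bytes in
# the charmap's notation (e.g. /xe0/xae/x85). Surrogates are not rejected.
utf8_encode() {
     local cp=$((16#$1))
     if (( cp < 0x80 )); then
          # 1 byte: 0xxxxxxx
          printf '/x%02x' "$cp"
     elif (( cp < 0x800 )); then
          # 2 bytes: 110xxxxx 10xxxxxx
          printf '/x%02x/x%02x' $(( 0xC0 | cp >> 6 )) $(( 0x80 | (cp & 0x3F) ))
     elif (( cp < 0x10000 )); then
          # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
          printf '/x%02x/x%02x/x%02x' $(( 0xE0 | cp >> 12 )) \
               $(( 0x80 | (cp >> 6 & 0x3F) )) $(( 0x80 | (cp & 0x3F) ))
     elif (( cp < 0x110000 )); then
          # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
          printf '/x%02x/x%02x/x%02x/x%02x' $(( 0xF0 | cp >> 18 )) \
               $(( 0x80 | (cp >> 12 & 0x3F) )) $(( 0x80 | (cp >> 6 & 0x3F) )) \
               $(( 0x80 | (cp & 0x3F) ))
     else
          echo "utf8_encode: code point out of range: $1" >&2
          return 1
     fi
}

utf8_encode 0B85; echo    # prints /xe0/xae/x85
```

Its output can go through the same / to \ substitution the scripts use, so in principle the glyphs could be produced without the charmap file at all.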

One more thing: there is a quick way to type your language's characters in a GTK-based terminal such as GNOME Terminal. Press CTRL+SHIFT+U, type the code point in hex, then press Space or Enter. For example,

CTRL+SHIFT+U0B85 followed by Space will display அ in the terminal.