UTF-8 codes in Terminal
Last week was fantastic for me; I hope you enjoyed yours too. Sometimes I realize that reading the man pages and the HTML pages in /usr/share/doc gives us more information than we ever get from Google. Last Saturday, I read the utf8, Unicode and console_codes man pages one more time to refresh my memory. I also came up with two scripts that do some quick work: they convert tagged Unicode code points to their UTF-8 byte sequences and display the result in the terminal. These scripts only work if you have the /usr/share/i18n/charmaps/UTF-8.gz file. Here are the scripts.
unicode2utf8.bash
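#!/bin/bash
# Read tagged Unicode code points (e.g. <U0B85>) from standard input and
# print the glyphs they stand for.
UTF8FILE="/usr/share/i18n/charmaps/UTF-8.gz"
# Upper-case the input, flatten newlines and tabs to spaces, and put a space
# after every '>' so that each tag becomes a separate word.
BUFFER=$(tr '[:lower:]' '[:upper:]' | tr '\n\t' '  ' | sed -e 's/>/> /g')
UTF8BUFFER=""
for UNICODE in ${BUFFER}
do
    # Column 1 of the charmap holds the tag, column 2 the bytes as /xNN/xNN...
    # Take only the first row whose tag column contains the requested tag.
    UTF8BUFFER="${UTF8BUFFER}$(gunzip -c "${UTF8FILE}" |
        awk -v tag="${UNICODE}" 'index($1, tag) {print $2; exit}')"
done
# ESC % G switches the console to UTF-8; ESC % @ switches back to the default.
# Replacing every '/' with '\' turns /xe0/xae/x85 into \xe0\xae\x85 for echo -e.
echo -e "\x1b%G${UTF8BUFFER//\//\\}\x1b%@"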
This script takes tagged Unicode code points on standard input and displays the resolved glyphs on standard output. Here is an example screenshot.
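To try the lookup step without the charmap file, here is a minimal sketch that runs the same idea against a three-line sample. The sample rows are hand-written, and the helper names lookup_bytes and tags_to_text are hypothetical; the sketch only assumes the tag, bytes, description column layout that the script relies on.

```shell
#!/bin/bash
# Hand-written sample in the assumed charmap layout: tag, bytes, description.
SAMPLE_CHARMAP='<U0B85>     /xe0/xae/x85         TAMIL LETTER A
<U0B86>     /xe0/xae/x86         TAMIL LETTER AA
<U0041>     /x41         LATIN CAPITAL LETTER A'

# lookup_bytes TAG: print the /xNN... byte field of one tag (hypothetical helper).
lookup_bytes() {
    printf '%s\n' "${SAMPLE_CHARMAP}" |
        awk -v tag="$1" '$1 == tag {print $2; exit}'
}

# tags_to_text TAG...: look up every tag and print the decoded bytes.
tags_to_text() {
    local bytes="" tag
    for tag in "$@"; do
        bytes="${bytes}$(lookup_bytes "${tag}")"
    done
    # /xe0/xae/x85 becomes \xe0\xae\x85, which printf %b decodes into raw bytes.
    printf '%b\n' "${bytes//\//\\}"
}

tags_to_text '<U0B85>' '<U0041>'
```

On a UTF-8 terminal the last line should show the Tamil letter followed by a Latin A. Unlike the script, the sketch leaves out the ESC % G and ESC % @ console switches, so it assumes the terminal is already in UTF-8 mode.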
unicodes.bash
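#!/bin/bash
# For every pattern given as an argument, list the charmap entries whose first
# description word matches it: code point, glyph, description.
UTF8FILE="/usr/share/i18n/charmaps/UTF-8.gz"
for LANGUAGE
do
    LANGUAGE=$(echo "${LANGUAGE}" | tr '[:lower:]' '[:upper:]')
    # Field 3 is the first word of the description, e.g. TAMIL. Passing the
    # pattern with -v keeps it out of the awk program text.
    gunzip -c "${UTF8FILE}" |
    awk -v pattern="${LANGUAGE}" '$3 ~ pattern' |
    while read -r UNICODE UTF8CODE DESCRIPTION
    do
        echo -n -e "${UNICODE}\t"
        # ESC % G switches the console to UTF-8; ESC % @ switches back.
        echo -n -e "\x1b%G${UTF8CODE//\//\\}\x1b%@"
        echo -e "\t${DESCRIPTION}"
    done
done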
You may find this script even more interesting. If you give it a regular expression matching the name of your language, say tam for Tamil, it prints the code point, the glyph and the description of every character whose description begins with a matching word. Take a look at the screenshot.
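The filtering step can be tried on its own with a small sample. This is only a sketch: the sample rows are hand-written in the assumed tag, bytes, description layout, and list_script is a hypothetical name, not part of the script.

```shell
#!/bin/bash
# Hand-written sample in the assumed charmap layout: tag, bytes, description.
SAMPLE_CHARMAP='<U0041>     /x41         LATIN CAPITAL LETTER A
<U0B85>     /xe0/xae/x85         TAMIL LETTER A
<U0B86>     /xe0/xae/x86         TAMIL LETTER AA'

# list_script PATTERN: print "tag<TAB>glyph<TAB>description" for matching rows.
list_script() {
    local pattern tag bytes description
    # Descriptions are upper case, so upper-case the pattern as well.
    pattern=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    printf '%s\n' "${SAMPLE_CHARMAP}" |
        awk -v pat="${pattern}" '$3 ~ pat' |
        while read -r tag bytes description; do
            # %b decodes \xNN escapes; the other two fields print unchanged.
            printf '%s\t%b\t%s\n' "${tag}" "${bytes//\//\\}" "${description}"
        done
}

list_script tam
```

Only the two Tamil rows come back, because the match is made against the first word of the description (field 3) and LATIN does not contain TAM.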
I actually intended to write my own algorithm to convert Unicode to UTF-8, but I have started learning an art called "don't reinvent the wheel", so I used that file for the conversion instead.
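For the curious, the conversion that the charmap file saves us from writing is short. The following is only a sketch of the standard UTF-8 bit layout, not what the scripts above do; codepoint_to_utf8 is a hypothetical helper, and it assumes a valid hexadecimal code point (it does not reject surrogates or values above 10FFFF).

```shell
#!/bin/bash
# codepoint_to_utf8 HEX: print the UTF-8 encoding of one code point as \xNN escapes.
codepoint_to_utf8() {
    local cp=$(( 16#$1 ))
    if (( cp < 0x80 )); then
        # 1 byte: 0xxxxxxx
        printf '\\x%02x' "${cp}"
    elif (( cp < 0x800 )); then
        # 2 bytes: 110xxxxx 10xxxxxx
        printf '\\x%02x\\x%02x' \
            $(( 0xC0 | (cp >> 6) )) $(( 0x80 | (cp & 0x3F) ))
    elif (( cp < 0x10000 )); then
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        printf '\\x%02x\\x%02x\\x%02x' \
            $(( 0xE0 | (cp >> 12) )) \
            $(( 0x80 | ((cp >> 6) & 0x3F) )) $(( 0x80 | (cp & 0x3F) ))
    else
        # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        printf '\\x%02x\\x%02x\\x%02x\\x%02x' \
            $(( 0xF0 | (cp >> 18) )) $(( 0x80 | ((cp >> 12) & 0x3F) )) \
            $(( 0x80 | ((cp >> 6) & 0x3F) )) $(( 0x80 | (cp & 0x3F) ))
    fi
}

# Decode the escapes into raw bytes for display.
printf '%b\n' "$(codepoint_to_utf8 0B85)"
```

The function prints escapes rather than raw bytes so that its output has the same shape as the byte column of the charmap once every / is replaced with \.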
One more thing: there is a quick way to type characters of your language in the console. Press CTRL+SHIFT+U and then type the character's Unicode code point in hexadecimal. For example, CTRL+SHIFT+U followed by 0B85 will display அ in the console.