Sorting Tamil Words: Issues and Solutions

International Journal of Computer Science and Engineering
© 2016 by SSRG - IJCSE Journal
Volume 3 Issue 2
Year of Publication : 2016
Authors : K.Ponmozhi

How to Cite?

K. Ponmozhi, "Sorting Tamil Words: Issues and Solutions," SSRG International Journal of Computer Science and Engineering, vol. 3, no. 2, pp. 1-6, 2016. Crossref, https://doi.org/10.14445/23488387/IJCSE-V3I2P101

Abstract:

There are approximately 65 million Tamils in India and 80 million worldwide, and Tamil has been recognized as a Classical Language. Collation is one of the most important features of a script: it determines the order in which a given culture indexes its characters. Unicode has become a world standard, and many computer applications provide Unicode support so that multilingual text can be handled. However, the Unicode standard proposed for Tamil does not take some important linguistic issues into consideration. It encodes glyphs that have no sound and are not characters in Tamil, and it does not treat uyirmei characters as separate characters, so a single uyirmei character may have to be formed by combining two or more Unicode characters. This leads to problems in Tamil language processing, especially in sorting, which is a basic operation for database applications and the like. Sorting in Tamil is therefore not character-by-character sorting as in English; this disparity has to be set right before any comparison can be made. In this paper we give a detailed picture of the encoding of Tamil characters and its problems, an algorithm for sorting based on the current Unicode encoding, and the recommendations of the Tamil Nadu government's TACE-16. As of now, sorting has to be done in two phases. The TACE-16 recommendations are shown to be the best for Tamil character encoding.
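The two-phase idea can be illustrated with a minimal Python sketch. It is not the algorithm from the paper, whose details are not given in this abstract. It assumes that phase one groups Unicode code points into Tamil letters and that phase two compares the letters by a traditional order (vowels, aytham, then for each consonant the pure consonant followed by its twelve uyirmei forms). The names `split_letters`, `letter_rank`, and `sort_key`, and the exact rank layout, are hypothetical and chosen only for illustration.

```python
import unicodedata

# Assumed traditional order, written as code points to avoid ambiguity.
# 12 vowels (uyir): a, aa, i, ii, u, uu, e, ee, ai, o, oo, au
VOWELS = "\u0B85\u0B86\u0B87\u0B88\u0B89\u0B8A\u0B8E\u0B8F\u0B90\u0B92\u0B93\u0B94"
# 11 dependent vowel signs, in the same order as VOWELS[1:]
SIGNS = "\u0BBE\u0BBF\u0BC0\u0BC1\u0BC2\u0BC6\u0BC7\u0BC8\u0BCA\u0BCB\u0BCC"
# 18 consonants (mei): ka, nga, ca, nya, tta, nna, ta, na, pa, ma,
# ya, ra, la, va, zha (llla), lla, rra, nnna
CONSONANTS = ("\u0B95\u0B99\u0B9A\u0B9E\u0B9F\u0BA3\u0BA4\u0BA8\u0BAA"
              "\u0BAE\u0BAF\u0BB0\u0BB2\u0BB5\u0BB4\u0BB3\u0BB1\u0BA9")
PULLI = "\u0BCD"   # marks a pure consonant
AYTHAM = "\u0B83"


def split_letters(word):
    """Phase 1: group code points into Tamil letters.

    A consonant followed by a vowel sign or pulli becomes one letter.
    """
    word = unicodedata.normalize("NFC", word)  # compose two-part vowel signs
    letters = []
    for ch in word:
        follows_bare_consonant = (
            letters and len(letters[-1]) == 1 and letters[-1] in CONSONANTS
        )
        if follows_bare_consonant and (ch == PULLI or ch in SIGNS):
            letters[-1] += ch
        else:
            letters.append(ch)
    return letters


def letter_rank(letter):
    """Phase 2 helper: map one letter to its rank in the assumed order."""
    if len(letter) == 1 and letter in VOWELS:
        return VOWELS.index(letter)              # ranks 0..11
    if letter == AYTHAM:
        return 12
    base = letter[0]
    if base in CONSONANTS:
        row = 13 + 13 * CONSONANTS.index(base)   # 13 slots per consonant
        if len(letter) == 1:
            return row + 1                       # consonant + inherent 'a'
        if letter[1] == PULLI:
            return row                           # pure consonant sorts first
        return row + 2 + SIGNS.index(letter[1])  # remaining uyirmei forms
    return 1000 + ord(base)  # fallback: anything else sorts after Tamil letters


def sort_key(word):
    """Sort key: the sequence of letter ranks."""
    return [letter_rank(letter) for letter in split_letters(word)]


words = ["மழை", "மலை", "மறை"]
print(len("கா"))                     # one uyirmei letter, two code points
print(sorted(words))                 # plain code point order
print(sorted(words, key=sort_key))   # assumed traditional order
```

With these three words, plain code point sorting places மறை first because ற (U+0BB1) precedes ல (U+0BB2) and ழ (U+0BB4) in the Unicode block, whereas the sketch orders them மலை, மழை, மறை. Grantha letters and other symbols fall through to the fallback rank here; a production system would more likely rely on a full collation implementation.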

Keywords:

Tamil Computing, sorting, collation.
