Sorting Tamil Words: Issues and Solutions

There are approximately 65 million Tamils in India and 80 million worldwide, and Tamil has been recognized as a Classical Language. Collation is one of the most important features of a script: it determines the order in which a given culture indexes its characters. Unicode has become a world standard, and many computer applications provide Unicode support so that multilingual text can be handled. However, Unicode encodes glyphs that have no sound and are not characters in Tamil, and the Unicode standard proposed for Tamil does not take some important linguistic issues into consideration. This leads to problems in Tamil language processing, especially in sorting, which is a basic operation for database applications and the like. Unicode does not treat uyirmei (consonant-vowel) characters as separate characters, so a single uyirmei character may have to be formed by combining two or more Unicode characters. Sorting in Tamil is therefore not simple character-by-character sorting as in English; this disparity has to be set right before comparisons can be made. In this paper we give a detailed picture of the encoding of Tamil characters and its problems, present an algorithm for sorting based on the current Unicode encoding, and discuss the Tamil Nadu government's TACE-16 recommendations. As of now, sorting has to be done in two phases. The TACE-16 recommendations are shown to be the best suited for Tamil character encoding.
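The two-phase idea can be illustrated with a minimal Python sketch. It is not the algorithm presented in this paper; it only shows the general shape of the problem under stated assumptions. Phase 1 groups Unicode code points into Tamil letters (a consonant plus an optional vowel sign or pulli), and phase 2 compares words letter by letter. The names `split_letters`, `letter_rank` and `sort_key` are hypothetical. The sketch assumes the traditional order (vowels, aytham, then for each of the 18 native consonants the pure consonant followed by its vowel combinations), ignores Grantha letters, and sorts any other symbol after the Tamil letters by code point.

```python
import unicodedata

# Independent vowels (uyir) in traditional order: a aa i ii u uu e ee ai o oo au
VOWELS = "\u0b85\u0b86\u0b87\u0b88\u0b89\u0b8a\u0b8e\u0b8f\u0b90\u0b92\u0b93\u0b94"
AYTHAM = "\u0b83"
# 18 native consonants (mei) in traditional order:
# ka nga ca nya tta nna ta na pa ma ya ra la va zha lla rra nnna
CONSONANTS = ("\u0b95\u0b99\u0b9a\u0b9e\u0b9f\u0ba3\u0ba4\u0ba8\u0baa"
              "\u0bae\u0baf\u0bb0\u0bb2\u0bb5\u0bb4\u0bb3\u0bb1\u0ba9")
# Dependent vowel signs, aligned with VOWELS[1:] (the inherent "a" has no sign)
VOWEL_SIGNS = "\u0bbe\u0bbf\u0bc0\u0bc1\u0bc2\u0bc6\u0bc7\u0bc8\u0bca\u0bcb\u0bcc"
PULLI = "\u0bcd"

VOWEL_RANK = {ch: i for i, ch in enumerate(VOWELS)}
CONSONANT_RANK = {ch: i for i, ch in enumerate(CONSONANTS)}
# Assumed order inside one consonant: pure consonant, inherent "a", then signs
SIGN_RANK = {PULLI: 0, "": 1}
SIGN_RANK.update({ch: i + 2 for i, ch in enumerate(VOWEL_SIGNS)})


def split_letters(word):
    """Phase 1: group code points into Tamil letters."""
    # NFC merges two-part vowel signs (e.g. U+0BC6 U+0BBE) into one code point
    text = unicodedata.normalize("NFC", word)
    letters = []
    for ch in text:
        is_sign = ch == PULLI or ch in VOWEL_SIGNS
        if is_sign and letters and letters[-1] in CONSONANT_RANK:
            letters[-1] += ch  # attach the sign to the bare consonant before it
        else:
            letters.append(ch)
    return letters


def letter_rank(letter):
    """Phase 2 helper: map one letter to a tuple that sorts in traditional order."""
    if letter in VOWEL_RANK:
        return (0, VOWEL_RANK[letter], 0)
    if letter == AYTHAM:
        return (1, 0, 0)
    base, sign = letter[0], letter[1:]
    if base in CONSONANT_RANK:
        return (2, CONSONANT_RANK[base], SIGN_RANK[sign])
    # Assumption: everything else sorts after Tamil letters, by code point
    return (3, ord(base), 0)


def sort_key(word):
    """Phase 2: compare words by their sequence of letter ranks."""
    return [letter_rank(letter) for letter in split_letters(word)]


if __name__ == "__main__":
    words = ["மனம்", "அகம்", "மரம்", "அக்கா"]
    print(sorted(words))                # raw code point order
    print(sorted(words, key=sort_key))  # order under the assumptions above
```

With this key, மரம் sorts before மனம் because ர precedes ன in the traditional order, whereas plain code point comparison puts மனம் first because ன (U+0BA9) is encoded before ர (U+0BB0).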


