
Research Article | Open Access
Volume 13 | Issue 5 | Year 2026 | Article Id. IJEEE-V13I5P114 | DOI : https://doi.org/10.14445/23488379/IJEEE-V13I5P114

Advanced Signal Processing and Deep Learning-Based Speech Emotion Recognition in the Bodo Language


Rupali Khaklary, Nabankur Pathak

Received: 22 Feb 2026 | Revised: 21 Mar 2026 | Accepted: 21 Apr 2026 | Published: 30 May 2026

Citation :

Rupali Khaklary, Nabankur Pathak, "Advanced Signal Processing and Deep Learning-Based Speech Emotion Recognition in the Bodo Language," International Journal of Electrical and Electronics Engineering, vol. 13, no. 5, pp. 172-182, 2026. Crossref, https://doi.org/10.14445/23488379/IJEEE-V13I5P114

Abstract

Accurate emotion recognition across linguistically diverse settings is essential in the current generation of communication systems. Speech emotion recognition (SER) has been studied extensively in various languages, but investigations involving the Bodo language remain scarce. This work presents a Bodo-specific SER framework that combines signal processing techniques with deep learning models. The study introduces an audio data collection method tailored to Bodo emotional speech, which has not been previously explored. To represent the acoustic characteristics of Bodo speech signals, MFCC, Mel-spectrogram, Chroma, Zero Crossing Rate, and Root Mean Square Energy features are extracted and organized as inputs for training the proposed model. Extraction is performed under two conditions, on the original data and on augmented samples, for comparative evaluation. The resulting feature sets are used to train the proposed Convolutional Neural Network (CNN) model, which is optimized through hyperparameter tuning, and performance is compared between the augmented and non-augmented datasets. The proposed CNN-based model, combined with augmented data, demonstrates higher accuracy (81.71%) and robustness in emotion recognition. This work also provides a novel analysis of the spectral and prosodic characteristics of Bodo speech, offering fresh insights into its acoustic properties, and contributes a basis for further research in this area.
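As a rough illustration of two of the features named above, the following minimal sketch computes frame-wise Zero Crossing Rate and Root Mean Square Energy using only the Python standard library. The frame length, hop length, sampling rate, and the synthetic test tone are assumptions made for this example; the abstract does not state the settings or tooling the authors used, and the helper names are hypothetical.

```python
import math


def frame_signal(signal, frame_length, hop_length):
    """Split a 1-D sequence into overlapping frames; a trailing partial frame is dropped."""
    last_start = len(signal) - frame_length
    return [signal[i:i + frame_length] for i in range(0, last_start + 1, hop_length)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def rms_energy(frame):
    """Root mean square amplitude of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


# Assumed settings: 16 kHz sampling, 25 ms frames (400 samples), 10 ms hop (160 samples).
sr = 16000
# Synthetic stand-in for a speech recording: one second of a 440 Hz tone at amplitude 0.5.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]

frames = frame_signal(tone, frame_length=400, hop_length=160)
features = [(zero_crossing_rate(f), rms_energy(f)) for f in frames]
```

Each entry of `features` is a `(zcr, rms)` pair for one frame. In a fuller pipeline these values would sit alongside the MFCC, Mel-spectrogram, and Chroma features before being passed to the classifier; those are omitted here because they need an FFT and a mel filterbank.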

Keywords

Bodo Language, Data Augmentation, Deep Learning, Signal Processing, Speech Emotion Recognition.
