Measuring the Accuracy of a Voiceprint Analysis System Designed by Applying the Euclidean Distance Function and Genetic Algorithm

International Journal of Electrical and Electronics Engineering
© 2024 by SSRG - IJEEE Journal
Volume 11 Issue 3
Year of Publication: 2024
Authors: Asmaa Barakat, Abu Naqra, Abdul Rahman Hussian
How to Cite?

Asmaa Barakat, Abu Naqra, Abdul Rahman Hussian, "Measuring the Accuracy of a Voiceprint Analysis System Designed by Applying the Euclidean Distance Function and Genetic Algorithm," SSRG International Journal of Electrical and Electronics Engineering, vol. 11, no. 3, pp. 220-230, 2024. Crossref, https://doi.org/10.14445/23488379/IJEEE-V11I3P118

Abstract:

This research measures the accuracy of a voiceprint analysis system. The system comprises three stages: (i) recording the voice, removing noise, and extracting the voiceprint; (ii) building the database; and (iii) comparing the data and making a decision. In the first stage, where noise removal is the biggest challenge, the voice is analyzed with the MFCC algorithm, and a statistical equation is then used to extract the voiceprint. The second stage builds a database in which the voiceprint samples are stored, and the third compares samples and makes a decision by applying the Euclidean distance function and the genetic algorithm, respectively. Test results showed speaker recognition rates for user groups of 10, 20, 30, and 40 of 93%, 89.5%, 82.83%, and 73.37%, respectively, when applying the Euclidean distance function alone. Adding the genetic algorithm to the Euclidean distance function improved recognition for the same numbers of users, yielding 94%, 90.75%, 83.83%, and 74.87%, respectively. The average times for voice analysis and voiceprint extraction were 3.183, 3.174, 3.171, and 3.169 sec; the average testing times were 0.00807, 0.00808, 0.0082, and 0.0258 sec with the Euclidean distance function, and 0.00615, 0.023711, 0.020747, and 0.022438 sec with the Euclidean distance function and the genetic algorithm, thus speeding up the testing and decision-making process.
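The comparison-and-decision stage described above (nearest-match search over stored voiceprints with the Euclidean distance function) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `euclidean_distance` and `identify_speaker`, the 4-dimensional vectors, and the enrolled-speaker names are all hypothetical stand-ins for real MFCC-derived voiceprints.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify_speaker(sample, database):
    """Return the enrolled speaker whose stored voiceprint lies
    nearest (smallest Euclidean distance) to the test sample."""
    return min(database, key=lambda name: euclidean_distance(sample, database[name]))

# Hypothetical 4-dimensional voiceprints; real MFCC-based
# voiceprints would carry many more coefficients.
db = {
    "speaker_A": [1.0, 0.2, -0.5, 0.7],
    "speaker_B": [0.1, 0.9, 0.4, -0.3],
}
print(identify_speaker([0.9, 0.3, -0.4, 0.6], db))  # nearest to speaker_A
```

In the paper's full system, the genetic algorithm is layered on top of this distance-based search to refine the decision; the sketch above covers only the plain Euclidean matching step.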

Keywords:

Voiceprint, MFCC algorithm, Euclidean distance, Genetic algorithm, CNN, ANN.

References:

[1] S.S. Wali, S.M. Hatture, and S. Nandya, “MFCC Based Text-Dependent Speaker Identification Using BPNN,” International Journal of Signal Processing Systems, vol. 3, no. 1, pp. 30-34, 2015.
[2] Mehmet Berkehan Akcay, and Kaya Oguz, “Speech Emotion Recognition: Emotional Models, Databases, Features, Preprocessing Methods, Supporting Modalities, and Classifiers,” Speech Communication, vol. 116, pp. 56-76, 2020.
[3] Qiang Zhu et al., “Whispered Speech Conversion Based on the Inversion of Mel Frequency Cepstral Coefficient Features,” Algorithms, vol. 15, no. 2, pp. 1-12, 2022.
[4] Rajeshwari G. Dandage, and P.R. Badadapure, “Infant’s Cry Detection Using Linear Frequency Cepstrum Coefficients,” International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering, vol. 6, no. 7, pp. 5377-5383, 2017.
[5] W.S. Mada Sanjaya, Dyah Anggraeni, and Ikhsan Purnama Santika, “Speech Recognition Using Linear Predictive Coding (LPC) and Adaptive Neuro-Fuzzy (ANFIS) to Control 5 DoF Arm Robot,” Journal of Physics: Conference Series, vol. 1090, pp. 1-10, 2018.
[6] Rashmi Kethireddy, Sudarsana Reddy Kadiri, and Suryakanth V. Gangashetty, “Exploration of Temporal Dynamics of Frequency Domain Linear Prediction Cepstral Coefficients for Dialect Classification,” Applied Acoustics, vol. 188, 2022.
[7] Kranthi Kumar Lella, and Alphonse Pja, “Automatic Diagnosis of COVID-19 Disease Using Deep Convolutional Neural Network with Multi-Feature Channel from Respiratory Voice Data: Cough, Voice, and Breath,” Alexandria Engineering Journal, vol. 61, no. 2, pp. 1319-1334, 2022.
[8] P.K. Nayana, Dominic Mathew, and Abraham Thomas, “Comparison of Text Independent Speaker Identification Systems Using GMM and i-Vector Methods,” Procedia Computer Science, vol. 115, pp. 47-54, 2017.
[9] Harisudha Kuresan, and Dhanalakshmi Samiappan, “Genetic Algorithm and Principal Components Analysis in Speech-Based Parkinson’s Early Diagnosis Studies,” International Journal of Nonlinear Analysis and Applications, vol. 13, no. 1, pp. 591-602, 2022.
[10] Ismail Shahin, Ali Bou Nassif, and Noor Hindawi, “Speaker Identification in Stressful Talking Environments Based on Convolutional Neural Network,” International Journal of Speech Technology, vol. 24, pp. 1055-1066, 2021.
[11] Qasim Sadiq Mahmood, and Yusra Faisal Al-Irahyim, “Text-Dependent Speaker Identification System Based on Deep Learning,” Journal of Education and Science, vol. 30, no. 4, pp. 141-160, 2021.
[12] M. Subba Rao, K. Umamaheswari, and P. Venkata Jagadeesh, “Support Vector Machine Based Automatic Speaker Recognition System,” The International Journal of Analytical and Experimental Modal Analysis, vol. 12, no. 3, pp. 1041-1049, 2020.
[13] Yinchun Chen, “A Hidden Markov Optimization Model for Processing and Recognition of English Speech Feature Signals,” Journal of Intelligent Systems, vol. 31, no. 1, pp. 716-725, 2022.
[14] Tsung-Han Tsai, Ping-Cheng Hao, and Chiao-Li Wang, “Self-Defined Text-Dependent Wake-Up-Words Speaker Recognition System,” IEEE Access, vol. 9, pp. 138668-138676, 2021.
[15] Rusydi Umar et al., “Identification of Speaker Recognition for Audio Forensic Using K-Nearest Neighbor,” International Journal of Scientific & Technology Research, vol. 8, no. 11, pp. 3846-3850, 2019.
[16] Anett Antony, and R. Gopikakumari, “Speaker Identification Based on Combination of MFCC and UMRT Based Features,” Procedia Computer Science, vol. 143, pp. 250-257, 2018.
[17] Soufiane Hourri, Nikola S. Nikolov, and Jamal Kharroubi, “A Deep Learning Approach to Integrate Convolutional Neural Networks in Speaker Recognition,” International Journal of Speech Technology, vol. 23, pp. 615-623, 2020.
[18] Feng Ye, and Jun Yang, “A Deep Neural Network Model for Speaker Identification,” Applied Sciences, vol. 11, no. 8, pp. 1-18, 2021.
[19] Samia Abd El-Moneim et al., “Text-Independent Speaker Recognition Using LSTM-RNN and Speech Enhancement,” Multimedia Tools and Applications, vol. 79, pp. 24013-24028, 2020.
[20] Zhanghao Wu et al., “Data Augmentation Using Variational Autoencoder for Embedding-Based Speaker Verification,” Proceedings of Interspeech 2019, pp. 1163-1167, 2019.
[21] Yuanjun Zhao, Roberto Togneri, and Victor Sreeram, “Multitask Learning-Based Spoofing-Robust Automatic Speaker Verification System,” Circuits, Systems, and Signal Processing, vol. 41, pp. 4068-4089, 2022.
[22] K. Sreenivasa Rao, and K.E. Manjunath, Speech Recognition Using Articulatory and Excitation Source Features, Springer Briefs in Speech Technology, pp. 85-92, 2017.
[23] A. Rada, O. Alhalabeia, and A. Mansor, “Voice Recognition Using Neural Networks,” Thesis, Faculty of Electrical and Electronic Engineering, Aleppo University, Syria, 1999.
[24] Md. Afzal Hossan, Sheeraz Memon, and Mark A. Gregory, “A Novel Approach for MFCC Feature Extraction,” 2010 4th International Conference on Signal Processing and Communication Systems, Australia, pp. 1-5, 2010.
[25] Jane J. Stephan, “Speaker Identification Using Evolutionary Algorithm,” Research Journal of Applied Sciences, Engineering and Technology, vol. 13, no. 9, pp. 717-721, 2016.
[26] Sofia Kanwal, and Sohail Asghar, “Speech Emotion Recognition Using Clustering Based GA-Optimized Set,” IEEE Access, vol. 9, pp. 125830-125842, 2021.