Recurrent Neural Network based Language Modeling for Punjabi ASR

International Journal of Computer Science and Engineering
© 2020 by SSRG - IJCSE Journal
Volume 7 Issue 9
Year of Publication : 2020
Authors : Vaibhav Kumar

How to Cite?

Vaibhav Kumar, "Recurrent Neural Network based Language Modeling for Punjabi ASR," SSRG International Journal of Computer Science and Engineering, vol. 7, no. 9, pp. 7-13, 2020. Crossref, https://doi.org/10.14445/23488387/IJCSE-V7I9P102

Abstract:

Deep learning approaches are widely reported to outperform statistical approaches. This is the first effort to investigate recurrent neural network (RNN) based language modeling for a Punjabi speech corpus. We propose a lattice-rescoring RNNLM approach using the Kaldi toolkit. Experiments on single sentences showed that the neural network-based approach performs better than n-gram based modeling approaches. An improvement of 7-9% in word error rate (WER) was observed on top of the state-of-the-art Punjabi speech recognition system.
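
For readers unfamiliar with the metric, the sketch below shows one common way to compute WER (word-level edit distance divided by the reference length) and the relative reduction between two systems. It is an illustration only, not the paper's or Kaldi's scoring code. The function names and the example numbers are hypothetical, and the abstract does not say whether the 7-9% figure is a relative or an absolute change.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer (assumes baseline_wer > 0)."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical transcripts, not taken from the paper's corpus.
    print(word_error_rate("a b c d", "a x c"))
    # Hypothetical WER values (in percent) for an n-gram baseline and a rescored system.
    print(relative_wer_reduction(20.0, 18.0))
```

Reporting both the absolute WER values and the relative reduction avoids ambiguity when comparing an n-gram baseline against a rescored system.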

Keywords:

automatic speech recognition, recurrent neural network language modeling, lattice rescoring, Punjabi ASR
