Uttered Kurdish digit recognition system
DOI: https://doi.org/10.26750/paper
Keywords: Speech recognition, MFCC, LPC, Formant frequencies, uttered digits, SVM.
Abstract
Speech recognition is a crucial subject in the area of human-computer interaction. It is the ability of a machine to recognize words and phrases in spoken language and convert them to a machine-readable format. Digit recognition is one part of a speech recognition system. In this paper, three spectral-based features, Mel Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding (LPC) and formant frequencies, are proposed to classify the ten Kurdish uttered digits (0-9). The features are extracted from the entire speech signal and fed to a pairwise SVM classifier. Experiments with each individual feature and with different forms of fusion are conducted and the results are reported. Fusing the features significantly improves the result, which shows that the different features carry complementary information. The proposed model is evaluated on a dataset collected in Kurdistan.
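As a rough illustration of the feature stage described above, the following sketch computes LPC coefficients over an entire signal and fuses feature vectors by concatenation. It is not the paper's implementation: the autocorrelation method with the Levinson-Durbin recursion, the LPC order, the function names `lpc` and `fuse`, and the placeholder MFCC values are all assumptions made for the example, and the pairwise SVM stage is left out.

```python
def autocorrelation(signal, max_lag):
    """Autocorrelation r[0..max_lag] of the whole signal (no framing)."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(signal, order):
    """Return the prediction-error filter [1, a1, ..., a_order].

    Uses the autocorrelation method solved with the Levinson-Durbin
    recursion (an assumed choice; the abstract does not name the solver).
    """
    r = autocorrelation(signal, order)
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:
            break  # silent or perfectly predictable signal: stop early
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error  # reflection coefficient for this order
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        error *= (1.0 - k * k)
    return a


def fuse(*feature_vectors):
    """Feature-level fusion as plain concatenation (one possible form)."""
    fused = []
    for vector in feature_vectors:
        fused.extend(vector)
    return fused


# Toy signal: a decaying exponential, which a first-order predictor fits.
signal = [0.9 ** n for n in range(200)]
lpc_features = lpc(signal, order=2)[1:]  # drop the leading 1

# Placeholder values standing in for MFCC features (hypothetical numbers).
mfcc_placeholder = [0.0, 0.0, 0.0]
fused = fuse(lpc_features, mfcc_placeholder)
```

In a full pipeline, the fused vector for each utterance would be passed to a classifier that trains one binary SVM per pair of digits and combines their votes.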