Genetic Algorithm for Combined Speaker and Speech Recognition using Deep Neural Networks

Szczegóły
Opis

Tytuł:: Genetic Algorithm for Combined Speaker and Speech Recognition using Deep Neural Networks
Autorzy:: Kaur, G.
Srivastava, M.
Kumar, A.
Powiązania:: https://bibliotekanauki.pl/articles/958089.pdf
Data publikacji:: 2018
Wydawca:: Instytut Łączności - Państwowy Instytut Badawczy
Tematy:: deep neural networks
genetic algorithm
LPCC
MFCC
PLP
RASTA-PLP
speaker recognition
speech recognition
Źródło:: Journal of Telecommunications and Information Technology; 2018, 2; 23-31
1509-4553
1899-8852
Język:: angielski
Prawa:: Wszystkie prawa zastrzeżone. Swoboda użytkownika ograniczona do ustawowego zakresu dozwolonego użytku
Dostawca treści:: Biblioteka Nauki
: Artykuł

Przejdź do źródła

Huge growth is observed in the speech and speaker recognition ﬁeld due to many artiﬁcial intelligence algorithms being applied. Speech is used to convey messages via the language being spoken, emotions, gender and speaker identity. Many real applications in healthcare are based upon speech and speaker recognition, e.g. a voice-controlled wheelchair helps control the chair. In this paper, we use a genetic algorithm (GA) for combined speaker and speech recognition, relying on optimized Mel Frequency Cepstral Coeﬃcient (MFCC) speech features, and classiﬁcation is performed using a Deep Neural Network (DNN). In the ﬁrst phase, feature extraction using MFCC is executed. Then, feature optimization is performed using GA. In the second phase training is conducted using DNN. Evaluation and validation of the proposed work model is done by setting a real environment, and eﬃciency is calculated on the basis of such parameters as accuracy, precision rate, recall rate, sensitivity, and speciﬁcity. Also, this paper presents an evaluation of such feature extraction methods as linear predictive coding coefficient (LPCC), perceptual linear prediction (PLP), mel frequency cepstral coefﬁcients (MFCC) and relative spectra ﬁltering (RASTA), with all of them used for combined speaker and speech recognition systems. A comparison of diﬀerent methods based on existing techniques for both clean and noisy environments is made as well.

Informacja

Powiązane pozycje