Subject: speech emotion recognition - OPAC collections catalog

Title:: Automatic speech based emotion recognition using paralinguistics features
Authors:: Hook, J.
Noroozi, F.
Toygar, O.
Anbarjafari, G.
Links:: https://bibliotekanauki.pl/articles/200261.pdf
Publication date:: 2019
Publisher:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:: random forests
speech emotion recognition
machine learning
support vector machines
lasy (forests)
rozpoznawanie emocji mowy (speech emotion recognition)
nauczanie maszynowe (machine learning)
Description:: Affective computing studies and develops systems capable of detecting human affect. The search for universal, well-performing features for speech-based emotion recognition is ongoing. In this paper, a small set of features with support vector machines as the classifier is evaluated on the Surrey Audio-Visual Expressed Emotion database, the Berlin Database of Emotional Speech, the Polish Emotional Speech database and the Serbian emotional speech database. It is shown that a set of 87 features can offer results on par with the state of the art, yielding 80.21, 88.6, 75.42 and 93.41% average emotion recognition rate, respectively. In addition, an experiment is conducted to explore the significance of gender in emotion recognition using random forests. Two models, trained on the first and second database, respectively, and four speakers were used to determine the effects. It is seen that the feature set used in this work performs well for both male and female speakers, yielding approximately 27% average emotion recognition in both models. In addition, the emotions for female speakers were recognized 18% of the time in the first model and 29% in the second. A similar effect is seen with male speakers: the first model yields a 36% average emotion recognition rate and the second 28%. This illustrates the relationship between the constitution of the training data and emotion recognition accuracy.
Source:: Bulletin of the Polish Academy of Sciences. Technical Sciences; 2019, 67, 3; 479-488
0239-7528
Appears in:: Bulletin of the Polish Academy of Sciences. Technical Sciences
Content provider:: Biblioteka Nauki

Article

Title:: Speech emotion recognition using wavelet packet reconstruction with attention-based deep recurrent neutral networks
Authors:: Meng, Hao
Yan, Tianhao
Wei, Hongwei
Ji, Xun
Links:: https://bibliotekanauki.pl/articles/2173587.pdf
Publication date:: 2021
Publisher:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:: speech emotion recognition
voice activity detection
wavelet packet reconstruction
feature extraction
LSTM networks
attention mechanism
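rozpoznawanie emocji mowy (speech emotion recognition)
wykrywanie aktywności głosowej (voice activity detection)
rekonstrukcja pakietu falkowego (wavelet packet reconstruction)
wyodrębnianie cech (feature extraction)
mechanizm uwagi (attention mechanism)
sieć LSTM (LSTM network)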
Description:: Speech emotion recognition (SER) is a complicated and challenging task in human-computer interaction because it is difficult to find a feature set that fully discriminates between emotional states. The FFT has typically been used to process the raw signal when extracting low-level descriptor features such as short-time energy, fundamental frequency, formants and mel-frequency cepstral coefficients (MFCC). However, these features are built in the frequency domain and ignore information from the temporal domain. In this paper, we propose a novel framework that combines a multi-layer wavelet sequence set obtained from wavelet packet reconstruction (WPR) with a conventional feature set to form a mixed feature set for emotion recognition with recurrent neural networks (RNN) based on the attention mechanism. In addition, silent frames have a detrimental effect on SER, so we adopt voice activity detection based on the autocorrelation function to eliminate emotionally irrelevant frames. We show that the proposed algorithm significantly outperforms the traditional feature set in predicting spontaneous emotional states on the IEMOCAP corpus and the EMODB database, respectively, and achieves better classification in both speaker-independent and speaker-dependent experiments. Notably, we obtain accuracies of 62.52% and 77.57% in the speaker-independent (SI) experiments and 66.90% and 82.26% in the speaker-dependent (SD) experiments.
Source:: Bulletin of the Polish Academy of Sciences. Technical Sciences; 2021, 69, 1; art. no. e136300
0239-7528
Appears in:: Bulletin of the Polish Academy of Sciences. Technical Sciences
Content provider:: Biblioteka Nauki

Article

Title:: Speech emotion recognition using wavelet packet reconstruction with attention-based deep recurrent neutral networks
Authors:: Meng, Hao
Yan, Tianhao
Wei, Hongwei
Ji, Xun
Links:: https://bibliotekanauki.pl/articles/2090711.pdf
Publication date:: 2021
Publisher:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:: speech emotion recognition
voice activity detection
wavelet packet reconstruction
feature extraction
LSTM networks
attention mechanism
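rozpoznawanie emocji mowy (speech emotion recognition)
wykrywanie aktywności głosowej (voice activity detection)
rekonstrukcja pakietu falkowego (wavelet packet reconstruction)
wyodrębnianie cech (feature extraction)
mechanizm uwagi (attention mechanism)
sieć LSTM (LSTM network)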
Description:: Speech emotion recognition (SER) is a complicated and challenging task in human-computer interaction because it is difficult to find a feature set that fully discriminates between emotional states. The FFT has typically been used to process the raw signal when extracting low-level descriptor features such as short-time energy, fundamental frequency, formants and mel-frequency cepstral coefficients (MFCC). However, these features are built in the frequency domain and ignore information from the temporal domain. In this paper, we propose a novel framework that combines a multi-layer wavelet sequence set obtained from wavelet packet reconstruction (WPR) with a conventional feature set to form a mixed feature set for emotion recognition with recurrent neural networks (RNN) based on the attention mechanism. In addition, silent frames have a detrimental effect on SER, so we adopt voice activity detection based on the autocorrelation function to eliminate emotionally irrelevant frames. We show that the proposed algorithm significantly outperforms the traditional feature set in predicting spontaneous emotional states on the IEMOCAP corpus and the EMODB database, respectively, and achieves better classification in both speaker-independent and speaker-dependent experiments. Notably, we obtain accuracies of 62.52% and 77.57% in the speaker-independent (SI) experiments and 66.90% and 82.26% in the speaker-dependent (SD) experiments.
Source:: Bulletin of the Polish Academy of Sciences. Technical Sciences; 2021, 69, 1; e136300, 1-12
0239-7528
Appears in:: Bulletin of the Polish Academy of Sciences. Technical Sciences
Content provider:: Biblioteka Nauki

Article
