
You are searching for the phrase "speech detection" by criterion: Subject


Showing 1-4 of 4
Title:
Laughter Classification Using Deep Rectifier Neural Networks with a Minimal Feature Subset
Authors:
Gosztolya, G.
Beke, A.
Neuberger, T.
Tóth, L.
Links:
https://bibliotekanauki.pl/articles/177910.pdf
Publication date:
2016
Publisher:
Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:
speech recognition
speech technology
computational paralinguistics
laughter detection
deep neural networks
Description:
Laughter is one of the most important paralinguistic events, and it plays specific roles in human conversation. The automatic detection of laughter occurrences in human speech can aid automatic speech recognition systems as well as paralinguistic tasks such as emotion detection. In this study we apply Deep Neural Networks (DNNs) to laughter detection, as this technology is nowadays considered state-of-the-art in similar tasks such as phoneme identification. We carry out our experiments on two corpora of spontaneous speech in two languages (Hungarian and English). Since it is reasonable to assume that not all frequency regions are required for efficient laughter detection, we also perform feature selection to find a sufficient feature subset.
Source:
Archives of Acoustics; 2016, 41, 4; 669-682
0137-5075
Appears in:
Archives of Acoustics
Content provider:
Biblioteka Nauki
Article
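The feature-selection step described in the abstract above (finding a small subset of features that is sufficient for detection) can be sketched as a greedy forward search. This is a minimal illustration on synthetic data, not the authors' actual pipeline; the nearest-centroid classifier and the toy frame features are assumptions made for brevity.

```python
import numpy as np

def accuracy(X, y, feats):
    """Nearest-centroid accuracy using only the selected feature columns."""
    Xs = X[:, feats]
    c0 = Xs[y == 0].mean(axis=0)
    c1 = Xs[y == 1].mean(axis=0)
    d0 = np.linalg.norm(Xs - c0, axis=1)
    d1 = np.linalg.norm(Xs - c1, axis=1)
    pred = (d1 < d0).astype(int)
    return (pred == y).mean()

def greedy_forward_selection(X, y, max_feats=3):
    """Add one feature at a time, keeping the one that helps accuracy most."""
    selected, remaining = [], list(range(X.shape[1]))
    best_acc = 0.0
    while remaining and len(selected) < max_feats:
        acc, f = max((accuracy(X, y, selected + [f]), f) for f in remaining)
        if acc <= best_acc:
            break  # no remaining feature improves the classifier
        selected.append(f)
        remaining.remove(f)
        best_acc = acc
    return selected, best_acc

# Synthetic "frames": only feature 0 separates the two classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (rng.random(200) < 0.5).astype(int)
X[y == 1, 0] += 4.0  # class 1 shifted along feature 0

feats, acc = greedy_forward_selection(X, y)
```

With this toy setup the search picks the informative feature first and stops once the noise features stop helping, which mirrors the idea of a "minimal feature subset".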
Title:
Recognition of Human Emotion from a Speech Signal Based on Plutchik's Model
Authors:
Kamińska, D.
Pelikant, A.
Links:
https://bibliotekanauki.pl/articles/227272.pdf
Publication date:
2012
Publisher:
Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:
emotion detection
Plutchik's wheel of emotion
speech signal
Description:
Machine recognition of human emotional states is an essential part of improving man-machine interaction. During expressive speech, the voice conveys a semantic message as well as information about the emotional state of the speaker. The pitch contour is one of the most significant properties of speech affected by the emotional state, so pitch features have commonly been used in systems for automatic emotion detection. In this work, different intensities of emotions and their influence on pitch features are studied; this understanding is important for developing such a system. Intensities of emotions are represented on Plutchik's cone-shaped 3D model. The k-Nearest Neighbor algorithm is used for classification, which is divided into two stages: first the primary emotion is detected, then its intensity is specified. The results show that the recognition accuracy of the system is over 50% for primary emotions and over 70% for their intensities.
Source:
International Journal of Electronics and Telecommunications; 2012, 58, 2; 165-170
2300-1933
Appears in:
International Journal of Electronics and Telecommunications
Content provider:
Biblioteka Nauki
Article
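The two-stage classification described in the abstract above (first the primary emotion, then its intensity) can be sketched with a plain k-Nearest-Neighbor vote. The pitch-feature vectors and their labels below are invented toy data; Plutchik's intensity triples (serenity-joy-ecstasy, pensiveness-sadness-grief) are from the model itself, but everything else is an assumption, not the paper's data or settings.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority vote among the k nearest training points (Euclidean)."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy pitch-feature vectors (hypothetical: [mean F0 in Hz, F0 range in Hz]),
# labelled first with a primary emotion, then with an intensity within it.
primary_train = [
    ((120.0, 20.0), "sadness"), ((125.0, 25.0), "sadness"), ((130.0, 22.0), "sadness"),
    ((220.0, 80.0), "joy"),     ((230.0, 90.0), "joy"),     ((210.0, 75.0), "joy"),
]
intensity_train = {
    "joy": [((210.0, 75.0), "serenity"), ((220.0, 80.0), "joy"),
            ((230.0, 90.0), "ecstasy")],
    "sadness": [((120.0, 20.0), "pensiveness"), ((125.0, 25.0), "sadness"),
                ((130.0, 22.0), "grief")],
}

query = (228.0, 88.0)
emotion = knn_predict(primary_train, query, k=3)               # stage 1: primary emotion
intensity = knn_predict(intensity_train[emotion], query, k=1)  # stage 2: its intensity
```

Splitting the problem this way keeps each kNN vote inside a small, well-separated label set, which is the design rationale the abstract describes.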
Title:
Speech emotion recognition using wavelet packet reconstruction with attention-based deep recurrent neural networks
Authors:
Meng, Hao
Yan, Tianhao
Wei, Hongwei
Ji, Xun
Links:
https://bibliotekanauki.pl/articles/2173587.pdf
Publication date:
2021
Publisher:
Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:
speech emotion recognition
voice activity detection
wavelet packet reconstruction
feature extraction
LSTM networks
attention mechanism
Description:
Speech emotion recognition (SER) is a complicated and challenging task in human-computer interaction because it is difficult to find a feature set that fully discriminates emotional states. The FFT is commonly used to process the raw signal when extracting low-level descriptor features such as short-time energy, fundamental frequency, formants, and MFCCs (mel-frequency cepstral coefficients). However, these features are built in the frequency domain and ignore information from the temporal domain. In this paper, we propose a novel framework that combines a multi-layer wavelet sequence set obtained from wavelet packet reconstruction (WPR) with a conventional feature set into a mixed feature set, and performs emotion recognition with recurrent neural networks (RNNs) based on the attention mechanism. In addition, since silent frames have a detrimental effect on SER, we adopt autocorrelation-based voice activity detection to eliminate emotionally irrelevant frames. We show that the proposed algorithm significantly outperforms traditional feature sets in predicting spontaneous emotional states on the IEMOCAP corpus and the EMODB database, and achieves better classification in both speaker-independent and speaker-dependent experiments. Notably, we obtain accuracies of 62.52% and 77.57% in the speaker-independent (SI) setting, and 66.90% and 82.26% in the speaker-dependent (SD) setting.
Source:
Bulletin of the Polish Academy of Sciences. Technical Sciences; 2021, 69, 1; art. no. e136300
0239-7528
Appears in:
Bulletin of the Polish Academy of Sciences. Technical Sciences
Content provider:
Biblioteka Nauki
Article
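The autocorrelation-based voice activity detection mentioned in the abstract above (dropping silent, emotionally irrelevant frames) can be sketched as follows. A strongly periodic frame has a high normalized autocorrelation peak inside the pitch-lag range, while a silent or noise frame does not. The frame length, hop, lag range, and threshold below are assumed values for illustration, not the paper's settings.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (no padding)."""
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def autocorr_peak(frame, lag_min=40, lag_max=200):
    """Normalized autocorrelation maximum within the pitch-lag range
    (40-200 samples = 80-400 Hz at 16 kHz): high for voiced frames,
    near zero for white noise or silence."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] <= 0:
        return 0.0  # all-zero frame
    return float((ac[lag_min:lag_max] / ac[0]).max())

def voiced_mask(x, frame_len=400, hop=200, threshold=0.5):
    """True for frames whose autocorrelation peak exceeds the threshold."""
    return np.array([autocorr_peak(f) > threshold
                     for f in frame_signal(x, frame_len, hop)])

# Synthetic check at 16 kHz: 0.5 s of a 200 Hz tone, then 0.5 s of weak noise.
sr = 16000
t = np.arange(sr // 2) / sr
tone = np.sin(2 * np.pi * 200.0 * t)
noise = 0.01 * np.random.default_rng(1).normal(size=sr // 2)
mask = voiced_mask(np.concatenate([tone, noise]))
```

Frames flagged False by the mask would simply be excluded before feature extraction, which is the role this step plays in the pipeline the abstract describes.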
