Author: Kucharski, Mateusz - OPAC collection catalog

Title:: Lossy coding impact on speech recognition with convolutional neural networks
Authors:: Kucharski, Mateusz
Links:: https://bibliotekanauki.pl/articles/24201985.pdf
Publication date:: 2022
Publisher:: Politechnika Poznańska. Instytut Mechaniki Stosowanej
Subjects:: lossy coding
convolutional neural networks
speech recognition
kodowanie stratne
konwolucyjne sieci neuronowe
rozpoznawanie mowy
Description:: This paper presents research on the impact of lossy coding on speech recognition with convolutional neural networks. For this purpose, the Google Speech Commands dataset, containing utterances of 30 words, was encoded using the four most common all-purpose codecs: MP3, AAC, WMA and OGG. A convolutional neural network was trained on part of the original files and later tested on the remaining files, as well as on their counterparts encoded with different codecs and bitrates. The same network model was also trained on the MP3-encoded data that showed the biggest loss in effectiveness for the previous network. Results show that lossy coding does have an effect on speech recognition, especially at low bitrates.
Source:: Vibrations in Physical Systems; 2022, 33, 3; art. no. 2022302
0860-6897
Appears in:: Vibrations in Physical Systems
Content provider:: Biblioteka Nauki

Article

Title:: Coding effects on changes in formant frequencies in Japanese speech signals
Authors:: Kucharski, Mateusz
Brachmański, Stefan
Links:: https://bibliotekanauki.pl/articles/128083.pdf
Publication date:: 2019
Publisher:: Politechnika Poznańska. Instytut Mechaniki Stosowanej
Subjects:: speech
speech coding
formants
mowa
kodowanie mowy
formanty
Description:: This paper presents the results of research on the effects of lossy coding on formant frequencies in Japanese speech signals. Additionally, changes in the pitch of the voice were inspected. For this research, the four most popular lossy coding standards were chosen (MP3, WMA, AAC and OGG) and compared with the original WAVE files. Audio files were created by the author based on the ITU-T P.501 recommendation at two sampling frequencies, 16 kHz and 48 kHz, and encoded with the chosen codecs. To extract the data from the audio files, the open-license software Praat was used. Because the original and encoded files were found to differ in duration, and these differences also varied between individual codecs, only the OGG and WMA standards were compared directly. Files encoded with the MP3 and AAC standards were divided into Japanese syllables, averaged, and then compared with WAVE files averaged in the same way. Results were additionally compared with the FLAC lossless codec.
Source:: Vibrations in Physical Systems; 2019, 30, 1; 1-8
0860-6897
Appears in:: Vibrations in Physical Systems
Content provider:: Biblioteka Nauki

Article