Subject: speech coding - OPAC catalog of holdings

Item 1.

Title:: Coding effects on changes in formant frequencies in Japanese speech signals
Authors:: Kucharski, Mateusz
Brachmański, Stefan
Links:: https://bibliotekanauki.pl/articles/128083.pdf
Publication date:: 2019
Publisher:: Politechnika Poznańska. Instytut Mechaniki Stosowanej
Subjects:: speech
speech coding
formants
Description:: This paper presents the results of research on the effects of lossy coding on formant frequencies in Japanese speech signals. Changes in voice pitch were also examined. The four most popular lossy coding standards (MP3, WMA, AAC and OGG) were chosen for this research and compared with the original WAVE files. The author created the audio files according to ITU-T Recommendation P.501 at two sampling frequencies, 16 kHz and 48 kHz, and converted them into the chosen codecs. The data were extracted from the audio files with the open-license software Praat. The encoded files differed in duration from the originals, and these differences also varied between codecs. As a result, only the OGG and WMA files were compared directly. The MP3 and AAC files were divided into Japanese syllables and averaged, then compared with the similarly averaged WAVE files. The results were additionally compared with the FLAC lossless codec.
Source:: Vibrations in Physical Systems; 2019, 30, 1; 1-8
ISSN:: 0860-6897
Appears in:: Vibrations in Physical Systems
Content provider:: Biblioteka Nauki

Article

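The formant frequencies in the paper above were measured with Praat. Praat's formant tracker (the `To Formant (burg)...` command) fits a linear-prediction (LPC) model using Burg's method. The Python sketch below gives a rough idea of what such a measurement does for a single frame. It swaps in the simpler autocorrelation LPC method instead of Burg's method. All of the following are illustrative assumptions, not the paper's settings:

- the function names;
- the LPC order and the 0.97 pre-emphasis coefficient;
- the 90 Hz / 400 Hz root-selection limits.

The helper `synthesize_ar` is a hypothetical test-signal generator that drives a known all-pole filter with white noise.

```python
import numpy as np


def lpc_autocorrelation(frame, order):
    """Predictor coefficients a_1..a_p such that x[n] ~ sum_k a_k * x[n-k]."""
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    # Yule-Walker normal equations R a = r[1:], where R is Toeplitz.
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    return np.linalg.solve(r[idx], r[1:])


def estimate_formants(signal, fs, order=10, pre_emphasis=0.97,
                      min_freq=90.0, max_bandwidth=400.0):
    """Rough formant estimates (Hz) for one frame, lowest first."""
    x = np.asarray(signal, dtype=float)
    if pre_emphasis:
        # First-order pre-emphasis boosts high frequencies before LPC.
        x = np.append(x[0], x[1:] - pre_emphasis * x[:-1])
    x = x * np.hamming(len(x))
    a = lpc_autocorrelation(x, order)
    # Poles are the roots of z^p - a_1 z^(p-1) - ... - a_p.
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[np.imag(roots) > 0]  # keep one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bandwidths = -fs / np.pi * np.log(np.abs(roots))
    keep = (freqs > min_freq) & (bandwidths < max_bandwidth)
    return np.sort(freqs[keep])


def synthesize_ar(formants, bandwidths, fs, n, seed=0):
    """Hypothetical test signal: white noise through a known all-pole filter."""
    poles = []
    for f, bw in zip(formants, bandwidths):
        radius = np.exp(-np.pi * bw / fs)
        angle = 2 * np.pi * f / fs
        poles += [radius * np.exp(1j * angle), radius * np.exp(-1j * angle)]
    a = np.poly(poles).real  # denominator coefficients [1, a_1, ..., a_p]
    e = np.random.default_rng(seed).standard_normal(n)
    y = np.zeros(n)
    for i in range(n):
        acc = e[i]
        for k in range(1, min(len(a) - 1, i) + 1):
            acc -= a[k] * y[i - k]
        y[i] = acc
    return y
```

For example, `estimate_formants(synthesize_ar([700, 1200], [80, 100], 16000, 16000), 16000, order=4, pre_emphasis=0.0)` should return two values close to 700 Hz and 1200 Hz. On real speech, a higher order (a common rule of thumb is about 2 + fs/1000) and a short frame over a steady vowel are more typical.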

Item 2.

Title:: Towards spike-based speech processing: a biologically plausible approach to simple acoustic classification
Authors:: Uysal, I.
Sathyendra, H.
Harris, J. G.
Links:: https://bibliotekanauki.pl/articles/907947.pdf
Publication date:: 2008
Publisher:: Uniwersytet Zielonogórski. Oficyna Wydawnicza
Subjects:: spike coding
synchrony coding
phase locking
speech perception
psychoacoustics
speech recognition
Description:: The shortcomings of automatic speech recognition (ASR) applications become more evident as these applications are more widely used in real life. Two factors make the ensuing analysis and recognition extremely difficult: the inherent non-stationarity of speech timing, and dynamic changes in the environment. Researchers often turn to biology for clues to better engineered systems, and ASR is no exception. Feature sets such as Mel-frequency cepstral coefficients, for example, employ filter banks that resemble cochlear filter banks in frequency distribution and bandwidth. In this paper, we delve deeper into the mechanics of the human auditory system to take this biological inspiration to the next level. The main goal of this research is to investigate the computational potential of the spike trains produced at the early stages of the auditory system for a simple acoustic classification task. First, we explore various spike coding schemes, from temporal to rate coding. We also explore spike-based encoders of varying simplicity, such as rank order coding and the liquid state machine. Based on these findings, we propose a biologically plausible system architecture for recognizing phonetically simple acoustic signals that uses spikes exclusively for computation. In performance tests on a noisy vowel data set, the proposed system outperforms a conventional ASR system.
Source:: International Journal of Applied Mathematics and Computer Science; 2008, 18, 2; 129-137
ISSN:: 1641-876X
ISSN:: 2083-8492
Appears in:: International Journal of Applied Mathematics and Computer Science
Content provider:: Biblioteka Nauki

Article

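Rank order coding is one of the spike-based encoders named in the abstract above. It keeps only the order in which the input channels fire their first spike: a stronger input fires earlier. A receiving neuron then weights each arriving spike by a factor that shrinks with the spike's rank. The Python sketch below follows that general scheme as it is commonly described. It is not the paper's architecture. The function names, the modulation factor of 0.5 and the template-matching classifier are illustrative assumptions.

```python
from typing import Dict, List, Sequence


def rank_order_encode(intensities: Sequence[float]) -> List[int]:
    """Channel indices in firing order: the strongest input spikes first.

    Ties keep their original channel order, because sorted() is stable.
    """
    return sorted(range(len(intensities)), key=lambda ch: -intensities[ch])


def template_weights(order: List[int], modulation: float = 0.5) -> List[float]:
    """Weights that make a neuron respond most strongly to one firing order."""
    weights = [0.0] * len(order)
    for rank, ch in enumerate(order):
        weights[ch] = modulation ** rank
    return weights


def response(order: List[int], weights: Sequence[float],
             modulation: float = 0.5) -> float:
    """Each incoming spike contributes weight * modulation**rank."""
    return sum(weights[ch] * modulation ** rank for rank, ch in enumerate(order))


def classify(intensities: Sequence[float],
             templates: Dict[str, List[float]],
             modulation: float = 0.5) -> str:
    """Pick the class whose template neuron gives the largest response."""
    order = rank_order_encode(intensities)
    return max(templates,
               key=lambda label: response(order, templates[label], modulation))
```

The intensities could be, for instance, per-channel energies from a cochlear-like filter bank. Because only the firing order is kept, scaling all inputs by the same factor leaves the code unchanged.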

Item 3.

Title:: Quality Evaluation of Speech Transmission via Two-way BPL-PLC Voice Communication System in an Underground Mine
Authors:: Falkowski-Gilski, Przemysław
Debita, Grzegorz
Links:: https://bibliotekanauki.pl/articles/31339787.pdf
Publication date:: 2023
Publisher:: Polska Akademia Nauk. Czasopisma i Monografie PAN
Subjects:: coding
communication applications
compression
signal processing
speech processing
quality of service
Description:: To design a stable and reliable voice communication system, it is essential to know how many resources are needed to convey content of adequate quality. These parameters may include objective quality of service (QoS) metrics, such as available bandwidth, bit error rate (BER), delay and latency. They may also include subjective quality of experience (QoE) related to user expectations. QoE is expressed as the clarity of speech and the ability to interpret voice commands, rated with adequate mean opinion score (MOS) grades. This paper describes a quality evaluation study of a two-way speech transmission system that uses broadband over power line – power line communication (BPL-PLC) technology in an operating underground mine. We investigate how different features of the available wired medium can affect end-user quality. The study covers:
- two types of coupling (capacitive and inductive);
- two transmission modes (mode 1 and mode 11);
- speech samples in four languages (American English, British English, German and Polish), encoded at three bit rates (8, 16 and 24 kbps).

Our findings can aid researchers working on low-bit-rate coding and compression, signal processing and speech perception, as well as professionals in the mining and oil industries.
Source:: Archives of Acoustics; 2023, 48, 4; 585-592
ISSN:: 0137-5075
Appears in:: Archives of Acoustics
Content provider:: Biblioteka Nauki

Article

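The abstract above expresses quality of experience through mean opinion score (MOS) grades. ITU-T Recommendation P.800 defines a five-point absolute category rating scale: 5 = excellent, 4 = good, 3 = fair, 2 = poor, 1 = bad. On this scale, a MOS is the mean of the listeners' grades and is usually reported with a confidence interval. The Python sketch below computes both. It is not taken from the paper, which may have used a different procedure. The 1.96 factor is an assumed normal approximation in place of a Student-t quantile, and it is reasonable only for larger listener panels.

```python
import math
from typing import Sequence, Tuple

# Absolute category rating (ACR) scale labels.
ACR_LABELS = {5: "excellent", 4: "good", 3: "fair", 2: "poor", 1: "bad"}


def mos_with_ci(grades: Sequence[int], z: float = 1.96) -> Tuple[float, float]:
    """Return (MOS, half-width of an approximate 95% confidence interval)."""
    if len(grades) < 2:
        raise ValueError("need at least two grades for a confidence interval")
    if any(g not in ACR_LABELS for g in grades):
        raise ValueError("grades must be whole numbers from 1 to 5")
    n = len(grades)
    mean = sum(grades) / n
    # Sample standard deviation (n - 1 in the denominator).
    std = math.sqrt(sum((g - mean) ** 2 for g in grades) / (n - 1))
    return mean, z * std / math.sqrt(n)
```

Comparing, say, the 8 kbps and 24 kbps conditions then means comparing their MOS values. Whether the two intervals overlap serves only as a rough first check on the difference.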

Item 4.

Title:: Incoherent Discriminative Dictionary Learning for Speech Enhancement
Authors:: Shaheen, D.
Dakkak, O. A.
Wainakh, M.
Links:: https://bibliotekanauki.pl/articles/308116.pdf
Publication date:: 2018
Publisher:: Instytut Łączności - Państwowy Instytut Badawczy
Subjects:: ADMM
l1 minimization algorithms
sparse coding
speech enhancement
supervised dictionary learning
Description:: Speech enhancement is one of the many challenging tasks in signal processing, especially in the presence of nonstationary, speech-like noise. This paper proposes a new incoherent discriminative dictionary learning algorithm that models both speech and noise. Its cost function accounts for both “source confusion” and “source distortion” errors. It also includes a regularization term that penalizes the coherence between the speech and noise sub-dictionaries. At the enhancement stage, sparse coding on the learned dictionary is used to estimate both the clean speech and the noise amplitude spectra. In the final phase, a Wiener filter refines the clean speech estimate. The experiments use the Noizeus dataset and two objective speech enhancement measures: frequency-weighted segmental SNR and Perceptual Evaluation of Speech Quality (PESQ). They demonstrate that the proposed algorithm outperforms the other speech enhancement methods tested.
Source:: Journal of Telecommunications and Information Technology; 2018, 3; 42-54
ISSN:: 1509-4553
ISSN:: 1899-8852
Appears in:: Journal of Telecommunications and Information Technology
Content provider:: Biblioteka Nauki

Article

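The enhancement stage described above has two steps. First, a noisy magnitude spectrum is sparse-coded on the concatenated speech and noise sub-dictionaries. Second, the result is refined with a Wiener filter. The Python sketch below illustrates both steps for a single frame and assumes the dictionaries have already been learned. The dictionary learning itself, with its coherence penalty, is not shown. The ℓ1-regularized least-squares (lasso) problem is solved with standard scaled-form ADMM, one of the subjects listed for this paper. The following are assumptions, not the authors' implementation:

- the exact cost function;
- the `lam` and `rho` values;
- the clipping of negative estimates;
- the function names.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise shrinkage: the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def admm_lasso(D, y, lam, rho=1.0, n_iter=300):
    """Approximately solve min_x 0.5*||D x - y||^2 + lam*||x||_1 (scaled ADMM)."""
    n = D.shape[1]
    z = np.zeros(n)
    u = np.zeros(n)
    # The x-update system matrix never changes, so invert it once.
    inv = np.linalg.inv(D.T @ D + rho * np.eye(n))
    Dty = D.T @ y
    for _ in range(n_iter):
        x = inv @ (Dty + rho * (z - u))         # quadratic (least-squares) step
        z = soft_threshold(x + u, lam / rho)     # sparsity step
        u = u + x - z                            # scaled dual update
    return z


def enhance_frame(noisy_mag, D_speech, D_noise, lam=0.1, eps=1e-12):
    """Sparse-code one magnitude frame, then apply a Wiener-style gain."""
    D = np.hstack([D_speech, D_noise])
    code = admm_lasso(D, noisy_mag, lam)
    k = D_speech.shape[1]
    # Split the code into speech and noise parts; clip negative magnitudes.
    speech_est = np.maximum(D_speech @ code[:k], 0.0)
    noise_est = np.maximum(D_noise @ code[k:], 0.0)
    gain = speech_est ** 2 / (speech_est ** 2 + noise_est ** 2 + eps)
    return gain * noisy_mag, gain
```

In practice, such a gain is typically applied per time-frequency bin to the complex noisy STFT, so that the noisy phase is reused when the enhanced signal is resynthesized.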

Item 5.

Title:: Lossy coding impact on speech recognition with convolutional neural networks
Authors:: Kucharski, Mateusz
Links:: https://bibliotekanauki.pl/articles/24201985.pdf
Publication date:: 2022
Publisher:: Politechnika Poznańska. Instytut Mechaniki Stosowanej
Subjects:: lossy coding
convolutional neural networks
speech recognition
Description:: This paper presents research on the impact of lossy coding on speech recognition with convolutional neural networks. For this purpose, the Google Speech Commands dataset, which contains utterances of 30 words, was encoded with the four most common general-purpose codecs: MP3, AAC, WMA and OGG. A convolutional neural network was trained on part of the original files. It was then tested on the remaining files and on their counterparts encoded with different codecs and bitrates. The same network model was also trained on MP3-encoded data, the format that caused the largest loss in the first network's effectiveness. The results show that lossy coding does affect speech recognition, especially at low bitrates.
Source:: Vibrations in Physical Systems; 2022, 33, 3; art. no. 2022302
ISSN:: 0860-6897
Appears in:: Vibrations in Physical Systems
Content provider:: Biblioteka Nauki

Article


Information

You are searching for the phrase "speech coding" by the criterion: Subject
