Search results for the phrase "MFCC" by criterion: Subject


Title:
A novel Parkinsons disease detection algorithm combined EMD, BFCC, and SVM classifier
Authors:
Boualoulou, Nouhaila
Mounia, Miyara
Nsiri, Benayad
Behoussine Drissi, Taoufiq
Links:
https://bibliotekanauki.pl/articles/27313826.pdf
Publication date:
2023
Publisher:
Polska Akademia Nauk. Polskie Towarzystwo Diagnostyki Technicznej PAN
Subjects:
EMD
BFCC
MFCC
SVM
Parkinson’s disease
artificial neural network
Description:
Identifying and assessing Parkinson's disease in its early stages is critical to effectively monitoring the disease's progression. Methodologies based on machine-learning-enhanced speech analysis are gaining popularity as the potential of this field is revealed. Acoustic features, in particular, are used in a variety of machine learning algorithms and could serve as indicators of the general health of subjects' voices. This research paper introduces a novel method for the automated detection of Parkinson's disease through speech signal analysis. A support vector machine (SVM) classifier and an artificial neural network (ANN) are used to evaluate and classify the data based on two acoustic features: Bark Frequency Cepstral Coefficients (BFCC) and Mel Frequency Cepstral Coefficients (MFCC). These features are extracted from the denoised signals using Empirical Mode Decomposition (EMD). For a dataset of 38 participants, the best results are obtained with the BFCC coefficients, with an accuracy of up to 92.10%. These results confirm that the EMD-BFCC-SVM method can contribute to the detection of Parkinson's disease.
Source:
Diagnostyka; 2023, 24, 4; art. no. 2023404
1641-6414
2449-5220
Appears in:
Diagnostyka
Content provider:
Biblioteka Nauki
Article
Title:
Automatic Genre Classification Using Fractional Fourier Transform Based Mel Frequency Cepstral Coefficient and Timbral Features
Authors:
Bhalke, D. G.
Rajesh, B.
Bormane, D. S.
Links:
https://bibliotekanauki.pl/articles/177599.pdf
Publication date:
2017
Publisher:
Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:
feature extraction
Timbral features
MFCC
Mel Frequency Cepstral Coefficient
FrFT
fractional Fourier transform
Fractional MFCC
Tamil Carnatic music
Description:
This paper presents the Automatic Genre Classification of Indian Tamil Music and Western Music using Timbral and Fractional Fourier Transform (FrFT) based Mel Frequency Cepstral Coefficient (MFCC) features. The classifier model for the proposed system has been built using K-NN (K-Nearest Neighbours) and Support Vector Machine (SVM). In this work, the performance of various features extracted from music excerpts has been analysed to identify the appropriate feature descriptors for the two major genres of Indian Tamil music, namely Classical music (Carnatic based devotional hymn compositions) and Folk music, and for the western genres of Rock and Classical music from the GTZAN dataset. The results for Tamil music show that the feature combination of Spectral Roll off, Spectral Flux, Spectral Skewness and Spectral Kurtosis, combined with Fractional MFCC features, outperforms all other feature combinations, yielding a classification accuracy of 96.05% compared with 84.21% for conventional MFCC. It has also been observed that the FrFT based MFCC efficiently classifies the two western genres of Rock and Classical music from the GTZAN dataset, with a classification accuracy of 96.25% compared with 80% for MFCC.
Source:
Archives of Acoustics; 2017, 42, 2; 213-222
0137-5075
Appears in:
Archives of Acoustics
Content provider:
Biblioteka Nauki
Article
Title:
Classification of Parkinson’s disease and other neurological disorders using voice features extraction and reduction techniques
Klasyfikacja choroby Parkinsona i innych zaburzeń neurologicznych z wykorzystaniem ekstrakcji cech głosowych i technik redukcji
Authors:
Majdoubi, Oumaima
Benba, Achraf
Hammouch, Ahmed
Links:
https://bibliotekanauki.pl/articles/27315435.pdf
Publication date:
2023
Publisher:
Politechnika Lubelska. Wydawnictwo Politechniki Lubelskiej
Subjects:
voice analysis
Parkinson’s disease
MFCC
PCA
naive Bayes kernel
machine learning
Description:
This study aimed to differentiate individuals with Parkinson's disease (PD) from those with other neurological disorders (ND) by analyzing voice samples, considering the association between voice disorders and PD. Voice samples were collected from 76 participants using different recording devices and conditions, with participants instructed to sustain the vowel /a/ comfortably. PRAAT software was employed to extract features including autocorrelation (AC), cross-correlation (CC), and Mel frequency cepstral coefficients (MFCC) from the voice samples. Principal component analysis (PCA) was used to reduce the dimensionality of the features. Classification Tree (CT), Logistic Regression, Naive Bayes (NB), Support Vector Machines (SVM), and Ensemble methods were employed as supervised machine learning techniques for classification. Each method has distinct strengths and characteristics, which allowed a comprehensive evaluation of their effectiveness in distinguishing PD patients from individuals with other neurological disorders. The Naive Bayes kernel, using seven PCA-derived components, achieved the highest accuracy rate of 86.84% among the tested classification methods. It is worth noting that classifier performance may vary with the dataset and the specific characteristics of the voice samples. In conclusion, this study demonstrated the potential of voice analysis as a diagnostic tool for distinguishing PD patients from individuals with other neurological disorders. By employing a variety of voice analysis techniques and different machine learning algorithms, including Classification Tree, Logistic Regression, Naive Bayes, Support Vector Machines, and Ensemble methods, a notable accuracy rate was attained. However, further research and validation using larger datasets are required to consolidate and generalize these findings for future clinical applications.
Source:
Informatyka, Automatyka, Pomiary w Gospodarce i Ochronie Środowiska; 2023, 13, 3; 16-22
2083-0157
2391-6761
Appears in:
Informatyka, Automatyka, Pomiary w Gospodarce i Ochronie Środowiska
Content provider:
Biblioteka Nauki
Article
Title:
CNN and LSTM for the classification of parkinsons disease based on the GTCC and MFCC
Authors:
Boualoulou, Nouhaila
Drissi, Taoufiq Belhoussine
Nsiri, Benayad
Links:
https://bibliotekanauki.pl/articles/30148250.pdf
Publication date:
2023
Publisher:
Polskie Towarzystwo Promocji Wiedzy
Subjects:
Parkinson's disease
voice signal
GTCC
MFCC
DWT
EMD
CNN and LSTM
Description:
Parkinson's disease is a recognizable clinical syndrome with a variety of causes and clinical presentations; it represents a rapidly growing neurodegenerative disorder. Since about 90 percent of Parkinson's disease sufferers have some form of early speech impairment, recent studies on telediagnosis of Parkinson's disease have focused on the recognition of voice impairments from vowel phonations or the subjects' discourse. This paper presents a new approach to Parkinson's disease detection from speech sounds that is based on CNN and LSTM and uses two categories of characteristics: Mel Frequency Cepstral Coefficients (MFCC) and Gammatone Cepstral Coefficients (GTCC), obtained from noise-removed speech signals with comparative EMD-DWT and DWT-EMD analysis. The proposed model is divided into three stages. In the first step, noise is removed from the signals using the EMD-DWT and DWT-EMD methods. In the second step, the GTCC and MFCC are extracted from the enhanced audio signals. In the third step, classification is carried out by feeding these features into the LSTM and CNN models, which are designed to define sequential information from the extracted features. The experiments are performed using the PC-GITA and Sakar datasets and the 10-fold cross-validation method. The highest classification accuracy for the Sakar dataset reached 100% for both EMD-DWT-GTCC-CNN and DWT-EMD-GTCC-CNN; for the PC-GITA dataset, the accuracy reached 100% for EMD-DWT-GTCC-CNN and 96.55% for DWT-EMD-GTCC-CNN. The results of this study indicate that the characteristics of GTCC are more appropriate and accurate for the assessment of PD than MFCC.
Source:
Applied Computer Science; 2023, 19, 2; 1-24
1895-3735
2353-6977
Appears in:
Applied Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
Comparison of the efficiency of time and frequency domain descriptors for the classification of selected wind instruments
Porównanie skuteczności deskryptorów w dziedzinie czasu i częstotliwości do klasyfikacji wybranych instrumentów dętych
Authors:
Tyburek, Krzysztof
Namli, Ömer Bora
Links:
https://bibliotekanauki.pl/articles/41205950.pdf
Publication date:
2022
Publisher:
Uniwersytet Kazimierza Wielkiego w Bydgoszczy
Subjects:
power spectrum
MFCC
timbre
Music Instrument Identification
MPEG-7
aerophones
Description:
By analyzing the physical features of an audio signal in the time domain and the frequency domain, it is possible to determine its source and to classify it automatically using appropriate algorithms. Sound indexing deals with the analysis of different classes and sources, including signals from musical instruments. By calculating descriptor values and classifying them, we obtain information about the type of instrument and its construction, most often the material from which it was made. The research showed that one composition of the feature vector is used to describe brass instruments and a different one to describe woodwind instruments. In this case, the key feature may be the higher harmonic components in the frequency domain. The experiments attempt to parameterize wind instruments (aerophones) in order to compare the classification effectiveness of time and spectral descriptors. Sounds from a tuba, a flute and a soprano saxophone were used in the research. The sample population for each instrument was 21.
Source:
Studia i Materiały Informatyki Stosowanej; 2022, 14, 3; 13-19
1689-6300
Appears in:
Studia i Materiały Informatyki Stosowanej
Content provider:
Biblioteka Nauki
Article
Title:
Diagnostyka silnika synchronicznego oparta na analizie sygnałów akustycznych z zastosowaniem MFCC i GSDM
Diagnostics of synchronous motor based on analysis of acoustic signals with application of MFCC and GSDM
Authors:
Głowacz, A.
Głowacz, W.
Głowacz, Z.
Links:
https://bibliotekanauki.pl/articles/1373298.pdf
Publication date:
2010
Publisher:
Sieć Badawcza Łukasiewicz - Instytut Napędów i Maszyn Elektrycznych Komel
Subjects:
electric machine
synchronous motor
diagnostics of electric motors
acoustic signal
GSDM
MFCC
Description:
The paper presents a method for diagnosing imminent failure conditions of a synchronous motor. The method is based on a study of acoustic signals generated by the synchronous motor. The sound recognition system is based on data processing algorithms such as MFCC and GSDM. Software to recognize the sounds of the synchronous motor was implemented. The studies were carried out for four imminent failure conditions of the synchronous motor. The results confirm that the system can be useful for detecting damage and protecting motors.
Source:
Maszyny Elektryczne: zeszyty problemowe; 2010, 87; 185-190
0239-3646
2084-5618
Appears in:
Maszyny Elektryczne: zeszyty problemowe
Content provider:
Biblioteka Nauki
Article
Title:
Discrimination between patients with CVDs and healthy people by voiceprint using the MFCC and pitch
Authors:
Bourouhou, Abdelhamid
Jilbab, Abdelilah
Cherti, Mohammed
Bourouhou, Zaineb
Nacir, Chafik
Links:
https://bibliotekanauki.pl/articles/2096170.pdf
Publication date:
2021
Publisher:
Polska Akademia Nauk. Polskie Towarzystwo Diagnostyki Technicznej PAN
Subjects:
cardiovascular diseases
speech analysis
voiceprint
MFCC
K-near-neighbor classifier
Description:
Heart diseases cause many deaths around the world every year, and this death rate makes them the leading killer among diseases. Early diagnosis, however, can help reduce these deaths and save lives. To ensure a good diagnosis, people must undergo a series of clinical examinations and analyses, which makes the diagnostic process expensive and not accessible to everyone. Speech analysis is a strong tool that can address this task and offer a new way to discriminate between healthy people and people with cardiovascular diseases. Our previous paper treated this task using a dysphonia measurement to differentiate between people with cardiovascular disease and healthy people, and reached 81.5% prediction accuracy. This time we change the method to increase the accuracy: the voiceprint is extracted using 13 Mel-Frequency Cepstral Coefficients and the pitch, computed from voices in a database of 75 subjects (35 with cardiovascular diseases, 40 healthy), with three recordings of sustained vowels (aaaaa…, ooooo… and iiiii…) collected from each subject. We used the k-nearest-neighbor classifier to train a model and to classify the test entities. We were able to outperform the previous results, reaching 95.55% prediction accuracy.
Source:
Diagnostyka; 2021, 22, 4; 9-16
1641-6414
2449-5220
Appears in:
Diagnostyka
Content provider:
Biblioteka Nauki
Article
Title:
Effect of Time-domain Windowing on Isolated Speech Recognition System Performance
Authors:
Ananthakrishna, Thalengala
Anitha, H.
Girisha, T.
Links:
https://bibliotekanauki.pl/articles/2055228.pdf
Publication date:
2022
Publisher:
Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:
hidden Markov model
HMM
isolated speech recognition system
ISR
Kannada language
mono-phone model
Mel frequency cepstral coefficients
MFCC
Description:
A speech recognition system extracts textual data from the speech signal. Research in the speech recognition domain is challenging because of the large variability of the speech signal. A variety of signal processing and machine learning techniques have been explored to achieve better recognition accuracy. Speech is highly non-stationary in nature, so analysis is carried out over a short time-domain window, or frame. In the speech recognition task, cepstral features (Mel frequency cepstral coefficients, MFCC) are commonly used and are extracted for each short time frame. The effectiveness of the features depends on the duration of the chosen time window. The present study investigates the optimal time-window duration for extracting cepstral features in the context of the speech recognition task. A speaker-independent speech recognition system for the Kannada language has been considered for the analysis. In the current work, speech utterances from a Kannada news corpus recorded from different speakers have been used to create the speech database. The hidden Markov tool kit (HTK) has been used to implement the speech recognition system. The MFCC, along with their first and second derivative coefficients, are used as feature vectors. The pronunciation dictionary required for the study has been built manually for a mono-phone system. Experiments have been carried out and results analyzed for different time-window lengths. An overlapping Hamming window has been used in this study. The best average word recognition accuracy of 61.58% has been obtained for a window length of 110 ms. This recognition accuracy is comparable with similar work found in the literature. The experiments have shown that the best word recognition performance can be achieved by tuning the window length to its optimum value.
Source:
International Journal of Electronics and Telecommunications; 2022, 68, 1; 161-166
2300-1933
Appears in:
International Journal of Electronics and Telecommunications
Content provider:
Biblioteka Nauki
Article
Title:
Ekstrakcja parametrów z próbek danych biometrycznych
Extraction of parameters from biometric data samples
Authors:
Danek, Paweł
Ćwirta, Krzysztof
Kopniak, Piotr
Links:
https://bibliotekanauki.pl/articles/98390.pdf
Publication date:
2019
Publisher:
Politechnika Lubelska. Instytut Informatyki
Subjects:
biometrics
fingerprint
voice
authorization
normalization
Gabor
descriptor
LPC
MFCC
Description:
This article describes possible ways to extract parameters from biometric data samples, such as a fingerprint or a voice recording. The influence of particular processing approaches on the effectiveness of algorithms for processing and comparing biometric samples was verified. An experiment was performed in which a large number of samples were processed with selected algorithms. For fingerprints, image normalization, Gabor filtering and a descriptor-based comparison method were used. For voice authorization, the LPC and MFCC algorithms were analyzed. For both kinds of authorization, a satisfactory accuracy of 60-80% was obtained.
Source:
Journal of Computer Sciences Institute; 2019, 13; 323-331
2544-0764
Appears in:
Journal of Computer Sciences Institute
Content provider:
Biblioteka Nauki
Article
Title:
Genetic Algorithm for Combined Speaker and Speech Recognition using Deep Neural Networks
Authors:
Kaur, G.
Srivastava, M.
Kumar, A.
Links:
https://bibliotekanauki.pl/articles/958089.pdf
Publication date:
2018
Publisher:
Instytut Łączności - Państwowy Instytut Badawczy
Subjects:
deep neural networks
genetic algorithm
LPCC
MFCC
PLP
RASTA-PLP
speaker recognition
speech recognition
Description:
Huge growth is observed in the speech and speaker recognition field due to many artificial intelligence algorithms being applied. Speech is used to convey messages via the language being spoken, emotions, gender and speaker identity. Many real applications in healthcare are based upon speech and speaker recognition, e.g. a voice-controlled wheelchair helps control the chair. In this paper, we use a genetic algorithm (GA) for combined speaker and speech recognition, relying on optimized Mel Frequency Cepstral Coefficient (MFCC) speech features, and classification is performed using a Deep Neural Network (DNN). In the first phase, feature extraction using MFCC is executed. Then, feature optimization is performed using GA. In the second phase training is conducted using DNN. Evaluation and validation of the proposed work model is done by setting a real environment, and efficiency is calculated on the basis of such parameters as accuracy, precision rate, recall rate, sensitivity, and specificity. Also, this paper presents an evaluation of such feature extraction methods as linear predictive coding coefficient (LPCC), perceptual linear prediction (PLP), mel frequency cepstral coefficients (MFCC) and relative spectra filtering (RASTA), with all of them used for combined speaker and speech recognition systems. A comparison of different methods based on existing techniques for both clean and noisy environments is made as well.
Source:
Journal of Telecommunications and Information Technology; 2018, 2; 23-31
1509-4553
1899-8852
Appears in:
Journal of Telecommunications and Information Technology
Content provider:
Biblioteka Nauki
Article
Title:
Heart Rate Detection and Classification from Speech Spectral Features Using Machine Learning
Authors:
Usman, Mohammed
Zubair, Mohammed
Ahmad, Zeeshan
Zaidi, Monji
Ijyas, Thafasal
Parayangat, Muneer
Wajid, Mohd
Shiblee, Mohammad
Ali, Syed Jaffar
Links:
https://bibliotekanauki.pl/articles/1953514.pdf
Publication date:
2021
Publisher:
Polska Akademia Nauk. Czasopisma i Monografie PAN
Subjects:
heart rate from speech
machine learning
MFCC
regression
classification
speech as a biomedical signal
Description:
Measurement of vital signs of the human body such as heart rate, blood pressure, body temperature and respiratory rate is an important part of diagnosing medical conditions and these are usually measured using medical equipment. In this paper, we propose to estimate an important vital sign – heart rate from speech signals using machine learning algorithms. Existing literature, observation and experience suggest the existence of a correlation between speech characteristics and physiological, psychological as well as emotional conditions. In this work, we estimate the heart rate of individuals by applying machine learning based regression algorithms to Mel frequency cepstrum coefficients, which represent speech features in the spectral domain as well as the temporal variation of spectral features. The estimated heart rate is compared with actual measurement made using a conventional medical device at the time of recording speech. We obtain estimation accuracy close to 94% between the estimated and actual measured heart rate values. Binary classification of heart rate as ‘normal’ or ‘abnormal’ is also achieved with 100% accuracy. A comparison of machine learning algorithms in terms of heart rate estimation and classification accuracy is also presented. Heart rate measurement using speech has applications in remote monitoring of patients, professional athletes and can facilitate telemedicine.
Source:
Archives of Acoustics; 2021, 46, 1; 41-53
0137-5075
Appears in:
Archives of Acoustics
Content provider:
Biblioteka Nauki
Article
Title:
Hierarchical Classification of Environmental Noise Sources Considering the Acoustic Signature of Vehicle Pass-Bys
Authors:
Valero, X.
Alias, F.
Links:
https://bibliotekanauki.pl/articles/176616.pdf
Publication date:
2012
Publisher:
Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:
acoustic signature
environmental noise monitoring
Gaussian mixture models
hierarchical classification
mel-frequency cepstral coefficients (MFCC)
sound classification
traffic noise
vehicle pass-by
Description:
This work is focused on the automatic recognition of environmental noise sources that affect humans’ health and quality of life, namely industrial, aircraft, railway and road traffic. However, the recognition of the latter, which has the largest influence on citizens’ daily lives, is still an open issue. Therefore, although considering all the aforementioned noise sources, this paper especially focuses on improving the recognition of road noise events by taking advantage of the perceived noise differences along the road vehicle pass-by (which may be divided into different phases: approaching, passing and receding). To that effect, a hierarchical classification scheme that considers these phases independently has been implemented. The proposed classification scheme yields an average classification accuracy of 92.5%, which is, in absolute terms, 3% higher than the baseline (a traditional flat classification scheme without hierarchical structure). In particular, it outperforms the baseline in the classification of light and heavy vehicles, yielding a classification accuracy 7% and 4% higher, respectively. Finally, listening tests are performed to compare the system performance with human recognition ability. The results reveal that, although an expert human listener can achieve higher recognition accuracy than the proposed system, the latter outperforms the non-trained listener by 10% on average.
Source:
Archives of Acoustics; 2012, 37, 4; 423-434
0137-5075
Appears in:
Archives of Acoustics
Content provider:
Biblioteka Nauki
Article
Title:
Hybrid of neural networks and hidden Markov models as a modern approach to speech recognition systems
Hybryda sieci neuronowych i ukrytych modeli Markowa jako nowoczesne podejście do rozpoznawania mowy
Authors:
Sokólski, P.
Rutkowski, T.
Links:
https://bibliotekanauki.pl/articles/276753.pdf
Publication date:
2013
Publisher:
Sieć Badawcza Łukasiewicz - Przemysłowy Instytut Automatyki i Pomiarów
Subjects:
artificial neural networks
hidden Markov models
MFCC
control
speech recognition
Description:
The aim of this paper is to present a hybrid algorithm that combines the advantages of artificial neural networks and hidden Markov models in speech recognition for control purposes. The scope of the paper includes a review of currently used solutions and a description and analysis of the implementation of selected artificial neural network (NN) structures and hidden Markov models (HMM). The main part of the paper describes the development and implementation of a hybrid speech recognition algorithm using NN and HMM and presents the results of verifying its correctness.
Source:
Pomiary Automatyka Robotyka; 2013, 17, 2; 449-455
1427-9126
Appears in:
Pomiary Automatyka Robotyka
Content provider:
Biblioteka Nauki
Article
Title:
Hybridisation of Mel Frequency Cepstral Coefficient and Higher Order Spectral Features for Musical Instruments Classification
Authors:
Bhalke, D. G.
Rama Rao, C. B.
Bormane, D.
Links:
https://bibliotekanauki.pl/articles/176497.pdf
Publication date:
2016
Publisher:
Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:
feature extraction
MFCC
HOS
bispectrum
bicoherence
non-linearity
non-Gaussianity
CPNN
zero crossing rate (ZCR)
Description:
This paper presents the classification of musical instruments using Mel Frequency Cepstral Coefficients (MFCC) and Higher Order Spectral features. MFCC, cepstral, temporal, spectral, and timbral features have been widely used in the task of musical instrument classification. As a music sound signal is generated by non-linear dynamics, the non-linearity and non-Gaussianity of musical instruments are important features that have not been considered in the past. In this paper, a hybridisation of MFCC and Higher Order Spectral (HOS) based features is used in the task of musical instrument classification. HOS-based features provide instrument-specific information such as the non-Gaussianity and non-linearity of the musical instruments. The extracted features are presented to a Counter Propagation Neural Network (CPNN) to identify the instruments and their family. For experimentation, isolated sounds of 19 musical instruments from the McGill University Master Sample (MUMS) sound database have been used. The proposed features show a significant improvement in the classification accuracy of the system.
Source:
Archives of Acoustics; 2016, 41, 3; 427-436
0137-5075
Appears in:
Archives of Acoustics
Content provider:
Biblioteka Nauki
Article
Title:
Navigation security module with real-time voice command recognition system
Authors:
Yagimli, M.
Kursat-Tezer, H.
Links:
https://bibliotekanauki.pl/articles/258920.pdf
Publication date:
2017
Publisher:
Politechnika Gdańska. Wydział Inżynierii Mechanicznej i Okrętownictwa
Subjects:
maritime navigation
LPC
MFCC
DTW
voice command recognition
Description:
The real-time voice command recognition system used in this study aims to increase situational awareness, and therefore the safety of navigation, especially in the close manoeuvres of warships and the courses of commercial vessels in narrow waters. With the developed system, the safety of navigation, which is especially important in precision manoeuvres, becomes controllable through voice command recognition-based software. The system was observed to work with 90.6% accuracy using Mel Frequency Cepstral Coefficients (MFCC) and Dynamic Time Warping (DTW) parameters and with 85.5% accuracy using Linear Predictive Coding (LPC) and DTW parameters.
Source:
Polish Maritime Research; 2017, 2; 17-26
1233-2585
Appears in:
Polish Maritime Research
Content provider:
Biblioteka Nauki
Article
