Subject: mfcc - OPAC collections catalogue

Title:: System rozpoznawania mowy z ograniczonym słownikiem
Speech recognition system with limited dictionary
Authors:: Grabowski, D.
Kwiatkowska, M.
Świerczewski, Ł.
Links:: https://bibliotekanauki.pl/articles/131953.pdf
Publication date:: 2014
Publisher:: Wrocławska Wyższa Szkoła Informatyki Stosowanej Horyzont
Subjects:: speech recognition
ASR
MFCC
Description:: The aim of this paper is to discuss and compare popular speech recognition algorithms on different systems. The collected information is presented in a relatively short form, without a detailed analysis of the mathematical proofs, which would in any case require reference to separate specialist sources. The paper discusses some problems associated with ASR (Automatic Speech Recognition) and the prospects for solving them. On the basis of available solutions, an application module was developed that compares collected recordings in terms of speech-signal similarity and presents the results in tabular form. For demonstration purposes, the resulting library was used in a complete application that executes commands based on words spoken into a microphone. The results are intended less as final conclusions about speech recognition than as pointers for further analysis and research. Despite advances in ASR research, there are still no algorithms with an effectiveness exceeding 95%. One motivation for further work is the social exclusion of people who cannot use communication that relies on sight.
Source:: Biuletyn Naukowy Wrocławskiej Wyższej Szkoły Informatyki Stosowanej. Informatyka; 2014, 4; 44-53
2082-9892
Appears in:: Biuletyn Naukowy Wrocławskiej Wyższej Szkoły Informatyki Stosowanej. Informatyka
Content provider:: Biblioteka Nauki

Title:: Automatic Genre Classification Using Fractional Fourier Transform Based Mel Frequency Cepstral Coefficient and Timbral Features
Authors:: Bhalke, D. G.
Rajesh, B.
Bormane, D. S.
Links:: https://bibliotekanauki.pl/articles/177599.pdf
Publication date:: 2017
Publisher:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:: feature extraction
Timbral features
MFCC
Mel Frequency Cepstral Coefficient
FrFT
fractional Fourier transform
Fractional MFCC
Tamil Carnatic music
Description:: This paper presents automatic genre classification of Indian Tamil music and Western music using timbral features and Mel Frequency Cepstral Coefficient (MFCC) features based on the Fractional Fourier Transform (FrFT). The classifier model for the proposed system was built using K-Nearest Neighbours (K-NN) and a Support Vector Machine (SVM). The performance of various features extracted from music excerpts was analysed to identify appropriate feature descriptors for the two major genres of Indian Tamil music, namely Classical music (Carnatic-based devotional hymn compositions) and Folk music, and for the Western genres of Rock and Classical music from the GTZAN dataset. For Tamil music, the combination of Spectral Roll-off, Spectral Flux, Spectral Skewness and Spectral Kurtosis with Fractional MFCC features outperforms all other feature combinations, yielding a classification accuracy of 96.05%, compared with 84.21% for conventional MFCC. The FrFT-based MFCC also classifies the two Western genres of Rock and Classical music from the GTZAN dataset efficiently, with a classification accuracy of 96.25%, compared with 80% for MFCC.
Source:: Archives of Acoustics; 2017, 42, 2; 213-222
0137-5075
Appears in:: Archives of Acoustics
Content provider:: Biblioteka Nauki

Title:: Sterowanie głosem urządzeniami mechatronicznymi koncepcja stanowiska dydaktycznego
The voice control of mechatronic devices the concept of didactic station
Authors:: Idziak, P.
Kmieć, A.
Links:: https://bibliotekanauki.pl/articles/377555.pdf
Publication date:: 2017
Publisher:: Politechnika Poznańska. Wydawnictwo Politechniki Poznańskiej
Subjects:: Raspberry Pi
MFCC algorithm
Jasper software
speech recognition
Description:: The article presents algorithms for converting the human voice into digital form and, on that basis, recognising spoken commands. It describes the MFCC algorithm and its application running on the Raspberry Pi platform. It reviews available open-source speech recognition programs that run in the Linux environment. It then presents a concept of a didactic station that carries out simple voice commands using the Jasper program, and considers its possible future use. The results of verification tests are also presented.
Source:: Poznan University of Technology Academic Journals. Electrical Engineering; 2017, 92; 375-386
1897-0737
Appears in:: Poznan University of Technology Academic Journals. Electrical Engineering
Content provider:: Biblioteka Nauki

Title:: Navigation security module with real-time voice command recognition system
Authors:: Yagimli, M.
Kursat-Tezer, H.
Links:: https://bibliotekanauki.pl/articles/258920.pdf
Publication date:: 2017
Publisher:: Politechnika Gdańska. Wydział Inżynierii Mechanicznej i Okrętownictwa
Subjects:: maritime navigation
LPC
MFCC
DTW
voice command recognition
Description:: The real-time voice command recognition system used in this study aims to increase situational awareness, and therefore the safety of navigation, especially during close manoeuvres of warships and on the courses of commercial vessels in narrow waters. With the developed system, the safety of navigation, which is especially important in precision manoeuvres, becomes controllable through voice command recognition-based software. The system was observed to work with 90.6% accuracy using Mel Frequency Cepstral Coefficients (MFCC) and Dynamic Time Warping (DTW) parameters, and with 85.5% accuracy using Linear Predictive Coding (LPC) and DTW parameters.
Source:: Polish Maritime Research; 2017, 2; 17-26
1233-2585
Appears in:: Polish Maritime Research
Content provider:: Biblioteka Nauki
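
The record above pairs cepstral features with Dynamic Time Warping (DTW). As a minimal sketch of the general DTW technique, and not the authors' implementation, the Python below aligns two sequences of feature vectors and picks the nearest command template. The names `dtw_distance` and `recognise`, the Euclidean local cost, and the toy one-dimensional templates are assumptions made purely for illustration.

```python
import math


def euclidean(u, v):
    """Local cost between two feature vectors (assumed: Euclidean distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def dtw_distance(a, b):
    """Accumulated cost of the best monotonic alignment between sequences a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = euclidean(a[i - 1], b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b frame repeated
                cost[i][j - 1],      # b advances, a frame repeated
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def recognise(query, templates):
    """Return the label of the stored template closest to the query under DTW."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))


if __name__ == "__main__":
    # Hypothetical one-dimensional "feature" sequences standing in for MFCC frames.
    templates = {
        "port": [[0.0], [1.0], [2.0]],
        "starboard": [[2.0], [1.0], [0.0]],
    }
    slow_port = [[0.0], [0.0], [1.0], [2.0]]  # same shape, spoken more slowly
    print(recognise(slow_port, templates))
```

Because DTW lets one frame match several frames of the other sequence, a slower utterance of the same command can still align with its template at low cost, which is why it is a common choice for small-vocabulary command recognition.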

Title:: Diagnostyka silnika synchronicznego oparta na analizie sygnałów akustycznych z zastosowaniem MFCC i GSDM
Diagnostics of synchronous motor based on analysis of acoustic signals with application of MFCC and GSDM
Authors:: Głowacz, A.
Głowacz, W.
Głowacz, Z.
Links:: https://bibliotekanauki.pl/articles/1373298.pdf
Publication date:: 2010
Publisher:: Sieć Badawcza Łukasiewicz - Instytut Napędów i Maszyn Elektrycznych Komel
Subjects:: electrical machine
synchronous motor
diagnostics of electric motors
acoustic signal
GSDM
MFCC
Description:: The paper presents a method for diagnosing imminent failure conditions of a synchronous motor. The method is based on a study of the acoustic signals generated by the motor. The sound recognition system is based on data processing algorithms such as MFCC and GSDM. Software to recognize the sounds of a synchronous motor was implemented. The studies were carried out for four imminent failure conditions of the synchronous motor. The results confirm that the system can be useful for detecting damage and protecting motors.
Source:: Maszyny Elektryczne: zeszyty problemowe; 2010, 87; 185-190
0239-3646
2084-5618
Appears in:: Maszyny Elektryczne: zeszyty problemowe
Content provider:: Biblioteka Nauki

Title:: CNN and LSTM for the classification of parkinsons disease based on the GTCC and MFCC
Authors:: Boualoulou, Nouhaila
Drissi, Taoufiq Belhoussine
Nsiri, Benayad
Links:: https://bibliotekanauki.pl/articles/30148250.pdf
Publication date:: 2023
Publisher:: Polskie Towarzystwo Promocji Wiedzy
Subjects:: Parkinson's disease
voice signal
GTCC
MFCC
DWT
EMD
CNN and LSTM
Description:: Parkinson's disease is a recognizable clinical syndrome with a variety of causes and clinical presentations; it represents a rapidly growing neurodegenerative disorder. Since about 90 percent of Parkinson's disease sufferers have some form of early speech impairment, recent studies on telediagnosis of Parkinson's disease have focused on the recognition of voice impairments from vowel phonations or the subjects' discourse. This paper presents a new approach to Parkinson's disease detection from speech sounds that is based on CNN and LSTM and uses two categories of features: Mel Frequency Cepstral Coefficients (MFCC) and Gammatone Cepstral Coefficients (GTCC), obtained from noise-removed speech signals with comparative EMD-DWT and DWT-EMD analysis. The proposed model is divided into three stages. In the first stage, noise is removed from the signals using the EMD-DWT and DWT-EMD methods. In the second stage, the GTCC and MFCC are extracted from the enhanced audio signals. In the third stage, classification is carried out by feeding these features into the LSTM and CNN models, which are designed to define sequential information from the extracted features. The experiments were performed on the PC-GITA and Sakar datasets using 10-fold cross-validation. The highest classification accuracy for the Sakar dataset reached 100% for both EMD-DWT-GTCC-CNN and DWT-EMD-GTCC-CNN; for the PC-GITA dataset, accuracy reached 100% for EMD-DWT-GTCC-CNN and 96.55% for DWT-EMD-GTCC-CNN. The results of this study indicate that GTCC features are more appropriate and accurate for the assessment of PD than MFCC.
Source:: Applied Computer Science; 2023, 19, 2; 1-24
1895-3735
2353-6977
Appears in:: Applied Computer Science
Content provider:: Biblioteka Nauki

Title:: Rozpoznawanie wieku i płci na podstawie analizy głosu
Age and gender recognition based on analysis of voice
Authors:: Gabryś, J.
Gil, G.
Kiszka, P.
Links:: https://bibliotekanauki.pl/articles/261820.pdf
Publication date:: 2015
Publisher:: Politechnika Wrocławska. Wydział Podstawowych Problemów Techniki. Katedra Inżynierii Biomedycznej
Subjects:: automatic speech recognition
age
gender
MFCC coefficients
classification of speaker
support vector machine (SVM)
Description:: Methods for automatic age and gender recognition make it possible to identify characteristics of a speaker solely from a recording of that person's speech. Beyond the verbal message, human speech carries information about the speaker: a recording makes it possible to extract characteristics such as gender, age, and emotions. The paper presents an overview of methods for recognising the age and gender of people from their speech. A combination of MFCC (Mel-frequency Cepstral Coefficients) parameter extraction, voice pitch (f0), and the SVM (Support Vector Machines) algorithm for classifying voice samples was implemented and tested. Tests of the implemented solution show that the method is effective in the majority of test cases.
Source:: Acta Bio-Optica et Informatica Medica. Inżynieria Biomedyczna; 2015, 21, 3; 165-169
1234-5563
Appears in:: Acta Bio-Optica et Informatica Medica. Inżynieria Biomedyczna
Content provider:: Biblioteka Nauki

Title:: A novel Parkinsons disease detection algorithm combined EMD, BFCC, and SVM classifier
Authors:: Boualoulou, Nouhaila
Mounia, Miyara
Nsiri, Benayad
Behoussine Drissi, Taoufiq
Links:: https://bibliotekanauki.pl/articles/27313826.pdf
Publication date:: 2023
Publisher:: Polska Akademia Nauk. Polskie Towarzystwo Diagnostyki Technicznej PAN
Subjects:: EMD
BFCC
MFCC
SVM
Parkinson's disease
artificial neural network
Description:: Identifying and assessing Parkinson's disease in its early stages is critical to effectively monitoring the disease's progression. Methodologies based on machine-learning-enhanced speech analysis are gaining popularity as the potential of this field is revealed. Acoustic features, in particular, are used in a variety of machine learning algorithms and could serve as indicators of the general health of subjects' voices. This research paper introduces a novel method for the automated detection of Parkinson's disease through speech signal analysis. A Support Vector Machine (SVM) classifier and an Artificial Neural Network (ANN) are used to evaluate and classify the data based on two acoustic features: Bark Frequency Cepstral Coefficients (BFCC) and Mel Frequency Cepstral Coefficients (MFCC). These features are extracted from the denoised signals using Empirical Mode Decomposition (EMD). The most relevant results, obtained on a dataset of 38 participants, come from the BFCC coefficients, with an accuracy of up to 92.10%. These results confirm that the EMD-BFCC-SVM method can contribute to the detection of Parkinson's disease.
Source:: Diagnostyka; 2023, 24, 4; art. no. 2023404
1641-6414
2449-5220
Appears in:: Diagnostyka
Content provider:: Biblioteka Nauki

Title:: Genetic Algorithm for Combined Speaker and Speech Recognition using Deep Neural Networks
Authors:: Kaur, G.
Srivastava, M.
Kumar, A.
Links:: https://bibliotekanauki.pl/articles/958089.pdf
Publication date:: 2018
Publisher:: Instytut Łączności - Państwowy Instytut Badawczy
Subjects:: deep neural networks
genetic algorithm
LPCC
MFCC
PLP
RASTA-PLP
speaker recognition
speech recognition
Description:: Huge growth is observed in the speech and speaker recognition field due to many artificial intelligence algorithms being applied. Speech is used to convey messages via the language being spoken, emotions, gender and speaker identity. Many real applications in healthcare are based upon speech and speaker recognition, e.g. a voice-controlled wheelchair. In this paper, we use a genetic algorithm (GA) for combined speaker and speech recognition, relying on optimized Mel Frequency Cepstral Coefficient (MFCC) speech features, and classification is performed using a Deep Neural Network (DNN). In the first phase, feature extraction using MFCC is executed and feature optimization is then performed using GA. In the second phase, training is conducted using DNN. Evaluation and validation of the proposed model is done in a real environment, and efficiency is calculated on the basis of such parameters as accuracy, precision rate, recall rate, sensitivity, and specificity. This paper also presents an evaluation of such feature extraction methods as linear predictive coding coefficient (LPCC), perceptual linear prediction (PLP), mel frequency cepstral coefficients (MFCC) and relative spectra filtering (RASTA), all of them used for combined speaker and speech recognition systems. A comparison of different methods based on existing techniques for both clean and noisy environments is made as well.
Source:: Journal of Telecommunications and Information Technology; 2018, 2; 23-31
1509-4553
1899-8852
Appears in:: Journal of Telecommunications and Information Technology
Content provider:: Biblioteka Nauki

Title:: Visualization of stages of determining cepstral factors in speech recognition systems
Authors:: Proksa, R.
Links:: https://bibliotekanauki.pl/articles/333103.pdf
Publication date:: 2009
Publisher:: Uniwersytet Śląski. Wydział Informatyki i Nauki o Materiałach. Instytut Informatyki. Zakład Systemów Komputerowych
Subjects:: speech recognition
LPCC
MFCC
isolated word
speech signals
cepstral coefficients
Description:: The article presents two methods of determining cepstral parameters that are commonly applied in digital signal processing, in particular in speech recognition systems. The solutions presented are part of a project aimed at developing applications that allow the Windows operating system to be controlled by voice using MSAA (Microsoft Active Accessibility). The analysed voice signal is presented visually at each of the crucial stages of computing the cepstral coefficients.
Source:: Journal of Medical Informatics & Technologies; 2009, 13; 121-128
1642-6037
Appears in:: Journal of Medical Informatics & Technologies
Content provider:: Biblioteka Nauki
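
The record above refers to the stages of computing cepstral coefficients. The Python below is a rough sketch of the commonly taught MFCC stages for a single frame (Hamming window, power spectrum, triangular mel filterbank, logarithm, DCT-II). It is not taken from the article: the function names, the 2595/700 mel formula variant, the filter count, and the coefficient count are all assumptions chosen for illustration, and a naive DFT is used to stay within the standard library.

```python
import cmath
import math


def hz_to_mel(f):
    """One widely used mel-scale formula (an assumed variant)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Stages 1-2: Hamming window, then a naive DFT power spectrum (bins 0..N/2)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Stage 3: triangular filters with centres equally spaced on the mel scale."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [low + i * (high - low) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):    # rising slope
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):   # falling slope
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank


def dct2(values, n_coeffs):
    """Stage 5: unnormalised DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(n_coeffs)]


def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=13):
    """Run all stages on one frame and return n_coeffs cepstral coefficients."""
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    energies = [sum(w * p for w, p in zip(row, spectrum)) for row in bank]
    log_energies = [math.log(e + 1e-10) for e in energies]  # stage 4; offset avoids log(0)
    return dct2(log_energies, n_coeffs)


if __name__ == "__main__":
    # Hypothetical input: one 256-sample frame of a 440 Hz tone sampled at 8 kHz.
    frame = [math.sin(2.0 * math.pi * 440.0 * t / 8000.0) for t in range(256)]
    print(mfcc_frame(frame, 8000))
```

Each intermediate list (`spectrum`, `energies`, `log_energies`) corresponds to one stage that could be plotted separately; in practice an FFT routine would replace the naive DFT loop.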
