Search results for the phrase "MFCC" by criterion: Subject


Title:
System rozpoznawania mowy z ograniczonym słownikiem
Speech recognition system with limited dictionary
Authors:
Grabowski, D.
Kwiatkowska, M.
Świerczewski, Ł.
Links:
https://bibliotekanauki.pl/articles/131953.pdf
Publication date:
2014
Publisher:
Wrocławska Wyższa Szkoła Informatyki Stosowanej Horyzont
Subjects:
ASR
MFCC
speech recognition
Description:
The motivation for this paper is to discuss and compare popular speech recognition algorithms on different systems. The collected information is presented in a relatively short form, without an in-depth analysis of the mathematical proofs, whose presentation would in any case require reference to separate specialist sources. Certain problems associated with ASR (Automatic Speech Recognition) and the prospects for solving them are discussed. On the basis of available solutions, an application module was developed that allows collected recordings to be compared in terms of speech signal similarity and presents the results in tabular form. For demonstration purposes, the resulting library was used in a complete application that executes commands based on words spoken into a microphone. The results are intended less as final conclusions about speech recognition than as pointers for further analysis and research. Despite advances in ASR research, there are still no algorithms with an effectiveness exceeding 95%. One motivation for further work may be, for example, the social exclusion of people who cannot use communication that relies on sight.
Source:
Biuletyn Naukowy Wrocławskiej Wyższej Szkoły Informatyki Stosowanej. Informatyka; 2014, 4; 44-53
2082-9892
Appears in:
Biuletyn Naukowy Wrocławskiej Wyższej Szkoły Informatyki Stosowanej. Informatyka
Content provider:
Biblioteka Nauki
Article
Title:
Automatic Genre Classification Using Fractional Fourier Transform Based Mel Frequency Cepstral Coefficient and Timbral Features
Authors:
Bhalke, D. G.
Rajesh, B.
Bormane, D. S.
Links:
https://bibliotekanauki.pl/articles/177599.pdf
Publication date:
2017
Publisher:
Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:
feature extraction
Timbral features
MFCC
Mel Frequency Cepstral Coefficient
FrFT
fractional Fourier transform
Fractional MFCC
Tamil Carnatic music
Description:
This paper presents the Automatic Genre Classification of Indian Tamil Music and Western Music using Timbral and Fractional Fourier Transform (FrFT) based Mel Frequency Cepstral Coefficient (MFCC) features. The classifier model for the proposed system has been built using K-NN (K-Nearest Neighbours) and Support Vector Machine (SVM). In this work, the performance of various features extracted from music excerpts has been analysed to identify the appropriate feature descriptors for the two major genres of Indian Tamil music, namely Classical music (Carnatic based devotional hymn compositions) and Folk music, and for the western genres of Rock and Classical music from the GTZAN dataset. The results for Tamil music have shown that the feature combination of Spectral Roll off, Spectral Flux, Spectral Skewness and Spectral Kurtosis, combined with Fractional MFCC features, outperforms all other feature combinations, yielding a higher classification accuracy of 96.05%, as compared to the accuracy of 84.21% with conventional MFCC. It has also been observed that the FrFT based MFCC efficiently classifies the two western genres of Rock and Classical music from the GTZAN dataset, with a higher classification accuracy of 96.25% as compared to the classification accuracy of 80% with MFCC.
Source:
Archives of Acoustics; 2017, 42, 2; 213-222
0137-5075
Appears in:
Archives of Acoustics
Content provider:
Biblioteka Nauki
Article
Title:
Navigation security module with real-time voice command recognition system
Authors:
Yagimli, M.
Kursat-Tezer, H.
Links:
https://bibliotekanauki.pl/articles/258920.pdf
Publication date:
2017
Publisher:
Politechnika Gdańska. Wydział Inżynierii Mechanicznej i Okrętownictwa
Subjects:
maritime navigation
LPC
MFCC
DTW
voice command recognition
Description:
The real-time voice command recognition system developed in this study aims to increase situational awareness, and therefore the safety of navigation, especially during close manoeuvres of warships and the passage of commercial vessels in narrow waters. With the developed system, the safety of navigation, which is especially important in precision manoeuvres, can be controlled through voice command recognition-based software. The system was observed to work with 90.6% accuracy using Mel Frequency Cepstral Coefficients (MFCC) and Dynamic Time Warping (DTW) parameters and with 85.5% accuracy using Linear Predictive Coding (LPC) and DTW parameters.
Source:
Polish Maritime Research; 2017, 2; 17-26
1233-2585
Appears in:
Polish Maritime Research
Content provider:
Biblioteka Nauki
Article
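The record above pairs per-frame features (MFCC or LPC) with Dynamic Time Warping. As a rough illustration only, and not the authors' implementation, DTW-based template matching can be sketched as follows. The function names, the Euclidean frame distance, and the command labels are assumptions made for this sketch; real input would be sequences of feature vectors rather than the toy one-dimensional frames used here.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between frames (an assumed local cost)
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stay on b[j-1], advance a
                                 cost[i][j - 1],      # stay on a[i-1], advance b
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def recognize(query, templates):
    """Return the label of the stored template with the lowest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))


# Hypothetical command templates with one-dimensional toy frames.
templates = {
    "port": [[0.0], [1.0], [2.0]],
    "stop": [[5.0], [5.0], [4.0]],
}
best = recognize([[0.0], [0.9], [1.1], [2.0]], templates)
```

Because DTW allows frames to be repeated during alignment, the same word spoken at different speeds can still match its template closely, which is why it is commonly paired with short-time features in small-vocabulary command recognition.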
Title:
Diagnostyka silnika synchronicznego oparta na analizie sygnałów akustycznych z zastosowaniem MFCC i GSDM
Diagnostics of synchronous motor based on analysis of acoustic signals with application of MFCC and GSDM
Authors:
Głowacz, A.
Głowacz, W.
Głowacz, Z.
Links:
https://bibliotekanauki.pl/articles/1373298.pdf
Publication date:
2010
Publisher:
Sieć Badawcza Łukasiewicz - Instytut Napędów i Maszyn Elektrycznych Komel
Subjects:
electrical machine
synchronous motor
electric motor diagnostics
acoustic signal
GSDM
MFCC
Description:
The paper presents a method for diagnosing imminent failure conditions of a synchronous motor. The method is based on a study of the acoustic signals generated by the motor. The sound recognition system is based on data processing algorithms such as MFCC and GSDM. Software to recognize the sounds of a synchronous motor was implemented. The studies were carried out for four imminent failure conditions of the synchronous motor. The results confirm that the system can be useful for detecting damage and protecting motors.
Source:
Maszyny Elektryczne: zeszyty problemowe; 2010, 87; 185-190
0239-3646
2084-5618
Appears in:
Maszyny Elektryczne: zeszyty problemowe
Content provider:
Biblioteka Nauki
Article
Title:
CNN and LSTM for the classification of parkinsons disease based on the GTCC and MFCC
Authors:
Boualoulou, Nouhaila
Drissi, Taoufiq Belhoussine
Nsiri, Benayad
Links:
https://bibliotekanauki.pl/articles/30148250.pdf
Publication date:
2023
Publisher:
Polskie Towarzystwo Promocji Wiedzy
Subjects:
Parkinson's disease
voice signal
GTCC
MFCC
DWT
EMD
CNN and LSTM
Description:
Parkinson's disease is a recognizable clinical syndrome with a variety of causes and clinical presentations; it represents a rapidly growing neurodegenerative disorder. Since about 90 percent of Parkinson's disease sufferers have some form of early speech impairment, recent studies on tele diagnosis of Parkinson's disease have focused on the recognition of voice impairments from vowel phonations or the subjects' discourse. This paper presents a new approach for Parkinson's disease detection from speech sounds, based on CNN and LSTM, that uses two categories of features: Mel Frequency Cepstral Coefficients (MFCC) and Gammatone Cepstral Coefficients (GTCC), obtained from noise-removed speech signals with comparative EMD-DWT and DWT-EMD analysis. The proposed model is divided into three stages. In the first stage, noise is removed from the signals using the EMD-DWT and DWT-EMD methods. In the second stage, the GTCC and MFCC are extracted from the enhanced audio signals. Classification is carried out in the third stage by feeding these features into the LSTM and CNN models, which are designed to capture sequential information from the extracted features. The experiments are performed using the PC-GITA and Sakar datasets with 10-fold cross-validation. The highest classification accuracy for the Sakar dataset reached 100% for both EMD-DWT-GTCC-CNN and DWT-EMD-GTCC-CNN; for the PC-GITA dataset, the accuracy reached 100% for EMD-DWT-GTCC-CNN and 96.55% for DWT-EMD-GTCC-CNN. The results of this study indicate that GTCC features are more appropriate and accurate for the assessment of PD than MFCC.
Source:
Applied Computer Science; 2023, 19, 2; 1-24
1895-3735
2353-6977
Appears in:
Applied Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
A novel Parkinsons disease detection algorithm combined EMD, BFCC, and SVM classifier
Authors:
Boualoulou, Nouhaila
Mounia, Miyara
Nsiri, Benayad
Behoussine Drissi, Taoufiq
Links:
https://bibliotekanauki.pl/articles/27313826.pdf
Publication date:
2023
Publisher:
Polska Akademia Nauk. Polskie Towarzystwo Diagnostyki Technicznej PAN
Subjects:
EMD
BFCC
MFCC
SVM
Parkinson’s disease
artificial neural network
Description:
Identifying and assessing Parkinson's disease in its early stages is critical to effectively monitoring the disease's progression. Methodologies based on machine learning enhanced speech analysis are gaining popularity as the potential of this field is revealed. Acoustic features, in particular, are used in a variety of machine learning algorithms and could serve as indicators of the general health of subjects' voices. In this research paper, a novel method is introduced for the automated detection of Parkinson's disease through speech signal analysis. A support vector machine (SVM) classifier and an Artificial Neural Network (ANN) are used to evaluate and classify the data based on two acoustic features: Bark Frequency Cepstral Coefficients (BFCC) and Mel Frequency Cepstral Coefficients (MFCC). These features are extracted from the denoised signals using Empirical Mode Decomposition (EMD). The most relevant results, obtained for a dataset of 38 participants, are achieved with the BFCC coefficients, with an accuracy of up to 92.10%. These results confirm that the EMD-BFCC-SVM method can contribute to the detection of Parkinson's disease.
Source:
Diagnostyka; 2023, 24, 4; art. no. 2023404
1641-6414
2449-5220
Appears in:
Diagnostyka
Content provider:
Biblioteka Nauki
Article
Title:
Genetic Algorithm for Combined Speaker and Speech Recognition using Deep Neural Networks
Authors:
Kaur, G.
Srivastava, M.
Kumar, A.
Links:
https://bibliotekanauki.pl/articles/958089.pdf
Publication date:
2018
Publisher:
Instytut Łączności - Państwowy Instytut Badawczy
Subjects:
deep neural networks
genetic algorithm
LPCC
MFCC
PLP
RASTA-PLP
speaker recognition
speech recognition
Description:
Huge growth is observed in the speech and speaker recognition field due to many artificial intelligence algorithms being applied. Speech is used to convey messages via the language being spoken, emotions, gender and speaker identity. Many real applications in healthcare are based upon speech and speaker recognition, e.g. a voice-controlled wheelchair helps control the chair. In this paper, we use a genetic algorithm (GA) for combined speaker and speech recognition, relying on optimized Mel Frequency Cepstral Coefficient (MFCC) speech features, and classification is performed using a Deep Neural Network (DNN). In the first phase, feature extraction using MFCC is executed. Then, feature optimization is performed using GA. In the second phase training is conducted using DNN. Evaluation and validation of the proposed work model is done by setting a real environment, and efficiency is calculated on the basis of such parameters as accuracy, precision rate, recall rate, sensitivity, and specificity. Also, this paper presents an evaluation of such feature extraction methods as linear predictive coding coefficient (LPCC), perceptual linear prediction (PLP), mel frequency cepstral coefficients (MFCC) and relative spectra filtering (RASTA), with all of them used for combined speaker and speech recognition systems. A comparison of different methods based on existing techniques for both clean and noisy environments is made as well.
Source:
Journal of Telecommunications and Information Technology; 2018, 2; 23-31
1509-4553
1899-8852
Appears in:
Journal of Telecommunications and Information Technology
Content provider:
Biblioteka Nauki
Article
Title:
Visualization of stages of determining cepstral factors in speech recognition systems
Authors:
Proksa, R.
Links:
https://bibliotekanauki.pl/articles/333103.pdf
Publication date:
2009
Publisher:
Uniwersytet Śląski. Wydział Informatyki i Nauki o Materiałach. Instytut Informatyki. Zakład Systemów Komputerowych
Subjects:
LPCC
MFCC
speech signals
speech recognition
cepstral coefficients
isolated word
Description:
The article presents two methods of determination of cepstral parameters commonly applied in digital signal processing, in particular in speech recognition systems. The solutions presented are part of a project aimed at developing applications allowing to control the Windows operating system with voice and the use of MSAA (Microsoft Active Accessibility). The analysed voice signal has been visually presented at each of the crucial stages of developing cepstral coefficients.
Source:
Journal of Medical Informatics & Technologies; 2009, 13; 121-128
1642-6037
Appears in:
Journal of Medical Informatics & Technologies
Content provider:
Biblioteka Nauki
Article
Title:
Voice pathology assessment using x-vectors approach
Authors:
Kotarba, Katarzyna
Kotarba, Michał
Links:
https://bibliotekanauki.pl/articles/2146638.pdf
Publication date:
2021
Publisher:
Politechnika Poznańska. Instytut Mechaniki Stosowanej
Subjects:
x-vectors
speaker embeddings
voice pathology
MFCC
GFCC
Description:
Voice pathology assessment using sustained vowels has proven to be effective and reliable. However, only a few studies regarding detection of pathological speech based on continuous speech are available. In this study we evaluate the usefulness of various regression models trained on continuous speech recordings from Saarbruecken Voice Database in the detection of voice pathologies. The recordings were used for extraction of speaker embeddings called x-vectors based on mel-frequency cepstral coefficients and gammatone frequency cepstral coefficients. Since the dataset used in this study is imbalanced, various over- and undersampling techniques were applied to the training set to ensure robustness of models’ decision boundaries. The models were trained on both imbalanced and resampled training sets using 5-fold cross-validation. The best results were obtained for Multi Layer Perceptron trained on GFCC-based x-vectors, achieving accuracy of 0.8184, F1-score of 0.8212, and ROC AUC score of 0.8810 for the testing set.
Source:
Vibrations in Physical Systems; 2021, 32, 1; art. no. 2021108
0860-6897
Appears in:
Vibrations in Physical Systems
Content provider:
Biblioteka Nauki
Article
Title:
Heart Rate Detection and Classification from Speech Spectral Features Using Machine Learning
Authors:
Usman, Mohammed
Zubair, Mohammed
Ahmad, Zeeshan
Zaidi, Monji
Ijyas, Thafasal
Parayangat, Muneer
Wajid, Mohd
Shiblee, Mohammad
Ali, Syed Jaffar
Links:
https://bibliotekanauki.pl/articles/1953514.pdf
Publication date:
2021
Publisher:
Polska Akademia Nauk. Czasopisma i Monografie PAN
Subjects:
heart rate from speech
machine learning
MFCC
regression
classification
speech as a biomedical signal
Description:
Measurement of vital signs of the human body such as heart rate, blood pressure, body temperature and respiratory rate is an important part of diagnosing medical conditions and these are usually measured using medical equipment. In this paper, we propose to estimate an important vital sign – heart rate from speech signals using machine learning algorithms. Existing literature, observation and experience suggest the existence of a correlation between speech characteristics and physiological, psychological as well as emotional conditions. In this work, we estimate the heart rate of individuals by applying machine learning based regression algorithms to Mel frequency cepstrum coefficients, which represent speech features in the spectral domain as well as the temporal variation of spectral features. The estimated heart rate is compared with actual measurement made using a conventional medical device at the time of recording speech. We obtain estimation accuracy close to 94% between the estimated and actual measured heart rate values. Binary classification of heart rate as ‘normal’ or ‘abnormal’ is also achieved with 100% accuracy. A comparison of machine learning algorithms in terms of heart rate estimation and classification accuracy is also presented. Heart rate measurement using speech has applications in remote monitoring of patients, professional athletes and can facilitate telemedicine.
Source:
Archives of Acoustics; 2021, 46, 1; 41-53
0137-5075
Appears in:
Archives of Acoustics
Content provider:
Biblioteka Nauki
Article
Title:
Discrimination between patients with CVDs and healthy people by voiceprint using the MFCC and pitch
Authors:
Bourouhou, Abdelhamid
Jilbab, Abdelilah
Cherti, Mohammed
Bourouhou, Zaineb
Nacir, Chafik
Links:
https://bibliotekanauki.pl/articles/2096170.pdf
Publication date:
2021
Publisher:
Polska Akademia Nauk. Polskie Towarzystwo Diagnostyki Technicznej PAN
Subjects:
cardiovascular diseases
speech analysis
voiceprint
MFCC
K-near-neighbor classifier
Description:
Heart diseases cause many deaths around the world every year, and this death rate makes them the leading killer diseases. Early diagnosis, however, can help reduce these deaths and save lives. To obtain a reliable diagnosis, people must undergo a series of clinical examinations and analyses, which makes the diagnostic procedure expensive and not accessible to everyone. Speech analysis is a strong tool that can address this task and offer a new way to discriminate between healthy people and people with cardiovascular diseases. Our previous paper treated this task using dysphonia measurements to differentiate between people with cardiovascular disease and healthy people, and reached 81.5% prediction accuracy. This time we change the method to increase the accuracy by extracting the voiceprint using 13 Mel-Frequency Cepstral Coefficients and the pitch from voices in a database of 75 subjects (35 with cardiovascular diseases, 40 healthy); three recordings of sustained vowels (aaaaa…, ooooo… and iiiiiiii…) were collected from each subject. We used the k-near-neighbor classifier to train a model and to classify the test entities. We were able to outperform the previous results, reaching 95.55% prediction accuracy.
Source:
Diagnostyka; 2021, 22, 4; 9-16
1641-6414
2449-5220
Appears in:
Diagnostyka
Content provider:
Biblioteka Nauki
Article
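The record above classifies voiceprint vectors (13 MFCCs plus pitch) with a nearest-neighbour rule. A minimal sketch of that kind of classifier is given below, under stated assumptions: the value of k, the Euclidean distance, the function name, and the two-dimensional toy vectors are all illustrative choices not taken from the abstract, which reports neither k nor the distance metric.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training vectors closest to `query`."""
    # Indices of training vectors sorted by Euclidean distance to the query
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy 2-D vectors standing in for 14-D (13 MFCC + pitch) voiceprints.
train_x = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
           [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
train_y = ["healthy", "healthy", "healthy", "cvd", "cvd", "cvd"]

label = knn_predict(train_x, train_y, [0.2, 0.1], k=3)
```

An odd k avoids tied votes in the two-class case; in practice the features would normally be scaled first, since pitch in hertz and cepstral coefficients live on very different numeric ranges.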
Title:
Hybridisation of Mel Frequency Cepstral Coefficient and Higher Order Spectral Features for Musical Instruments Classification
Authors:
Bhalke, D. G.
Rama Rao, C. B.
Bormane, D.
Links:
https://bibliotekanauki.pl/articles/176497.pdf
Publication date:
2016
Publisher:
Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:
feature extraction
MFCC
HOS
bispectrum
bicoherence
non-linearity
non-Gaussianity
CPNN
zero crossing rate (ZCR)
Description:
This paper presents the classification of musical instruments using Mel Frequency Cepstral Coefficients (MFCC) and Higher Order Spectral features. MFCC, cepstral, temporal, spectral, and timbral features have been widely used in the task of musical instrument classification. As musical sound signals are generated by non-linear dynamics, the non-linearity and non-Gaussianity of musical instruments are important features which have not been considered in the past. In this paper, a hybridisation of MFCC and Higher Order Spectral (HOS) based features has been used in the task of musical instrument classification. HOS-based features have been used to provide instrument-specific information such as the non-Gaussianity and non-linearity of the musical instruments. The extracted features have been presented to a Counter Propagation Neural Network (CPNN) to identify the instruments and their family. For experimentation, isolated sounds of 19 musical instruments from the McGill University Master Sample (MUMS) sound database have been used. The proposed features show a significant improvement in the classification accuracy of the system.
Source:
Archives of Acoustics; 2016, 41, 3; 427-436
0137-5075
Appears in:
Archives of Acoustics
Content provider:
Biblioteka Nauki
Article
Title:
Hybrid of neural networks and hidden Markov models as a modern approach to speech recognition systems
Hybryda sieci neuronowych i ukrytych modeli Markowa jako nowoczesne podejście do rozpoznawania mowy
Authors:
Sokólski, P.
Rutkowski, T.
Links:
https://bibliotekanauki.pl/articles/276753.pdf
Publication date:
2013
Publisher:
Sieć Badawcza Łukasiewicz - Przemysłowy Instytut Automatyki i Pomiarów
Subjects:
MFCC
artificial neural networks
hidden Markov models
speech recognition
control
Description:
The aim of this paper is to present a hybrid algorithm that combines the advantages of artificial neural networks and hidden Markov models in speech recognition for control purposes. The scope of the paper includes a review of currently used solutions and a description and analysis of the implementation of selected artificial neural network (NN) structures and hidden Markov models (HMM). The main part of the paper describes the development and implementation of a hybrid speech recognition algorithm using NN and HMM and presents the results of verifying its correct operation.
Source:
Pomiary Automatyka Robotyka; 2013, 17, 2; 449-455
1427-9126
Appears in:
Pomiary Automatyka Robotyka
Content provider:
Biblioteka Nauki
Article
Title:
Comparison of the efficiency of time and frequency domain descriptors for the classification of selected wind instruments
Porównanie skuteczności deskryptorów w dziedzinie czasu i częstotliwości do klasyfikacji wybranych instrumentów dętych
Authors:
Tyburek, Krzysztof
Namli, Ömer Bora
Links:
https://bibliotekanauki.pl/articles/41205950.pdf
Publication date:
2022
Publisher:
Uniwersytet Kazimierza Wielkiego w Bydgoszczy
Subjects:
power spectrum
MFCC
timbre
Music Instrument Identification
MPEG-7
aerophones
Description:
By analyzing the physical features of an audio signal in the time domain and the frequency domain, it is possible to determine its source and to classify it automatically using appropriate algorithms. Sound indexing deals with the analysis of different classes and sources, including signals from musical instruments. By calculating the values of descriptors and classifying them, we obtain information about the type of instrument and its construction, most often the material from which it was made. The research showed that one composition of the feature vector is used to describe brass instruments and a different one for woodwind instruments. In this case, the key feature may be the higher harmonic components in the frequency domain. The experiments attempt to parameterize wind instruments (aerophones) in order to compare the classification effectiveness of time and spectral descriptors. Sounds from a tuba, a flute and a soprano saxophone were used in the research. The sample population for each instrument was 21.
Source:
Studia i Materiały Informatyki Stosowanej; 2022, 14, 3; 13-19
1689-6300
Appears in:
Studia i Materiały Informatyki Stosowanej
Content provider:
Biblioteka Nauki
Article
Title:
Effect of Time-domain Windowing on Isolated Speech Recognition System Performance
Authors:
Ananthakrishna, Thalengala
Anitha, H.
Girisha, T.
Links:
https://bibliotekanauki.pl/articles/2055228.pdf
Publication date:
2022
Publisher:
Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:
hidden Markov model
HMM
isolated speech recognition system
ISR
Kannada language
mono-phone model
Mel frequency cepstral coefficients
MFCC
Description:
A speech recognition system extracts textual data from the speech signal. Research in the speech recognition domain is challenging due to the large variability of the speech signal. A variety of signal processing and machine learning techniques have been explored to achieve better recognition accuracy. Speech is highly non-stationary in nature, and therefore analysis is carried out over a short time-domain window, or frame. In the speech recognition task, cepstral features (Mel frequency cepstral coefficients, MFCC) are commonly used and are extracted for short time-frames. The effectiveness of the features depends on the duration of the time-window chosen. The present study investigates the optimal time-window duration for the extraction of cepstral features in the context of the speech recognition task. A speaker-independent speech recognition system for the Kannada language has been considered for the analysis. In the current work, speech utterances from a Kannada news corpus recorded from different speakers have been used to create the speech database. The hidden Markov tool kit (HTK) has been used to implement the speech recognition system. The MFCC along with their first and second derivative coefficients are considered as feature vectors. The pronunciation dictionary required for the study has been built manually for the mono-phone system. Experiments have been carried out and results have been analyzed for different time-window lengths. An overlapping Hamming window has been used in this study. The best average word recognition accuracy of 61.58% has been obtained for a window length of 110 msec. This recognition accuracy is comparable with similar work found in the literature. The experiments have shown that the best word recognition performance can be achieved by tuning the window length to its optimum value.
Source:
International Journal of Electronics and Telecommunications; 2022, 68, 1; 161-166
2300-1933
Appears in:
International Journal of Electronics and Telecommunications
Content provider:
Biblioteka Nauki
Article
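The record above studies how the length of an overlapping Hamming window affects cepstral feature extraction. The framing step that precedes MFCC computation can be sketched as below. This is an illustration rather than the HTK configuration used in the study: the 16 kHz sample rate, the 10 ms hop, and the function names are assumptions, and only the 110 ms window length comes from the abstract.

```python
import math


def hamming(n):
    """Hamming window of length n: w[k] = 0.54 - 0.46*cos(2*pi*k/(n-1))."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def frame_signal(samples, sample_rate, window_ms, hop_ms):
    """Split a signal into overlapping frames, each weighted by a Hamming window."""
    win = int(round(sample_rate * window_ms / 1000))  # window length in samples
    hop = int(round(sample_rate * hop_ms / 1000))     # frame shift in samples
    w = hamming(win)
    frames = []
    # Only full-length frames are kept; a trailing partial frame is dropped.
    for start in range(0, len(samples) - win + 1, hop):
        chunk = samples[start:start + win]
        frames.append([s * wk for s, wk in zip(chunk, w)])
    return frames


# One second of a constant toy signal at an assumed 16 kHz sample rate.
signal = [1.0] * 16000
frames = frame_signal(signal, sample_rate=16000, window_ms=110, hop_ms=10)
```

Each frame would then go through the usual spectrum, mel filter bank, logarithm, and DCT stages to yield one MFCC vector. A longer window gives finer frequency resolution per frame at the cost of averaging over more of the non-stationary signal, which is the trade-off such a study tunes.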
