Search results for the phrase "automatic recognition", criterion: Subject


Title:
Hybrid CNN-LiGRU acoustic modeling using SincNet raw waveform for Hindi ASR
Authors:
Kumar, Ankit
Aggarwal, Rajesh Kumar
Links:
https://bibliotekanauki.pl/articles/1839250.pdf
Publication date:
2020
Publisher:
Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Subjects:
automatic speech recognition
CNN
CNN-LiGRU
DNN
Description:
Deep neural networks (DNNs) currently play a vital role in automatic speech recognition (ASR). Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are advanced variants of DNNs. They are well suited to the spatial and temporal properties of a speech signal, and both properties strongly affect accuracy. CNNs operating on the raw speech signal have shown superiority over precomputed acoustic features. Recently, a novel first convolution layer named SincNet was proposed to increase interpretability and system performance. In this work, we propose to combine SincNet-CNN with a light gated recurrent unit (LiGRU) to help reduce the computational load and increase interpretability while keeping accuracy high. Different configurations of the hybrid model are extensively examined to achieve this goal. All of the experiments were conducted using the Kaldi and PyTorch-Kaldi toolkits with the Hindi speech dataset. The proposed model reports an 8.0% word error rate (WER).
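The description above reports its result as a word error rate. As a reminder of what that metric measures, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and whitespace tokenization are assumptions for illustration; the record does not describe the scoring tool actually used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()  # assumption: whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "a x c" against the reference "a b c d" counts one substitution and one deletion over four reference words, giving 0.5.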
Source:
Computer Science; 2020, 21 (4); 397-417
1508-2806
2300-7036
Appears in:
Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
Recognition of the numbers in the Polish language
Authors:
Plichta, A.
Gąciarz, T.
Krzywdziński, T.
Links:
https://bibliotekanauki.pl/articles/308844.pdf
Publication date:
2013
Publisher:
Instytut Łączności - Państwowy Instytut Badawczy
Subjects:
Automatic Speech Recognition
compressed sensing
Sparse Classification
Description:
Automatic Speech Recognition is one of the hottest research and application problems in today’s ICT. Rapid progress in the development of intelligent mobile systems calls for new services in which users communicate with devices by sending audio commands. Such systems must additionally be integrated with highly distributed infrastructures such as computational and mobile clouds, Wireless Sensor Networks (WSNs), and many others. This paper presents recent research results on the recognition of separate words and of words in short contexts (limited to numbers) articulated in the Polish language. Compressed Sensing Theory (CST) is applied for the first time as a methodology for speech recognition. The effectiveness of the proposed methodology is demonstrated in numerical tests for both separate words and short sentences.
Source:
Journal of Telecommunications and Information Technology; 2013, 4; 70-78
1509-4553
1899-8852
Appears in:
Journal of Telecommunications and Information Technology
Content provider:
Biblioteka Nauki
Article
Title:
Detection of fillers in the speech by people who stutter
Authors:
Suszyński, Waldemar
Charytanowicz, Małgorzata
Rosa, Wojciech
Koczan, Leopold
Stęgierski, Rafał
Links:
https://bibliotekanauki.pl/articles/1956029.pdf
Publication date:
2021
Publisher:
Polskie Towarzystwo Promocji Wiedzy
Subjects:
stuttering
fillers disfluency
automatic recognition
fillers detection
jąkanie
dysfluencja
automatyczne rozpoznawanie
wykrywanie
Description:
Stuttering is a speech impediment and a very complex disorder. It is difficult to diagnose and treat, and its origin remains unknown despite the large number of studies in this field. Stuttering can take many forms, varies from person to person, and can change under the influence of external factors. Diagnosing and treating speech disorders such as stuttering requires of a speech therapist not only good professional preparation but also experience gained through research and practice in the field. The use of acoustic methods in combination with elements of artificial intelligence makes it possible to assess the disorder objectively, as well as to monitor the effects of treatment. The main aim of the study was to present an algorithm for automatic recognition of filler disfluencies in the statements of people who stutter. This is done on the basis of their parameterized features in the amplitude-frequency space. The work also provides example results demonstrating the feasibility and effectiveness of the approach. To verify and optimize the procedures, the statements of seven stutterers, each lasting 2 to 4 minutes, were selected. Over 70% efficiency and predictability of automatic detection of these disfluencies was achieved. The use of an automatic method in conjunction with therapy for a person who stutters offers the opportunity to assess the disorder objectively and to evaluate the progress of therapy.
Source:
Applied Computer Science; 2021, 17, 4; 45-54
1895-3735
Appears in:
Applied Computer Science
Content provider:
Biblioteka Nauki
Article
Title:
Using SVM Classifier and Micro-Doppler Signature for Automatic Recognition of Sonar Targets
Authors:
Saffari, Abbas
Zahiri, Seye Hamid
Khozein Ghanad, Navid
Links:
https://bibliotekanauki.pl/articles/31339922.pdf
Publication date:
2023
Publisher:
Polska Akademia Nauk. Czasopisma i Monografie PAN
Subjects:
sonar micro-Doppler
automatic recognition
SVM
RBF kernel
linear kernel
polynomial kernel
Description:
In this paper, we propose using propeller modulation of the transmitted signal (called sonar micro-Doppler) and different support vector machine (SVM) kernels for automatic recognition of moving sonar targets. In general, the main challenge for researchers and practitioners working in the field of sonar target recognition is the lack of access to a valid and comprehensive database. Using a comprehensive mathematical model to simulate the signal received from the target can therefore address this challenge. The mathematical model used in this paper simulates the return signal of moving sonar targets well. The resulting signals have unique properties and are known as frequency signatures. To reduce the complexity of the model, a 128-point fast Fourier transform (FFT) is used. The SVM classifier, one of the most popular machine learning algorithms, was tested with three main kernel functions: the RBF kernel, the linear kernel, and the polynomial kernel. The accuracy of target recognition was evaluated for different signal-to-noise ratios (SNR: −20, −15, −10, −5, 0, 5, 10, 15, 20) and different viewing angles (10, 20, 30, 40, 50, 60, 70, 80). For a fairer comparison, a multilayer perceptron neural network trained with two methods, back-propagation (MLP-BP) and gray wolf optimization (MLP-GWO), was also used. Unfortunately, considering the number of classes, its performance was not satisfactory. The results showed that the RBF kernel is more capable at high SNRs (SNR = 20, viewing angle = 10), with an accuracy of 98.528%.
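The description mentions a 128-point FFT but does not say how the spectrum is turned into classifier input. The following is only a sketch under an assumption: that a 128-sample frame is reduced to its one-sided magnitude spectrum, which could then serve as a feature vector. The names `fft` and `magnitude_spectrum` are hypothetical, and the radix-2 routine is a textbook implementation, not the authors' code.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def magnitude_spectrum(frame):
    """One-sided magnitude spectrum (bins 0..64) of a 128-sample frame."""
    if len(frame) != 128:
        raise ValueError("expected a 128-sample frame")
    spectrum = fft(frame)
    return [abs(c) for c in spectrum[: len(frame) // 2 + 1]]
```

A unit-amplitude sinusoid that completes exactly five cycles in the frame concentrates its energy in bin 5, which is the kind of frequency signature the description refers to.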
Source:
Archives of Acoustics; 2023, 48, 1; 49-61
0137-5075
Appears in:
Archives of Acoustics
Content provider:
Biblioteka Nauki
Article
Title:
Two-Microphone Dereverberation for Automatic Speech Recognition of Polish
Authors:
Kundegorski, M.
Jackson, P. J. B.
Ziółko, B.
Links:
https://bibliotekanauki.pl/articles/176431.pdf
Publication date:
2014
Publisher:
Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:
speech enhancement
reverberation
automatic speech recognition
ASR
Polish
Description:
Reverberation is a common problem for many speech technologies, such as automatic speech recognition (ASR) systems. This paper investigates the novel combination of precedence, binaural and statistical independence cues for enhancing reverberant speech, prior to ASR, under these adverse acoustical conditions when two microphone signals are available. Results of the enhancement are evaluated in terms of relevant signal measures and accuracy for both English and Polish ASR tasks. These show inconsistencies between the signal and recognition measures, although in recognition the proposed method consistently outperforms all other combinations and the spectral-subtraction baseline.
Source:
Archives of Acoustics; 2014, 39, 3; 411-420
0137-5075
Appears in:
Archives of Acoustics
Content provider:
Biblioteka Nauki
Article
Title:
Comparison of the effectiveness of automatic targeting, using systems of ATR type, with manual targeting, based on full test procedure ISO 17123-3
Porównanie efektywności celowania automatycznego typu ATR oraz manualnego na podstawie pełnej procedury testowej ISO 17123-3
Authors:
Owerko, T.
Kuras, P.
Szafarczyk, A.
Links:
https://bibliotekanauki.pl/articles/386080.pdf
Publication date:
2010
Publisher:
Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
Subjects:
automatyczne rozpoznanie celu
ISO
dokładność
automatic target recognition
accuracy
Description:
The article compares the accuracy of targeting performed in two ways, manual and automatic. The study was conducted according to the international standard ISO 17123-3 for two theodolites equipped with an automatic target recognition system. The paper discusses the surveying procedure, a full test procedure that makes it possible to determine the actual accuracy of the tested instruments. The obtained values are the basis for statistical tests that allow the stated hypothesis to be verified.
Source:
Geomatics and Environmental Engineering; 2010, 4, 1/1; 107-113
1898-1135
Appears in:
Geomatics and Environmental Engineering
Content provider:
Biblioteka Nauki
Article
Title:
Parametry identyfikacyjne umożliwiające automatyczne rozpoznawanie cyfr wypowiadanych w języku polskim
Identification parameters enabling automatic recognition of digits spoken in Polish
Authors:
Dulas, J.
Links:
https://bibliotekanauki.pl/articles/157420.pdf
Publication date:
2011
Publisher:
Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Subjects:
automatyczne rozpoznawanie sygnału mowy
fonemy
automatic speech recognition
phonemes
Description:
The article presents the latest results of the author's work on automatic recognition of speech signals. Studies carried out on a set of 500 recordings of digits spoken in Polish by 50 speakers of different sexes and ages make it possible to propose a set of parameters needed to identify them. As the article shows, a set of a few basic identifying features is sufficient to carry out this process. The proposed set of parameters is easy to obtain with little computing power.
The paper describes the author's new method for automatic recognition of digits spoken in Polish. This approach performs no frequency analysis, as is usual in such systems; instead, image recognition of the time characteristic is applied. Investigations performed on 500 recordings of people of different sexes and ages showed that an automatic recognition system can be constructed from a few parameters. The first is the number of voiced phonemes in a recognized word (Tab. 1). This group comprises all vowels and some consonants, whose time characteristics contain basic periods. This parameter is obtained using the grid method designed by the author (Fig. 3). The second is the number and position of noisy phonemes. This group comprises phonemes without basic periods but with large signal variability. This parameter is calculated from the number of local extrema and the signal amplitude level, and by checking that there are no basic periods. The third parameter is the shape of the signal envelope (Tab. 2). As the investigations showed, it is possible to find an envelope pattern for each Polish digit that is common to all tested speakers. It was shown that these parameters are sufficient for automatic recognition of digits spoken in Polish. The method can also be applied to other systems with a small number of recognized words. It is fast, and because it requires no frequency analysis it has low hardware demands.
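One of the quantities named in the description is the number of local extrema in a signal segment. The description gives no exact definition, so the sketch below rests on an assumption: a sample counts as an extremum when it is strictly greater, or strictly smaller, than both of its neighbours. The function name `count_local_extrema` is hypothetical.

```python
def count_local_extrema(samples):
    """Count interior samples that are strict local maxima or minima."""
    count = 0
    # Slide a three-sample window over the signal; the first and last
    # samples are never counted because they lack one neighbour.
    for prev, cur, nxt in zip(samples, samples[1:], samples[2:]):
        is_maximum = cur > prev and cur > nxt
        is_minimum = cur < prev and cur < nxt
        if is_maximum or is_minimum:
            count += 1
    return count
```

Under this definition a rapidly fluctuating (noisy) segment yields a high count per unit time, while a smooth or monotonic segment yields few or none; flat plateaus are deliberately not counted.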
Source:
Pomiary Automatyka Kontrola; 2011, R. 57, nr 3, 3; 308-311
0032-4140
Appears in:
Pomiary Automatyka Kontrola
Content provider:
Biblioteka Nauki
Article
Title:
Estimation and tracking of fundamental, 2nd and 3d harmonic frequencies for spectrogram normalization in speech recognition
Authors:
Fujimoto, K.
Hamada, N.
Kasprzak, W.
Links:
https://bibliotekanauki.pl/articles/201105.pdf
Publication date:
2012
Publisher:
Polska Akademia Nauk. Czytelnia Czasopism PAN
Subjects:
automatic speech recognition
spectrogram analysis
particle filter
pitch estimation
Description:
A stable and accurate estimation of the fundamental frequency (pitch, F0) is an important requirement in speech and music signal analysis, in tasks such as automatic speech recognition and extraction of a target signal in a noisy environment. In this paper, we propose a pitch-related spectrogram normalization scheme to improve the speaker independence of standard speech features. This requires a very accurate estimate of the fundamental frequency. Hence, we develop a non-parametric recursive method for estimating F0 and its 2nd and 3rd harmonic frequencies in noisy conditions. The proposed method differs from typical Kalman and particle filter methods in that no particular sum-of-sinusoids model is used. We also estimate F0 and its lower harmonics using a novel likelihood function. In experiments under various noise levels, the proposed method proved more accurate than conventional methods. The spectrogram normalization scheme maps the real harmonic structure to a normalized structure. Results obtained for voiced phonemes show an increase in the stability of the standard speech features: the average within-phoneme distance of the MFCC features for voiced phonemes can be decreased by several percent.
Source:
Bulletin of the Polish Academy of Sciences. Technical Sciences; 2012, 60, 1; 71-81
0239-7528
Appears in:
Bulletin of the Polish Academy of Sciences. Technical Sciences
Content provider:
Biblioteka Nauki
Article
Title:
Recognition of speaker’s age group and gender for a large database of telephone-recorded voices
Authors:
Staroniewicz, Piotr
Links:
https://bibliotekanauki.pl/articles/2202432.pdf
Publication date:
2022
Publisher:
Politechnika Poznańska. Instytut Mechaniki Stosowanej
Subjects:
speech processing
automatic age recognition
przetwarzanie mowy
automatyczne rozpoznawanie wieku
Description:
The paper presents the results of the automatic recognition of age group and gender of speakers performed for the large SpeechDAT(E) acoustic database for the Polish language, containing recordings of 1000 speakers (486 males/514 females) aged 12 to 73, recorded in telephone conditions. Three age groups were recognised for each gender. Mel Frequency Cepstral Coefficients (MFCC) were used to describe the recognized signals parametrically. Among the classification methods tested in this study, the best results were obtained for the SVM (Support Vector Machines) method.
Source:
Vibrations in Physical Systems; 2022, 33, 2; art. no. 2022203
0860-6897
Appears in:
Vibrations in Physical Systems
Content provider:
Biblioteka Nauki
Article
Title:
Analiza obwiedni jako parametr wspomagający automatyczną identyfikację wyrażeń
The envelope analysis as a useful parameter in automatic phrase identification
Authors:
Dulas, J.
Links:
https://bibliotekanauki.pl/articles/156853.pdf
Publication date:
2009
Publisher:
Stowarzyszenie Inżynierów i Techników Mechaników Polskich
Subjects:
automatyczne rozpoznawanie sygnałów mowy
analiza obwiedni
automatic speech recognition
envelope analysis
Description:
Research on automatic recognition of speech signals is making steady progress, although the diversity of languages makes it difficult to introduce uniform solutions. One example of the development and spread of speech identification methods is the Windows XP operating system, which includes tools for controlling applications by voice. Solutions for the Polish language are still lacking, however, so research aimed at developing reliable identification and control algorithms is needed. The article presents the results of a study of the envelopes of speech signals, namely the digits 0-9, obtained from a group of 50 people of different sexes and ages. The aim of the study was to answer the question of whether envelope analysis can serve as a parameter in automatic recognition of speech signals, and whether it is possible to create envelope models for each digit that would be common to all 50 speakers.
Research on speech signal recognition has developed considerably, although differences between languages make it difficult to work out the same algorithms for all of them. A good example of the progress in this field is Windows XP, an operating system that enables some applications to be controlled by voice (but not in Polish). Well-working programs controlled in Polish are still lacking. This paper describes the results of investigations of the voice signal envelope. Recordings of the digits 0-9, obtained from 50 persons of different ages and sexes, were tested. The main goal was to find out whether envelope analysis could be helpful in automatic speech recognition. During the investigations, based on the analysis of each digit's time characteristic, each digit was divided into parts (from 2 to 5) having a similar envelope. The minimum duration and the amplitude range were also found for each part. The results are given in Table 1. Table 2 contains the results of fitting the envelope to each digit. It is shown that the envelope patterns are common to all the speakers and digits. Although envelope analysis alone is not sufficient for automatic speech recognition (some digit patterns fit others), it can be used as one of the parameters employed for this purpose.
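The description does not say how the signal envelope was extracted. As an illustration only, the sketch below assumes one common time-domain approach: full-wave rectification followed by a sliding maximum. The function name `envelope` and the `window` parameter (window length in samples) are hypothetical and are not taken from the paper.

```python
def envelope(samples, window):
    """Amplitude envelope: sliding maximum of the rectified signal."""
    if window < 1:
        raise ValueError("window must be at least 1 sample")
    rectified = [abs(s) for s in samples]  # full-wave rectification
    half = window // 2
    # For each position, take the largest rectified value in a window
    # centred on it; the window is truncated at the signal edges.
    return [
        max(rectified[max(0, i - half): i + half + 1])
        for i in range(len(rectified))
    ]
```

For instance, `envelope([0, 1, -3, 2, 0], 3)` gives `[1, 3, 3, 3, 2]`. A longer window gives a smoother contour, which would suit dividing a digit into a few parts with similar envelope as the description outlines.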
Source:
Pomiary Automatyka Kontrola; 2009, R. 55, nr 5, 5; 308-309
0032-4140
Appears in:
Pomiary Automatyka Kontrola
Content provider:
Biblioteka Nauki
Article
