You are searching for the phrase "speech detection" by criterion: Topic


Title:
Rule Based Speech Signal Segmentation
Authors:
Greibus, M.
Telksnys, L.
Links:
https://bibliotekanauki.pl/articles/308535.pdf
Publication date:
2010
Publisher:
Instytut Łączności - Państwowy Instytut Badawczy
Topics:
rule base
speech analysis
speech endpoint detection
speech segmentation
Description:
This paper addresses the problem of automated speech signal segmentation. Segmentation algorithms based on an energy threshold give good results only in noise-free environments; at higher noise levels, automatic threshold calculation becomes a complicated task, and rule-based postprocessing of the segments can give more stable results. Off-line, on-line, and extrema types of rules are reviewed, and an extrema-type segmentation algorithm is proposed. The algorithm is enhanced with a rule base for extracting higher-energy segments from noise, and it works well with energy-like features. Experiments were performed to compare threshold-based and rule-based segmentation under different noise types, and to test whether multi-feature segmentation improves the results. The extrema rule-based segmentation showed a smaller error ratio across noise types and levels. The proposed algorithm does not require large computational resources and can therefore run on devices with limited computing power. (A minimal sketch of the energy-threshold-plus-rules idea follows this record.)
Source:
Journal of Telecommunications and Information Technology; 2010, 4; 37-43
1509-4553
1899-8852
Appears in:
Journal of Telecommunications and Information Technology
Content provider:
Biblioteka Nauki
Article
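As a rough illustration of the approach this abstract describes, the Python sketch below performs short-time energy thresholding followed by rule-based postprocessing of the raw segments. It is a minimal sketch, not the authors' algorithm: the frame sizes, the median-based threshold, and the two rules (bridge short gaps, drop short segments) are assumptions chosen for the example.

```python
import numpy as np

def short_time_energy(signal, frame_len=400, hop=160):
    """Frame-wise log energy (25 ms frames, 10 ms hop at 16 kHz)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
    return np.log(np.sum(frames ** 2, axis=1) + 1e-10)

def segment_with_rules(energy, threshold, min_len=10, max_gap=5):
    """Threshold the energy contour, then apply two postprocessing rules."""
    active = energy > threshold
    segments, start = [], None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i
        elif not is_active and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(active)))
    merged = []
    for seg in segments:
        if merged and seg[0] - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], seg[1])       # rule 1: bridge short gaps
        else:
            merged.append(seg)
    return [s for s in merged if s[1] - s[0] >= min_len]  # rule 2: drop short blips

# Toy example: low-level noise with one embedded higher-energy burst.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 0.01, 16000)
x[6000:10000] += rng.normal(0.0, 0.2, 4000)
energy = short_time_energy(x)
print(segment_with_rules(energy, threshold=np.median(energy) + 2.0))
```

In this toy run the rule base suppresses isolated spurious detections and returns a single frame-index segment covering the burst.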
Title:
Investigation of the Lombard effect based on a machine learning approach
Authors:
Korvel, Gražina
Treigys, Povilas
Kąkol, Krzysztof
Kostek, Bożena
Links:
https://bibliotekanauki.pl/articles/24200693.pdf
Publication date:
2023
Publisher:
Uniwersytet Zielonogórski. Oficyna Wydawnicza
Topics:
Lombard effect
speech detection
noise signal
self similarity matrix
convolutional neural network
Description:
The Lombard effect is an involuntary increase in the speaker's pitch, intensity, and duration in the presence of noise; it makes communication in noisy environments more effective. This study investigates an efficient method for detecting the Lombard effect in uttered speech, examining how interfering noise, room type, and the speaker's gender influence the detection process. First, acoustic parameters related to the speech changes produced by the Lombard effect are extracted. Mid-term statistics are built upon the parameters and used to construct a self-similarity matrix, which constitutes the input data for a convolutional neural network (CNN). The self-similarity-based approach is then compared with two other methods, i.e., spectrograms used as input to the CNN and speech acoustic parameters combined with the k-nearest-neighbors algorithm. The experimental investigations show the superiority of the self-similarity approach to Lombard effect detection over the other two methods. Moreover, the small standard deviation values for the self-similarity approach confirm that the resulting high accuracies are stable. (A minimal sketch of the self-similarity matrix construction follows this record.)
Source:
International Journal of Applied Mathematics and Computer Science; 2023, 33, 3; 479-492
1641-876X
2083-8492
Appears in:
International Journal of Applied Mathematics and Computer Science
Content provider:
Biblioteka Nauki
Article
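Below is a minimal sketch of the self-similarity matrix construction described in the abstract, where each row of the input holds mid-term statistics of acoustic parameters for one analysis window. Cosine similarity and the feature dimensions are assumptions for illustration; the paper may use a different distance measure and different statistics.

```python
import numpy as np

def self_similarity_matrix(features):
    """Cosine self-similarity matrix of an (n_windows, n_stats) feature array."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    return unit @ unit.T  # entry (i, j) = cosine similarity of windows i and j

# Example: 40 windows of 6 hypothetical mid-term statistics
# (e.g. mean/std of pitch-, intensity- and duration-related parameters).
rng = np.random.default_rng(1)
stats = rng.normal(size=(40, 6))
ssm = self_similarity_matrix(stats)
print(ssm.shape)  # (40, 40): an image-like input for a CNN
```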
Title:
Laughter Classification Using Deep Rectifier Neural Networks with a Minimal Feature Subset
Authors:
Gosztolya, G.
Beke, A.
Neuberger, T.
Tóth, L.
Links:
https://bibliotekanauki.pl/articles/177910.pdf
Publication date:
2016
Publisher:
Polska Akademia Nauk. Czytelnia Czasopism PAN
Topics:
speech recognition
speech technology
computational paralinguistics
laughter detection
deep neural networks
Description:
Laughter is one of the most important paralinguistic events, and it has specific roles in human conversation. The automatic detection of laughter occurrences in human speech can aid automatic speech recognition systems as well as paralinguistic tasks such as emotion detection. In this study we apply deep neural networks (DNNs) to laughter detection, as this technology is now considered state of the art in similar tasks such as phoneme identification. We carry out our experiments on two corpora of spontaneous speech in two languages (Hungarian and English). Since it is reasonable to assume that not all frequency regions are required for efficient laughter detection, we also perform feature selection to find a sufficient feature subset. (A sketch of one common feature selection strategy follows this record.)
Source:
Archives of Acoustics; 2016, 41, 4; 669-682
0137-5075
Appears in:
Archives of Acoustics
Content provider:
Biblioteka Nauki
Article
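The abstract does not specify the feature selection procedure, so the sketch below shows greedy forward selection, one standard approach, with a plain logistic-regression classifier standing in for the paper's deep rectifier network. The data, parameters, and stopping rule are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def forward_feature_selection(X, y, max_features=5):
    """Greedily add the feature that most improves cross-validated accuracy."""
    selected, remaining, best_score = [], list(range(X.shape[1])), 0.0
    while remaining and len(selected) < max_features:
        scores = []
        for f in remaining:
            clf = LogisticRegression(max_iter=1000)
            acc = cross_val_score(clf, X[:, selected + [f]], y, cv=3).mean()
            scores.append((acc, f))
        score, best_f = max(scores)
        if score <= best_score:      # stop when no remaining feature helps
            break
        best_score = score
        selected.append(best_f)
        remaining.remove(best_f)
    return selected, best_score

# Synthetic example with 20 "frequency region" features, 2 of them informative.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 20))
y = (X[:, 3] + 0.5 * X[:, 7] + rng.normal(0.0, 0.5, 200) > 0).astype(int)
print(forward_feature_selection(X, y))
```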
Title:
Recognition of Human Emotion from a Speech Signal Based on Plutchik's Model
Authors:
Kamińska, D.
Pelikant, A.
Links:
https://bibliotekanauki.pl/articles/227272.pdf
Publication date:
2012
Publisher:
Polska Akademia Nauk. Czytelnia Czasopism PAN
Topics:
emotion detection
Plutchik's wheel of emotion
speech signal
Description:
Machine recognition of human emotional states is an essential part of improving man-machine interaction. During expressive speech the voice conveys the semantic message as well as information about the emotional state of the speaker. The pitch contour is one of the most significant properties of speech affected by the emotional state, so pitch features have been commonly used in systems for automatic emotion detection. In this work, different intensities of emotions and their influence on pitch features are studied; this understanding is important for developing such a system. Intensities of emotions are represented on Plutchik's cone-shaped 3D model. The k-nearest-neighbor algorithm is used for classification, in two stages: first the primary emotion is detected, then its intensity is specified. The results show that the recognition accuracy of the system is over 50% for primary emotions and over 70% for their intensities. (A minimal two-stage k-NN sketch follows this record.)
Source:
International Journal of Electronics and Telecommunications; 2012, 58, 2; 165-170
2300-1933
Appears in:
International Journal of Electronics and Telecommunications
Content provider:
Biblioteka Nauki
Article
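A minimal sketch of the two-stage classification scheme described above: one k-NN model detects the primary emotion, then a per-emotion k-NN model specifies its intensity. The feature vectors and labels below are synthetic placeholders; the paper's actual pitch features and Plutchik-model categories are not reproduced here.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical pitch-feature vectors (e.g. mean, std, range of the pitch contour),
# with a primary-emotion label and an intensity label per utterance.
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 3))
emotion = rng.integers(0, 4, 300)     # e.g. joy / sadness / anger / fear
intensity = rng.integers(0, 3, 300)   # e.g. low / medium / high

# Stage 1: classify the primary emotion.
emotion_clf = KNeighborsClassifier(n_neighbors=5).fit(X, emotion)

# Stage 2: one intensity classifier per emotion, trained on that emotion's samples.
intensity_clfs = {
    e: KNeighborsClassifier(n_neighbors=5).fit(X[emotion == e], intensity[emotion == e])
    for e in np.unique(emotion)
}

x_new = rng.normal(size=(1, 3))
e_pred = emotion_clf.predict(x_new)[0]
i_pred = intensity_clfs[e_pred].predict(x_new)[0]
print(f"emotion={e_pred}, intensity={i_pred}")
```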
Title:
Detection of Non-native Speaker Status from Backwards and Vocoded Content-masked Speech
Authors:
Rojczyk, Arkadiusz
Porzuczek, Andrzej
Links:
https://bibliotekanauki.pl/articles/1182407.pdf
Publication date:
2021-01-18
Publisher:
Wydawnictwo Uniwersytetu Śląskiego
Topics:
accent detection
non-native accent
content-masked speech
vocoded speech
backwards speech
Description:
This paper addresses the issue of speech rhythm as a cue to non-native pronunciation. In natural recordings, it is impossible to disentangle rhythm from segmental, subphonemic, or suprasegmental features that may influence nativeness ratings. However, two methods of speech manipulation, backwards content-masked speech and vocoded speech, allow the identification of native and non-native speech in which segmental properties are masked and inaccessible to the listeners. In the current study, we use these two methods to compare the perception of content-masked native English speech and Polish-accented speech. Both native English and Polish-accented recordings were manipulated using backwards masked speech and 4-band white-noise vocoded speech. Fourteen listeners classified the stimuli as produced by native or Polish speakers of English. Polish and English differ in their temporal organization, so, if rhythm is a significant contributor to the status of non-native accentedness, we expected an above-chance rate of recognition of native and non-native English speech. Moreover, backwards content-masked speech was predicted to yield better results than vocoded speech, because it retains some of the indexical properties of speakers. The results show that listeners are unable to detect a non-native accent in Polish learners of English from backwards and vocoded speech samples. (A minimal sketch of noise vocoding, one of the two masking manipulations, follows this record.)
This study deals with the detection of a foreign accent in English vocoded and backwards speech. Both processing methods remove semantic information and partially (backwards speech) or completely (vocoded speech) remove spectral information, while preserving the rhythmic properties of speech, understood as the degree of durational variation of prosodic units, which could serve to distinguish samples of native and foreign accent. The participants were native speakers of English and Poles who use the language at an advanced level. The results showed that neither the English nor the Polish listeners were able to detect a foreign accent in the processed speech samples relying solely on the temporal distribution of stresses (vocoded speech) and the degree of durational variation of prosodic units (backwards speech).
Source:
Theory and Practice of Second Language Acquisition; 2020, 6, 2; 87-105
2450-5455
2451-2125
Appears in:
Theory and Practice of Second Language Acquisition
Content provider:
Biblioteka Nauki
Article
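Noise vocoding, one of the two masking manipulations in this study, is commonly implemented by splitting speech into frequency bands, extracting each band's temporal envelope, and using the envelopes to modulate band-limited noise. The sketch below follows that recipe for 4 bands; the band edges, filter order, and Hilbert-envelope method are assumptions, since the study's exact vocoder settings are not given here. The other manipulation, backwards speech, is simply signal[::-1].

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def noise_vocode(speech, sr, n_bands=4, lo=100.0, hi=6000.0):
    """Replace spectral detail with noise while keeping per-band envelopes."""
    edges = np.geomspace(lo, hi, n_bands + 1)   # log-spaced band edges (assumed)
    rng = np.random.default_rng(0)
    out = np.zeros_like(speech)
    for k in range(n_bands):
        sos = butter(4, [edges[k], edges[k + 1]], btype="bandpass", fs=sr, output="sos")
        band = sosfilt(sos, speech)
        envelope = np.abs(hilbert(band))                           # temporal envelope
        carrier = sosfilt(sos, rng.standard_normal(len(speech)))   # band-limited noise
        out += envelope * carrier
    return out / (np.max(np.abs(out)) + 1e-12)

# Example with a synthetic amplitude-modulated tone standing in for speech.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 150 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 3 * t))
print(noise_vocode(x, sr).shape)
```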
Title:
A Morpho-syntactic Analysis of Human-moderated Hate Speech Samples from Wykop.pl Web Service
Authors:
Okulska, Inez
Kołos, Anna
Links:
https://bibliotekanauki.pl/articles/28407654.pdf
Publication date:
2023
Publisher:
Krakowskie Towarzystwo TERTIUM
Topics:
cyberbullying
hate speech
user-generated online content
automated detection
stylometry
Description:
The dynamic increase in user-generated content on the web presents significant challenges in protecting Internet users from exposure to offensive material, such as cyberbullying and hate speech, while also minimizing the spread of wrongful conduct. However, designing automated detection models for such offensive content remains complex, particularly in languages with limited publicly available data. To address this issue, we collaborate with the Wykop.pl web service to fine-tune a model on genuine content banned by professional moderators. In this paper, we focus on the Polish language, discuss datasets and annotation frameworks, and present a stylometric analysis of Wykop.pl content that identifies morpho-syntactic structures commonly used in cyberbullying and hate speech. By doing so, we contribute to the ongoing discussion on offensive language and hate speech in sociolinguistic studies, emphasizing the need to consider user-generated online content. (A minimal sketch of morpho-syntactic feature counting follows this record.)
Source:
Półrocznik Językoznawczy Tertium; 2023, 8, 2; 54-71
2543-7844
Appears in:
Półrocznik Językoznawczy Tertium
Content provider:
Biblioteka Nauki
Article
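As a rough illustration of the kind of morpho-syntactic counting a stylometric analysis involves, the sketch below tallies part-of-speech bigrams in Polish text with spaCy. It only shows the flavor of POS-pattern counting; the paper's analysis is considerably richer, the pl_core_news_sm model is an assumed choice, and the example sentence is a placeholder.

```python
from collections import Counter

import spacy

# Requires a Polish pipeline, e.g.: python -m spacy download pl_core_news_sm
nlp = spacy.load("pl_core_news_sm")

def pos_bigram_profile(text):
    """Count part-of-speech bigrams: a crude profile of the morpho-syntactic
    structures a text relies on."""
    tags = [tok.pos_ for tok in nlp(text) if not tok.is_space]
    return Counter(zip(tags, tags[1:]))

profile = pos_bigram_profile("To jest przykładowe zdanie do analizy.")
for bigram, count in profile.most_common(5):
    print(bigram, count)
```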
Title:
Automatic speech signal segmentation based on the innovation adaptive filter
Authors:
Makowski, R.
Hossa, R.
Links:
https://bibliotekanauki.pl/articles/330096.pdf
Publication date:
2014
Publisher:
Uniwersytet Zielonogórski. Oficyna Wydawnicza
Topics:
automatic speech segmentation
inter phoneme boundaries
Schur adaptive filtering
detection threshold determination
Description:
Speech segmentation is an essential stage in designing automatic speech recognition systems, and several algorithms have been proposed in the literature. It is a difficult problem, as speech is immensely variable. The aim of the authors' studies was to design an algorithm that could be employed at the stage of automatic speech recognition, making it possible to avoid some problems related to speech signal parametrization. Posing the problem in this way requires the algorithm to be capable of working in real time; the only such algorithm, proposed by Tyagi et al. (2006), is a modified version of Brandt's algorithm. The article presents a new algorithm for unsupervised automatic speech signal segmentation. It performs segmentation without access to information about the phonetic content of the utterances, relying exclusively on second-order statistics of the speech signal. The starting point for the proposed method is the time-varying Schur coefficients of an innovation adaptive filter. The Schur algorithm is known to be fast, precise, stable, and capable of rapidly tracking changes in second-order signal statistics. A transition from one phoneme to another in the speech signal always indicates a change in signal statistics caused by vocal tract changes. To allow for the properties of human hearing, detection of inter-phoneme boundaries is performed based on statistics defined on the mel spectrum determined from the reflection coefficients. The paper presents the structure of the algorithm, defines its properties, lists parameter values, describes detection efficiency results, and compares them with those of another algorithm. The obtained segmentation results are satisfactory. (A rough sketch of boundary detection from frame-to-frame changes in reflection coefficients follows this record.)
Source:
International Journal of Applied Mathematics and Computer Science; 2014, 24, 2; 259-270
1641-876X
2083-8492
Appears in:
International Journal of Applied Mathematics and Computer Science
Content provider:
Biblioteka Nauki
Article
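The sketch below illustrates the core intuition only: compute reflection (PARCOR) coefficients per frame and flag frames where they change abruptly, since a phoneme transition changes the signal's second-order statistics. For simplicity it uses the Levinson-Durbin recursion on frame autocorrelations rather than the authors' time-recursive Schur filter, and it omits the mel-spectrum statistics; the frame sizes and threshold are assumptions.

```python
import numpy as np

def reflection_coeffs(frame, order=8):
    """Reflection (PARCOR) coefficients via Levinson-Durbin."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:][:order + 1]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err, k = r[0], np.zeros(order)
    for m in range(1, order + 1):
        acc = r[m] + np.dot(a[1:m], r[m - 1:0:-1])
        k[m - 1] = -acc / err
        a[1:m + 1] += k[m - 1] * a[m - 1::-1]   # update predictor coefficients
        err *= 1.0 - k[m - 1] ** 2
    return k

def boundary_candidates(signal, frame_len=320, hop=160, order=8, thresh=0.5):
    """Flag frames where the reflection-coefficient vector changes abruptly."""
    n = 1 + (len(signal) - frame_len) // hop
    ks = np.stack([reflection_coeffs(signal[i * hop:i * hop + frame_len], order)
                   for i in range(n)])
    dist = np.linalg.norm(np.diff(ks, axis=0), axis=1)
    return np.where(dist > thresh)[0] + 1

# Two spectrally different halves -> candidate boundaries near frame 50.
rng = np.random.default_rng(4)
t = np.arange(8000) / 16000.0
x = np.concatenate([np.sin(2 * np.pi * 300 * t), np.sin(2 * np.pi * 1500 * t)])
x += 0.05 * rng.normal(size=x.size)
print(boundary_candidates(x))
```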
Title:
Speech nonfluency detection and classification based on linear prediction coefficients and neural networks
Authors:
Kobus, A.
Kuniszyk-Jóźkowiak, W.
Smołka, E.
Codello, I.
Links:
https://bibliotekanauki.pl/articles/333600.pdf
Publication date:
2010
Publisher:
Uniwersytet Śląski. Wydział Informatyki i Nauki o Materiałach. Instytut Informatyki. Zakład Systemów Komputerowych
Topics:
linear prediction
LPC
neural networks
Kohonen
perceptron
covariance
nonfluency
speech
detection
radial
Description:
The goal of the paper is to present a speech nonfluency detection method based on linear prediction coefficients obtained using the covariance method. The application "Dabar" was created for this research; it implements three different LP methods and can feed the coefficients they compute into the input of Kohonen networks. Neural networks were used to classify utterances as fluent or nonfluent. First, a Kohonen network (SOM) reduced the LP-coefficient representation of each window to a vector of winning neurons of the SOM output layer. Radial basis function (RBF) networks, linear networks, and multi-layer perceptrons were then used as classifiers. The research was based on 55 fluent samples and 54 samples with blockades on plosives (p, b, d, t, k, g). The experiments yielded a classification accuracy of 76%. (A minimal sketch of covariance-method LP coefficients follows this record.)
Source:
Journal of Medical Informatics & Technologies; 2010, 15; 135-143
1642-6037
Appears in:
Journal of Medical Informatics & Technologies
Content provider:
Biblioteka Nauki
Article
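A minimal sketch of the covariance method of linear prediction the abstract refers to: the prediction error is minimized over the frame without windowing, and the resulting normal equations are solved directly. The SOM and classifier stages are omitted; the AR(2) test signal is an illustrative assumption.

```python
import numpy as np

def lpc_covariance(frame, order=8):
    """LP coefficients via the covariance method: solve phi @ a = -c, where
    phi[i-1, j-1] = sum_n frame[n-i]*frame[n-j] over n = order..N-1."""
    N = len(frame)
    phi = np.empty((order, order))
    c = np.empty(order)
    for i in range(1, order + 1):
        c[i - 1] = np.dot(frame[order:N], frame[order - i:N - i])
        for j in range(1, order + 1):
            phi[i - 1, j - 1] = np.dot(frame[order - i:N - i], frame[order - j:N - j])
    a = np.linalg.solve(phi, -c)
    return np.concatenate(([1.0], a))  # predictor polynomial A(z)

# Example: estimate a known AR(2) process; expect approx [1, -1.3, 0.8].
rng = np.random.default_rng(5)
x = rng.normal(size=512)
for n in range(2, 512):
    x[n] += 1.3 * x[n - 1] - 0.8 * x[n - 2]
print(lpc_covariance(x, order=2))
```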
Title:
Speech emotion recognition using wavelet packet reconstruction with attention-based deep recurrent neural networks
Authors:
Meng, Hao
Yan, Tianhao
Wei, Hongwei
Ji, Xun
Links:
https://bibliotekanauki.pl/articles/2173587.pdf
Publication date:
2021
Publisher:
Polska Akademia Nauk. Czytelnia Czasopism PAN
Topics:
speech emotion recognition
voice activity detection
wavelet packet reconstruction
feature extraction
LSTM networks
attention mechanism
Description:
Speech emotion recognition (SER) is a complicated and challenging task in human-computer interaction because it is difficult to find a feature set that fully discriminates the emotional state. The FFT is commonly used to process the raw signal when extracting low-level description features such as short-time energy, fundamental frequency, formants, and MFCCs (mel-frequency cepstral coefficients); however, these features are built in the frequency domain and ignore information from the temporal domain. In this paper, we propose a novel framework that combines multi-layer wavelet sequences obtained by wavelet packet reconstruction (WPR) with a conventional feature set into a mixed feature set, and performs emotion recognition with recurrent neural networks (RNNs) based on the attention mechanism. In addition, since silent frames have a detrimental effect on SER, we adopt voice activity detection based on the autocorrelation function to eliminate emotionally irrelevant frames. The proposed algorithm significantly outperforms the traditional feature set in predicting spontaneous emotional states on the IEMOCAP corpus and the EMODB database, achieving better classification in both speaker-independent and speaker-dependent experiments: 62.52% and 77.57% accuracy in the speaker-independent (SI) setting, and 66.90% and 82.26% in the speaker-dependent (SD) setting, respectively. (A minimal sketch of autocorrelation-based voice activity detection follows this record.)
Source:
Bulletin of the Polish Academy of Sciences. Technical Sciences; 2021, 69, 1; art. no. e136300
0239-7528
Appears in:
Bulletin of the Polish Academy of Sciences. Technical Sciences
Content provider:
Biblioteka Nauki
Article
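The voice activity detection step mentioned in the abstract can be sketched as follows: a frame is kept only if its normalized autocorrelation has a strong peak in the plausible pitch-lag range, i.e., if it looks periodic rather than silent. The lag range, threshold, and frame sizes below are assumptions; the paper's exact VAD parameters are not reproduced here.

```python
import numpy as np

def autocorr_vad(signal, frame_len=400, hop=160, ratio_thresh=0.4):
    """Return indices of frames judged voiced by an autocorrelation peak test."""
    keep = []
    n = 1 + (len(signal) - frame_len) // hop
    for i in range(n):
        frame = signal[i * hop:i * hop + frame_len]
        ac = np.correlate(frame, frame, mode="full")[frame_len - 1:]
        if ac[0] <= 0:                 # silent (all-zero) frame: discard
            continue
        ac = ac / ac[0]                # normalize so lag 0 equals 1
        peak = ac[32:320].max()        # lags for roughly 50-500 Hz at 16 kHz
        if peak > ratio_thresh:        # periodic (voiced) frame: keep it
            keep.append(i)
    return keep

# Example: half a second of silence followed by half a second of a 200 Hz tone.
sr = 16000
t = np.arange(sr // 2) / sr
x = np.concatenate([np.zeros(sr // 2), np.sin(2 * np.pi * 200 * t)])
print(autocorr_vad(x)[:5])  # only frames from the voiced second half
```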