- Tytuł:
- Audio-Visual Speech Processing System for Polish Applicable to Human-Computer Interaction
- Autorzy:
- Jadczyk, T.
- Powiązania:
- https://bibliotekanauki.pl/articles/305828.pdf
- Data publikacji:
- 2018
- Wydawca:
- Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
- Tematy:
-
audio-visual speech recognition
visual features extraction
human-computer interaction - Opis:
- This paper describes audio-visual speech recognition system for Polish language and a set of performance tests under various acoustic conditions. We first present the overall structure of AVASR systems with three main areas: audio features extraction, visual features extraction and subsequently, audiovisual speech integration. We present MFCC features for audio stream with standard HMM modeling technique, then we describe appearance and shape based visual features. Subsequently we present two feature integration techniques, feature concatenation and model fusion. We also discuss the results of a set of experiments conducted to select best system setup for Polish, under noisy audio conditions. Experiments are simulating human-computer interaction in computer control case with voice commands in difficult audio environments. With Active Appearance Model (AAM) and multistream Hidden Markov Model (HMM) we can improve system accuracy by reducing Word Error Rate for more than 30%, comparing to audio-only speech recognition, when Signal-to-Noise Ratio goes down to 0dB.
- Źródło:
-
Computer Science; 2018, 19 (1); 41-63
1508-2806
2300-7036 - Pojawia się w:
- Computer Science
- Dostawca treści:
- Biblioteka Nauki