Search: [ author: 이정필 (Jeongpil Lee) ] (3 results)

Creating a Noisy Environment Speech Mixture Dataset for Korean Speech Separation

Jaehoo Jang, Kun Park, Jeongpil Lee, Myoung-Wan Koo

http://doi.org/10.5626/JOK.2024.51.6.513

In the field of speech separation, models are typically trained on datasets containing mixtures of speech and overlapping noise. Although established international datasets exist for advancing speech separation techniques, Korean has no comparable precedent for constructing datasets of overlapping speech and noise. This paper therefore presents a dataset generator designed specifically for single-channel speech separation models tailored to Korean, and introduces the Korean Speech mixture with Noise dataset constructed with it. In our experiments, we train and evaluate a Conv-TasNet speech separation model on the newly created dataset. We further verify the dataset's efficacy by comparing the Character Error Rate (CER) of the separated speech against that of the original speech using a pre-trained speech recognition model.
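The core of such a generator is overlaying noise onto (multi-speaker) speech at a controlled signal-to-noise ratio. Below is a minimal sketch of that mixing step, assuming NumPy waveforms; the function name and the SNR-based scaling are illustrative assumptions, not the paper's exact recipe.

```python
# Illustrative sketch of a speech/noise mixing step for building a
# speech separation dataset; not the paper's actual generator.
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Overlay noise onto speech at a target signal-to-noise ratio (dB)."""
    # Tile or trim the noise so it covers the whole utterance.
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    noise = noise[: len(speech)]

    # Scale the noise so that 10 * log10(P_speech / P_noise) == snr_db.
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
    return speech + scale * noise

# Example: mix two-speaker speech with background noise at 5 dB SNR.
rng = np.random.default_rng(0)
s1 = rng.standard_normal(16000)   # stand-ins for 1-second, 16 kHz waveforms
s2 = rng.standard_normal(16000)
noise = rng.standard_normal(8000)
mixture = mix_at_snr(s1 + s2, noise, snr_db=5.0)
```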

An Automated Error Detection Method for Speech Transcription Corpora Based on Speech Recognition and Language Models

Jeongpil Lee, Jeehyun Lee, Yerin Choi, Jaehoo Jang, Myoung-Wan Koo

http://doi.org/10.5626/JOK.2024.51.4.362

This research proposes a "machine-in-the-loop" approach for automatic error detection in Korean speech corpora that integrates the knowledge of CTC-based speech recognition models and language models. We experimentally validated its error detection performance through a three-step procedure that leverages the Character Error Rate (CER) from the speech recognition model and the Perplexity (PPL) from the language model to identify potential transcription error candidates and verify their text labels. Applied to the Korean speech corpus KsponSpeech, the method reduced the character error rate on the test set from 9.44% to 8.9%. Notably, this improvement was achieved while inspecting only approximately 11% of the test data, showing that the proposed method is more efficient than a comprehensive manual inspection process. Our study affirms the potential of this "machine-in-the-loop" approach as a cost-effective yet accurate error detection mechanism for speech data.
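The candidate selection can be pictured as a simple filter over (transcript, ASR hypothesis, LM perplexity) triples. The sketch below, with illustrative thresholds and a plain Levenshtein-based CER, is an assumption about how such candidates might be flagged; it does not reproduce the paper's exact three-step procedure.

```python
# Illustrative sketch: flag utterances whose ASR hypothesis disagrees with
# the transcript (high CER) and whose transcript is improbable under a
# language model (high PPL). Thresholds are made up for the example.

def cer(ref: str, hyp: str) -> float:
    """Character error rate via Levenshtein distance."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (r != h)))    # substitution
        prev = cur
    return prev[-1] / max(len(ref), 1)

def flag_candidates(utterances, cer_thresh=0.2, ppl_thresh=100.0):
    """utterances: iterable of (utt_id, transcript, asr_hypothesis, lm_ppl)."""
    for utt_id, transcript, hypothesis, ppl in utterances:
        if cer(transcript, hypothesis) > cer_thresh and ppl > ppl_thresh:
            yield utt_id  # likely transcription error; send for inspection

# Example: only the second utterance is flagged for review.
data = [
    ("utt1", "안녕하세요", "안녕하세요", 35.0),
    ("utt2", "안녕하세요", "안녕히 가세요", 450.0),
]
print(list(flag_candidates(data)))  # -> ['utt2']
```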

Number Normalization in Korean Using the Transformer Model

Jaeyoon Chun, Chansong Jo, Jeongpil Lee, Myoung-Wan Koo

http://doi.org/10.5626/JOK.2021.48.5.510

Text normalization is a significant component of text-to-speech (TTS) systems. Since numbers in Korean are read in various ways according to their context, number normalization is crucial to improving the quality of Korean TTS systems. However, the existing model is based on ad hoc rules that are ill-suited to normalizing non-standard numbers. This study proposes a Korean number normalization model based on the sequence-to-sequence Transformer, with a number positional encoding added to handle long numbers. The proposed model achieved an F1 score of 98.80% on the normal test dataset and 90.1% on the non-standard test dataset, which were 2.52% and 19% higher, respectively, than the baseline model. It also demonstrated a 13% improvement on the longer-number test dataset over the other deep learning models.
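One plausible reading of "number positional encoding" is an extra embedding keyed to each digit's place value (its distance from the least significant digit), added to the token embeddings before the Transformer. The PyTorch sketch below follows that assumption; the position scheme, module names, and the plain encoder are stand-ins, not the paper's actual sequence-to-sequence model.

```python
# Illustrative sketch of a place-value ("number positional") embedding
# added to token embeddings before a Transformer encoder.
import torch
import torch.nn as nn

def digit_positions(tokens):
    """Right-aligned place index for digits within each number; 0 elsewhere.
    e.g. ["1", "2", "3", "원"] -> [3, 2, 1, 0]."""
    pos, run = [0] * len(tokens), 0
    for i in range(len(tokens) - 1, -1, -1):
        run = run + 1 if tokens[i].isdigit() else 0
        pos[i] = run
    return pos

class NumberAwareEncoder(nn.Module):
    def __init__(self, vocab_size, d_model=64, max_digits=20):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.num_pos = nn.Embedding(max_digits + 1, d_model)  # index 0 = not a digit
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, token_ids, digit_pos):
        # Token embedding plus the place-value embedding for each position.
        x = self.tok(token_ids) + self.num_pos(digit_pos)
        return self.encoder(x)

# Example: encode the character sequence "123원" with place-value hints.
tokens = ["1", "2", "3", "원"]
vocab = {t: i for i, t in enumerate(sorted(set(tokens)))}
ids = torch.tensor([[vocab[t] for t in tokens]])
pos = torch.tensor([digit_positions(tokens)])
out = NumberAwareEncoder(vocab_size=len(vocab))(ids, pos)  # shape (1, 4, 64)
```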

