Search : [ author: Seungsoo Nam ] (1)

A Study on Improving the Accuracy of Korean Speech Recognition Texts Using KcBERT

Donguk Min, Seungsoo Nam, Daeseon Choi

http://doi.org/10.5626/JOK.2024.51.12.1115

In the field of speech recognition, models such as Whisper, Wav2Vec2.0, and Google STT are widely utilized. However, Korean speech recognition faces challenges because complex phonological rules and diverse pronunciation variations hinder performance improvements. To address these issues, this study proposed a method that combined the Whisper model with a post-processing approach using KcBERT. By applying KcBERT’s bidirectional contextual learning to text generated by the Whisper model, the proposed method could enhance contextual coherence and refine the text for greater naturalness. Experimental results showed that post-processing reduced the Character Error Rate (CER) from 5.12% to 1.88% in clean environments and from 22.65% to 10.17% in noisy environments. Furthermore, the Word Error Rate (WER) was significantly improved, decreasing from 13.29% to 2.71% in clean settings and from 38.98% to 11.15% in noisy settings. BERTScore also exhibited overall improvement. These results demonstrate that the proposed approach is effective in addressing complex phonological rules and maintaining text coherence within Korean speech recognition.


Search




Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr