Journal of KIISE

Search : [ author: 이지현 ] (3)

An Automated Error Detection Method for Speech Transcription Corpora Based on Speech Recognition and Language Models

Jeongpil Lee, Jeehyun Lee, Yerin Choi, Jaehoo Jang, Myoung-Wan Koo

http://doi.org/10.5626/JOK.2024.51.4.362

This research proposes a "machine-in-the-loop" approach to automatic error detection in Korean speech corpora that combines the knowledge of CTC-based speech recognition models and language models. We validated its error detection performance experimentally through a three-step procedure that uses the Character Error Rate (CER) from the speech recognition model and the Perplexity (PPL) from the language model to identify candidate transcription errors and verify their text labels. Applied to the Korean speech corpus KsponSpeech, the method reduced the character error rate on the test set from 9.44% to 8.9%. Notably, this improvement was achieved by inspecting only about 11% of the test data, which shows that the proposed method is more efficient than comprehensive manual inspection. Our study affirms the potential of this "machine-in-the-loop" approach as a cost-effective error detection mechanism for speech data that preserves accuracy.
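The candidate-identification idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how CER and PPL are combined or which thresholds are used, so the function `flag_candidates`, the sample field names, the OR rule, and both threshold values are assumptions made only for illustration. CER is computed as the character-level edit distance divided by the label length, and PPL as the exponential of the negative mean log-probability.

```python
import math


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between ref and hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(label: str, asr_hypothesis: str) -> float:
    """Character Error Rate of the ASR output measured against the label."""
    if not label:
        return 0.0 if not asr_hypothesis else 1.0
    return edit_distance(label, asr_hypothesis) / len(label)


def perplexity(token_logprobs: list[float]) -> float:
    """PPL = exp(-mean of natural-log token probabilities)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def flag_candidates(samples, cer_threshold=0.3, ppl_threshold=50.0):
    """Return ids of samples whose labels look suspicious.

    Assumption: a sample is flagged when EITHER score exceeds its
    threshold. The thresholds are placeholders, not values from the paper.
    """
    flagged = []
    for s in samples:
        high_cer = cer(s["label"], s["asr_hypothesis"]) > cer_threshold
        high_ppl = perplexity(s["label_logprobs"]) > ppl_threshold
        if high_cer or high_ppl:
            flagged.append(s["id"])
    return flagged
```

Only the flagged samples would then be passed to a human for label verification, which is what keeps the inspected share of the data small.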

Design of Extended Real-time Data Pipeline System Architecture

Hoseung Shin, Sungwon Kang, Jihyun Lee

http://doi.org/

Big data systems are widely used to collect large-scale log data, so high performance is essential for them. However, the current Hadoop-based big data system architecture performs poorly because it processes data redundantly. This paper addresses the problem by improving the design of the Hadoop system architecture. The proposed architecture combines the batch-based data collection of the existing architecture with a single processing method, and it achieves high performance by analyzing the collected data directly in memory, which avoids redundant processing. It also retains system expandability, an advantage of the Hadoop architecture. Our evaluation confirms that the proposed architecture analyzes and processes data approximately 30% to 35% faster than existing architectures and that it remains extendable.
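The effect of analyzing a collected batch directly in memory can be sketched in a few lines. This is a toy illustration, not the proposed architecture and not Hadoop code: the abstract does not say what the redundant processing consists of, so the sketch assumes it is parsing the same records twice (once at collection, once at analysis). The functions `two_stage` and `single_pass`, the JSON log format, and the `level` field are all hypothetical.

```python
import json
from collections import Counter


def parse(line: str, stats: dict) -> dict:
    """Parse one JSON log line and count how often parsing happens."""
    stats["parses"] += 1
    return json.loads(line)


def two_stage(batch: list[str], stats: dict) -> Counter:
    """Assumed baseline: validate and store the batch, then re-parse it."""
    # Stage 1: parse each line to validate it, but keep only the raw text.
    stored = [line for line in batch if parse(line, stats)]
    # Stage 2: read the stored lines back and parse them again to analyze.
    return Counter(parse(line, stats)["level"] for line in stored)


def single_pass(batch: list[str], stats: dict) -> Counter:
    """Parse each collected line once and aggregate directly in memory."""
    counts = Counter()
    for line in batch:
        record = parse(line, stats)
        counts[record["level"]] += 1
    return counts
```

Both functions return the same counts, but `single_pass` calls `parse` once per record instead of twice, which is the kind of saving the abstract attributes to avoiding redundant processing.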

A Software Architecture Design Method that Matches Problem Frames and Architectural Patterns

Jungmin Kim, Sungwon Kang, Jihyun Lee

http://doi.org/

Architectural patterns provide software development solutions in the form of schemas for the structural organization of software systems, based on empirical knowledge, whereas Jackson’s problem frames provide a method for analyzing software problems. Problem frames are useful for understanding a software development problem because they emphasize the problem domain rather than the solution space. Existing research relates problem frames to software architecture, but most of it uses problem frames only to understand given problems. Moreover, none of it derives architectural patterns by considering both problem frames and quality attributes. In this paper, we propose a method for pattern-based software architecture design that matches problem frames with architectural patterns. Our approach first develops a problem model based on the problem frames approach and then matches it against candidate architectural patterns from the perspectives of both functionality and quality attributes. Functional matching uses the problem frame diagram to match against the problem model of an architectural pattern. A case study shows that our approach can systematically select suitable architectural patterns and provides a basis for fine-grained software architecture design.
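The two-perspective matching can be sketched as a filter followed by a ranking. This is a heavily simplified illustration under stated assumptions, not the paper's method: the paper matches problem frame diagrams against pattern problem models, whereas this sketch reduces functional matching to comparing frame names. The `Pattern` class, the `match_patterns` function, and any catalog passed to it are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """An architectural pattern annotated for matching (illustrative only)."""
    name: str
    problem_frames: frozenset[str]      # frames the pattern is assumed to address
    quality_attributes: frozenset[str]  # qualities the pattern is assumed to support


def match_patterns(problem_frame, required_qualities, catalog):
    """Return pattern names that fit the frame, best quality coverage first."""
    required = set(required_qualities)
    # Functional matching: keep patterns whose problem model fits the frame.
    functional = [p for p in catalog if problem_frame in p.problem_frames]
    # Quality matching: score by how many required attributes are covered.
    scored = [(len(required & p.quality_attributes), p.name) for p in functional]
    scored.sort(key=lambda item: (-item[0], item[1]))
    # Drop patterns that cover none of the required quality attributes.
    return [name for score, name in scored if score > 0]
```

For example, with a catalog that maps Pipes and Filters to Jackson's Transformation frame and to modifiability, a query for that frame and that quality attribute would return Pipes and Filters. Such a mapping is an assumption of this example and is not taken from the paper.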

Journal of KIISE

ISSN : 2383-630X(Print)
ISSN : 2383-6296(Electronic)
KCI Accredited Journal

Editorial Office

Tel. +82-2-588-9240
Fax. +82-2-521-1352
E-mail. chwoo@kiise.or.kr
