Digital Library: Search Result
Efficiently Lightweight Korean Language Model with Post-layer Pruning and Multi-stage Fine-tuning
http://doi.org/10.5626/JOK.2025.52.3.260
The increasing size of large language models has created a need for lightweight models in practical applications. This study presents a method that reduces an existing 8B model to 5B through late-layer pruning while maintaining, and even improving, performance through two stages of fine-tuning. In the broad fine-tuning stage, we expanded the model's ability to understand and generate Korean using English-Korean parallel data and a large Korean corpus; in the refined fine-tuning stage, we enhanced its expressive and inferential capabilities with high-quality datasets. In addition, we integrated the strengths of individual models through model merging. In the LogicKor leaderboard evaluation, the proposed model performed well in reasoning, writing, and comprehension, achieving an overall score of 4.36 and outperforming the original Llama-3.1-8B-Instruct model (4.35). This demonstrates that model size can be reduced by 37.5% while still improving performance.
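As a rough illustration of the late-layer pruning step, the sketch below drops the last decoder layers of a Llama-style model before fine-tuning. This is a minimal, hypothetical reconstruction using the Hugging Face transformers API; the number of layers to drop and the recovery training are assumptions, not details from the paper.

```python
# Minimal sketch of late-layer pruning (assumed approach, not the paper's code).
# Llama-3.1-8B-Instruct has 32 decoder layers; dropping 12 removes roughly the
# share of transformer parameters consistent with an ~8B -> ~5B reduction.
from transformers import AutoModelForCausalLM

def prune_late_layers(model, num_to_drop: int):
    """Remove the last `num_to_drop` decoder layers in place."""
    kept = model.model.layers[:len(model.model.layers) - num_to_drop]
    model.model.layers = kept                   # slicing a ModuleList yields a ModuleList
    model.config.num_hidden_layers = len(kept)  # keep the config consistent
    return model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
model = prune_late_layers(model, num_to_drop=12)
# The pruned model would then go through the broad and refined fine-tuning
# stages described above to recover and improve quality.
```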
SyllaBERT: A Syllable-Based Efficient Robust Transformer Model for Real-World Noise and Typographical Errors
Seongwan Park, Yumin Heo, Youngjoong Ko
http://doi.org/10.5626/JOK.2025.52.3.250
Training a Korean language model requires a tokenizer designed for the unique features of the Korean language, making tokenization a crucial step in the modeling process. Most current language models use morpheme-based or subword-based tokenization. While these approaches work well on clean Korean text, they are prone to out-of-vocabulary (OOV) issues caused by the abbreviations and neologisms frequently encountered in real-world Korean data. Moreover, actual Korean text often contains typos and non-standard expressions, to which traditional morpheme-based and subword-based tokenizers are not sufficiently robust. To tackle these challenges, this paper introduces the SyllaBERT model, which employs syllable-level tokenization to handle the specific characteristics of Korean, even in noisy and non-standard contexts, with minimal resources. A compact syllable-level vocabulary was created, and a syllable-based language model was built by reducing the embedding and hidden layer sizes of existing models. Experimental results show that, despite having approximately four times fewer parameters than subword-based models, SyllaBERT outperforms them on natural language understanding tasks over real-world conversational Korean data that includes noise.
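A toy version of syllable-level tokenization is sketched below; it is illustrative only and not the SyllaBERT implementation. Because modern Korean uses only 11,172 precomposed Hangul syllables, a syllable vocabulary stays compact, and typos or neologisms still decompose into known syllables rather than producing word-level OOV tokens.

```python
# Toy syllable-level tokenizer for Korean (illustrative, not SyllaBERT's code).
# Every character, including each precomposed Hangul syllable, is one token, so
# slang like "버카충" or the typo "안녕하세여" still tokenizes into known units,
# unlike with morpheme- or subword-based tokenizers.
def syllable_tokenize(text: str) -> list[str]:
    return [ch for ch in text if not ch.isspace()]

def build_vocab(corpus: list[str]) -> dict[str, int]:
    vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3}
    for sentence in corpus:
        for syllable in syllable_tokenize(sentence):
            vocab.setdefault(syllable, len(vocab))
    return vocab

vocab = build_vocab(["안녕하세요", "여기 버카충 했어"])
ids = [vocab.get(s, vocab["[UNK]"]) for s in syllable_tokenize("안녕하세여")]
```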
Automatic Convolution Neural Network Model Compression Framework for Resource-Constrained Embedded Systems
Jonghun Jeong, Dasom Lee, Hyeonseok Jung, Hoeseok Yang
http://doi.org/10.5626/JOK.2020.47.2.136
Recently, attempts have been made to execute various convolutional neural network applications directly on resource-constrained embedded systems such as IoT devices. However, since embedded systems have limited computational capability and memory, the size of the neural network model that can be executed is restricted, and real-time constraints may not be satisfied. Therefore, in this paper, we propose a framework that automatically compresses a given neural network model to satisfy memory and execution-time requirements and automatically generates code executable on the target embedded system. Using the proposed framework, we demonstrate that given neural network models can be automatically optimized for two STM32 Nucleo series boards with different hardware specifications under various execution-time and memory requirements.
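The abstract does not specify the compression algorithm itself, so the following is only a hypothetical sketch of the constraint-driven loop it implies: compress until estimated memory and execution time fit the target board's budgets. The pruning helper and both estimators are assumed names, not the framework's API.

```python
# Hypothetical constraint-driven compression loop (assumed, not the paper's code).
# `estimate_memory_kb`, `estimate_latency_ms`, and `apply_pruning` stand in for a
# model-size estimator, an on-target latency model, and a compression pass.
def compress_to_fit(model, mem_budget_kb: float, time_budget_ms: float,
                    estimate_memory_kb, estimate_latency_ms, apply_pruning,
                    step: float = 0.05):
    ratio = 0.0
    candidate = model
    while (estimate_memory_kb(candidate) > mem_budget_kb
           or estimate_latency_ms(candidate) > time_budget_ms):
        ratio += step
        if ratio >= 1.0:
            raise RuntimeError("Budgets cannot be met by compression alone.")
        candidate = apply_pruning(model, ratio)  # recompress from the original each pass
    return candidate, ratio  # code generation for the STM32 target would follow
```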
A Simplified Test Maturity Model (sTMM) for Small and Midsize Test Organization
Bo Kyung Park, Woo Sung Jang, Ki Du Kim, R. Young Chul Kim
http://doi.org/10.5626/JOK.2018.45.6.522
A systematic approach to software development and management has become necessary. Companies in Korea want to improve their software quality through certifications such as Capability Maturity Model Integration (CMMI) and Test Maturity Model Integration (TMMi). However, current certification models require software organizations to perform many process activities, and even a test organization needs considerable time, manpower, and cost to prepare for TMMi. For this reason, there is increasing demand for a slim certification model that reflects the domestic software industry environment. In 2015/2016, TTA asked us to develop a refined model for small test organizations suited to Korea's software industry environment. In this paper, we propose a lightweight TMM for small test organizations based on the original TMM. With this model, TTA can provide a guideline for improving test maturity levels, as demonstrated by assessing two domestic test organizations. As a result, we expect this model, focused on test organizations, to improve software quality.
Analysis of Research Trend and Performance Comparison on Message Authentication Code
Cryptographic technologies that provide confidentiality and integrity, such as encryption algorithms and message authentication codes (MACs), are necessary for preventing security threats in the Internet of Things (IoT), where various kinds of devices are interconnected. Since many encryption schemes that have passed security verification are not suitable for low-power, low-performance IoT devices, various lightweight cryptographic schemes have been proposed. However, research on lightweight MACs is insufficient compared with that on lightweight block ciphers. Therefore, in this paper, we review and classify various kinds of MACs and suggest a direction for future MAC development. We also implemented major MAC algorithms and performed experiments to investigate their performance degradation on low-end microcontrollers.
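As a point of reference for what a MAC computation looks like, here is a standard HMAC-SHA256 example using Python's standard library. The paper's experiments target low-end microcontrollers, where C implementations of lightweight MACs would be used instead, so this is context, not the paper's code.

```python
# Standard HMAC-SHA256 with Python's stdlib, shown for context only.
import hashlib
import hmac

key = b"shared-secret-key"          # pre-shared between sender and receiver
message = b"sensor-reading:23.5C"

tag = hmac.new(key, message, hashlib.sha256).digest()

# The receiver recomputes the tag and compares in constant time;
# a mismatch means the message was forged or modified in transit.
expected = hmac.new(key, message, hashlib.sha256).digest()
assert hmac.compare_digest(tag, expected)
```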
Journal of KIISE
- ISSN: 2383-630X (Print)
- ISSN: 2383-6296 (Electronic)
- KCI Accredited Journal
Editorial Office
- Tel: +82-2-588-9240
- Fax: +82-2-521-1352
- E-mail: chwoo@kiise.or.kr