An Order-Preserving Pattern Matching Algorithm with Three Partitions

Seokchul Kang, Joong Chae Na, Jeong Seop Sim

http://doi.org/10.5626/JOK.2025.52.11.901

Two strings of equal length are considered order-isomorphic if they have identical relative orders at every position. The order-preserving pattern matching problem seeks to identify all substrings in a text T that are order-isomorphic to a given pattern P. Additionally, if two strings of equal length can be split at a certain position such that the resulting substrings are order-isomorphic to each other, they are termed partitioned order-isomorphic. The order-preserving pattern matching with partition problem aims to find all substrings in a text T that are partitioned order-isomorphic to a specified pattern P. In this paper, we introduce the order-preserving pattern matching with 3-partition problem and present an algorithm that solves it in O(nm² + m² log m) time. We also perform experiments on various time series datasets to compare the number of matches and the runtime performance of the order-preserving pattern matching, partitioned order-preserving pattern matching, and order-preserving pattern matching with 3-partition algorithms.
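
To make the definitions concrete (a naive sketch for illustration only, not the paper's O(nm² + m² log m) algorithm; all names here are ours):

    def dense_ranks(s):
        # Rank each value among the distinct values of s; ties share a rank.
        rank = {v: r for r, v in enumerate(sorted(set(s)))}
        return [rank[v] for v in s]

    def order_isomorphic(x, y):
        # Equal dense-rank sequences <=> identical relative order at every position.
        return len(x) == len(y) and dense_ranks(x) == dense_ranks(y)

    def order_isomorphic_3_partition(x, y):
        # Brute force over the two split positions that cut both strings into
        # three aligned, non-empty substrings; each pair must be order-isomorphic.
        if len(x) != len(y):
            return False
        n = len(x)
        return any(order_isomorphic(x[:i], y[:i])
                   and order_isomorphic(x[i:j], y[i:j])
                   and order_isomorphic(x[j:], y[j:])
                   for i in range(1, n - 1) for j in range(i + 1, n))

For example, (1, 5, 3) and (10, 40, 20) are order-isomorphic because both follow the pattern low, high, middle.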

A Text-to-SQL Model with Selective Decoding

Mirae Han, Geunyeong Jeong, Harksoo Kim

http://doi.org/10.5626/JOK.2025.52.11.907

Text-to-SQL involves converting natural language questions into SQL queries. Existing models primarily use either sketch-based or generation-based approaches. However, sketch-based methods struggle to fully capture the relationships among SQL elements, while generation-based methods suffer from slow inference speeds and frequent syntax errors. To address these challenges, this paper proposes a new decoding strategy called Selective Decoding. This approach combines the strengths of both methods by utilizing sketch structure and selectively applying the most suitable decoding method at each step. As a result, the model effectively captures the interrelationships among SQL elements and generates syntactically correct SQL queries. Experimental results demonstrate that the proposed model generates SQL queries more efficiently and accurately than existing models.
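
A schematic of how such a selective strategy could look (the slot names and the choose/generate interfaces below are stand-ins of ours, not the authors' model):

    SKETCH = ["SELECT", "<column>", "FROM", "<table>", "WHERE", "<condition>"]

    def decode(question, schema, choose, generate):
        # choose(): sketch-style step that picks from a closed candidate set;
        # generate(): generation-style step for open-ended fragments.
        out = []
        for slot in SKETCH:
            if slot == "<column>":
                out.append(choose(schema["columns"], question))
            elif slot == "<table>":
                out.append(choose(schema["tables"], question))
            elif slot == "<condition>":
                out.append(generate(question))          # free-form decoding
            else:
                out.append(slot)                        # fixed keyword from the sketch
        return " ".join(out)

Because every keyword comes from the sketch, the output is syntactically well-formed by construction; only the open-ended fragments rely on free generation.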

Efficient Mutation-based Fault Localization using Predictive Mutation Analysis

Yunho Kim, Namhoon Jung, Insub Lee, Hyoju Nam, Kyutae Cho

http://doi.org/10.5626/JOK.2025.52.11.915

One of the most challenging problems in software debugging is localizing the faulty code elements that cause errors. Mutation-based fault localization techniques, which employ mutation analysis, can accurately identify these faulty elements but are often impractical due to the significant time required for mutation analysis. This paper proposes an efficient mutation-based fault localization technique that utilizes predictive mutation analysis. Instead of conducting the time-consuming mutation analysis for every debugging attempt, the proposed approach trains a machine learning model using existing mutation analysis results. This model then predicts the outcomes of further mutation analyses, enhancing the efficiency of fault localization. Experimental results using the SIR benchmark demonstrate that the proposed method can accurately localize faulty code elements while requiring less time than existing mutation-based fault localization techniques.
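
A minimal sketch of the predictive step (the feature set is invented for illustration; the paper's model and features may differ):

    from sklearn.ensemble import RandomForestClassifier

    # Hypothetical per-mutant features: (mutation operator id, coverage count, nesting depth)
    X_known = [[0, 12, 1], [1, 3, 2], [2, 40, 1], [0, 5, 3]]
    y_known = [1, 0, 1, 0]   # 1 = mutant killed by the test suite, 0 = survived

    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_known, y_known)

    # Predict kill results for new mutants instead of executing the tests, then
    # feed the predicted kill matrix into a standard MBFL suspiciousness formula.
    X_new = [[1, 22, 1], [2, 4, 2]]
    predicted = model.predict(X_new)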

Continuous Input Old Hangul Input Method Editor

Sungwook Kim

http://doi.org/10.5626/JOK.2025.52.11.923

The main difference between modern Hangul and Old Hangul input lies in continuity. Three factors make continuous input of Old Hangul difficult. First, the Old Hangul keyboard is unintuitive. The conventional layout of Old Hangul alphabets on the 2-set keyboard lacks a correlation between modern Hangul and Old Hangul alphabets, making Old Hangul alphabets hard to locate and input. Second, inputting Old Hangul initial compound consonants poses a challenge. Unlike modern Hangul, Old Hangul includes compound consonants in the initial position, and existing Old Hangul Input Method Editors (IMEs) can input them only after the current composition is completed, rendering continuous input of Old Hangul impossible. Third, inputting Bangjeom is problematic. In existing Old Hangul IMEs, Bangjeom is difficult to enter, and composition ends as soon as it is entered, preventing further letter combinations. As a solution to these issues, this paper proposes a placement of Old Hangul alphabets that follows the creation principle of Hunminjeongeum, and a method for continuous input of Old Hangul using the Yeoneum button and the Bangjeom button.

Prompt Engineering for Korean OCR Error Correction and Text Damage Restoration

Suhyun Park, Hyojin Lee, Sung-Pil Choi

http://doi.org/10.5626/JOK.2025.52.11.940

Optical Character Recognition (OCR) is a technology that converts text within images into machine-readable formats, making it essential in industries where document management is critical. However, the Korean language has a complex structure, featuring combined consonants and vowels, which can lead to low recognition accuracy. Improving this situation requires a vast dataset that encompasses all 11,172 complete Korean characters. Additionally, errors such as spacing and spelling mistakes, along with text distortion and damage, complicate post-processing with conventional spell-check models. To tackle these challenges, this paper proposes the use of a Large Language Model combined with Few-shot Learning and Prompt Engineering. Experimental results indicate that error correction accuracy improved by up to 18.18% compared to basic prompts, while text restoration and spacing correction achieved performance improvements of 21.6% and 17.26%, respectively. These findings demonstrate that even with a limited number of examples, Korean OCR errors can be effectively corrected, and damaged text can be restored.

Keywords: OCR post-processing, Korean OCR error correction, Prompt engineering, LLM
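
The few-shot prompting pattern the abstract describes could be assembled roughly as follows (the instruction text and example pairs are illustrative, and call_llm stands in for any chat-completion API):

    FEW_SHOT = [
        ("안녕하세요 저는 학샘입니다.", "안녕하세요 저는 학생입니다."),
        ("오늘 날씨가 말습니다.", "오늘 날씨가 맑습니다."),
    ]

    def build_prompt(ocr_text):
        parts = ["Correct the Korean OCR errors below. Fix spelling, spacing, and damaged characters only."]
        for noisy, clean in FEW_SHOT:
            parts.append(f"Input: {noisy}\nOutput: {clean}")
        parts.append(f"Input: {ocr_text}\nOutput:")
        return "\n\n".join(parts)

    # corrected = call_llm(build_prompt("회의는 오전 10시에 시작함니다."))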

Hyunsun Hwang, Youngjun Jung, Changki Lee

http://doi.org/10.5626/JOK.2025.52.11.948

Recent large language models utilize In-context Learning (ICL) techniques, which process existing tasks by inserting examples into prompts without requiring additional training data. This approach leverages their inherent language understanding capabilities developed during pre-training on massive datasets. However, these example-based ICL techniques rely on few-shot examples, leading to significant performance variations depending on the selection and structure of the examples in the prompt. This paper proposes methods to enhance example selection and reorganization when applying ICL techniques to Semantic Role Labeling, a challenging task that requires outputting semantic structures. In particular, we found that simply ordering examples in reverse similarity order can achieve performance close to the optimal example ordering for semantic role labeling tasks.

Keywords: semantic role labeling, large language model, in-context learning, example selection, example reordering
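
One reading of that finding, as a sketch (embedding-based retrieval and the most-similar-last interpretation of "reverse similarity order" are our assumptions):

    import numpy as np

    def reorder_examples(query_vec, example_vecs, examples, k=8):
        # Retrieve the k most similar examples, then reverse them so the most
        # similar example sits last, i.e., closest to the test input in the prompt.
        sims = [float(np.dot(query_vec, v)) for v in example_vecs]  # cosine if normalized
        top_k = sorted(range(len(examples)), key=sims.__getitem__, reverse=True)[:k]
        return [examples[i] for i in reversed(top_k)]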

Cross Domain Alignment of Contrastive Multi-task Pretraining for Diagnosing Multiple Brain Disorders

Tae-Hun Kang, Sung-Bae Cho

http://doi.org/10.5626/JOK.2025.52.11.954

Functional magnetic resonance imaging (fMRI) is a crucial tool for diagnosing brain disorders and understanding their pathophysiology. However, reliably identifying abnormal brain patterns is challenging due to the high-dimensional inter-individual variability of fMRI data. In particular, variability in acquisition protocols introduces batch effects that lead to negative transfer, which degrades model performance. To address this issue, we propose a cross-domain alignment method for multi-task learning that manages heterogeneity between data sources and aligns essential normal and abnormal brain patterns. By aligning the common and discriminative features observed across multiple domains, our method effectively learns both normal and abnormal patterns. This approach facilitates large-scale pretraining on diverse datasets that include various brain disorders and healthy controls. Experiments conducted on clinical data from 2,424 subjects across four real-world disease datasets demonstrate that our proposed method achieves greater generalization than conventional single-disease training while reducing negative transfer, thereby validating its superior performance in diagnosing brain diseases.
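
One generic way to express cross-domain alignment as a loss (a supervised-contrastive sketch under our assumptions, not the paper's objective):

    import torch
    import torch.nn.functional as F

    def cross_domain_alignment_loss(z_a, z_b, y_a, y_b, tau=0.1):
        # Pool embeddings from two acquisition domains; samples sharing a
        # diagnostic label attract across domains, all others repel.
        z = F.normalize(torch.cat([z_a, z_b]), dim=1)
        y = torch.cat([y_a, y_b])
        sim = (z @ z.T) / tau
        self_mask = torch.eye(len(y), dtype=torch.bool)
        pos = (y[:, None] == y[None, :]).float().masked_fill(self_mask, 0.0)
        log_prob = sim - torch.logsumexp(
            sim.masked_fill(self_mask, float("-inf")), dim=1, keepdim=True)
        return -(pos * log_prob).sum(1).div(pos.sum(1).clamp(min=1)).mean()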

Kyoseong Koo, Hyeong Jin Shin, Jae Sung Lee

http://doi.org/10.5626/JOK.2025.52.11.961

In recent years, Transformer-based language models have been extensively developed and have shown substantial effectiveness across various natural language processing tasks. However, these models often incur significant computational costs in terms of time and memory complexity, and they are predominantly designed as unidirectional autoregressive architectures. To address these limitations, research has increasingly focused on developing lightweight and bidirectional alternatives. This paper proposes Bi-RWKV, a bidirectional extension of the lightweight RWKV model, specifically designed for encoder-based language tasks. By examining eight configurations of bidirectional integration for RWKV’s time-mixing and channel-mixing modules, we identify the optimal architecture. To ensure a fair comparison of different model architectures, we maintained consistent hyperparameter values and comparable numbers of model parameters, deliberately omitting pretraining. Experimental results on named entity recognition, chunking, and Korean morphological and part-of-speech tagging demonstrate that Bi-RWKV achieves comparable or superior accuracy to Transformer-based encoders while reducing inference time by a factor of 2.7 to 4.
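
At its simplest, one such bidirectional design point can be pictured as follows (forward_layer and backward_layer stand in for RWKV time-mixing or channel-mixing modules; the actual Bi-RWKV wiring is what the paper's eight configurations explore):

    import torch

    def bidirectional_mix(x, forward_layer, backward_layer, combine="sum"):
        # x: (batch, seq_len, dim). Run one module left-to-right and a second
        # on the reversed sequence, then merge the two views of every position.
        fwd = forward_layer(x)
        bwd = backward_layer(x.flip(dims=[1])).flip(dims=[1])
        if combine == "sum":
            return fwd + bwd
        return torch.cat([fwd, bwd], dim=-1)   # concat doubles the channel width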

An Empirical Analysis of Domain Bias In Internet Image-Based Gemstone Identification Systems

Choolha Hwang, Dongha Shim

http://doi.org/10.5626/JOK.2025.52.11.970

Recent applications of machine learning and computer vision in gemstone identification increasingly utilize internet images as training data. However, commercial enhancements such as color correction, contour sharpening, and shape distortion create visual discrepancies that result in domain bias, significantly degrading model performance in real-world environments. This study empirically analyzes this issue by designing nine training-evaluation scenarios using three distinct datasets: academic (Set A), public internet (Set B), and directly captured unprocessed images (Set C). The results indicate that models trained on Set A experienced a 26% drop in accuracy when evaluated on Set C. In contrast, models trained on Set C maintained stable performance (F1 Score ≥ 0.83) when tested on Set A and Set B. These findings underscore the critical impact of visual discrepancies on model generalization and highlight the necessity of training with unprocessed real-world images to address domain bias for reliable AI-based gemstone identification systems.
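
The nine scenarios amount to a 3×3 train/evaluate grid over the three datasets; schematically (train and evaluate are placeholders for the study's pipeline):

    DATASETS = ["Set A (academic)", "Set B (internet)", "Set C (captured)"]

    def run_grid(train, evaluate, data):
        # data maps each set name to its images; one score per (train, eval) cell.
        return {(tr, ev): evaluate(train(data[tr]), data[ev])
                for tr in DATASETS for ev in DATASETS}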

Local MCP-based Agent System for SMILES Conversion of Molecular Structures

Sung Tae Yoo, Hyunsook Roh, Jae-Min Lee

http://doi.org/10.5626/JOK.2025.52.11.984

With the development of large language models (LLMs), interest in artificial intelligence (AI) agents that automate various tasks has grown. The Model Context Protocol (MCP) enables standardized interactions between LLMs and tools, significantly enhancing the usability of AI agents. However, the integration of MCP with external tools is susceptible to various security threats, and there are limitations in ensuring system reliability and safety. In this paper, we propose a local MCP-based agent system that implements the host, client, and server structures of MCP within a local environment to address these issues. By managing tool invocation, execution, and response generation entirely on the local system, the proposed architecture reduces potential security vulnerabilities. Specifically, the system's effectiveness was experimentally validated through a process that recognizes molecular structures from images in research papers, automatically converts them into Simplified Molecular Input Line Entry System (SMILES) format, and verifies the results. Each step was executed using the ReAct method, where the LLMs alternate between reasoning and tool invocation, leveraging the outcomes of each action to inform the next step. This system demonstrates the potential for safely and flexibly operating AI agents in defense, medical, and industrial sectors based on local environments.
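
Schematically, the ReAct-style control flow could look like this (the tool stubs and llm interface are placeholders of ours; a real deployment would route these calls through the local MCP host, client, and server):

    TOOLS = {
        "extract_structure": lambda image_path: "<molecular graph>",   # stub
        "to_smiles": lambda graph: "c1ccccc1",                         # stub: benzene
        "validate_smiles": lambda smiles: True,                        # stub
    }

    def react_loop(llm, task, max_steps=8):
        history = [f"Task: {task}"]
        for _ in range(max_steps):
            thought, action, arg = llm("\n".join(history))  # model picks the next step
            history.append(f"Thought: {thought}")
            if action == "finish":
                return arg
            observation = TOOLS[action](arg)                # executed locally only
            history.append(f"Action: {action}({arg!r}) -> Observation: {observation}")
        return None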

SEG-SQL: Structure-aware Example Generation for Text-to-SQL Method with In-context Learning

Donguk Kwon, Jaewan Moon, Jongwook Lee

http://doi.org/10.5626/JOK.2025.52.11.992

Large language models (LLMs) that utilize in-context learning have significantly improved Text-to-SQL performance. However, traditional natural language similarity-based example selection often fails to ensure SQL structural similarity and can degrade performance when no structurally similar examples exist for a target SQL query. To address this issue, we propose SEG-SQL (Structure-aware Example Generation for Text-to-SQL). SEG-SQL first generates an initial SQL query from a given natural language question and converts it into a hint vector that captures its structural characteristics. It then modifies specific bits of this hint vector to create structurally similar SQL queries, which are subsequently transformed back into natural language through SQL-to-Text conversion. These transformed queries are used as few-shot examples for in-context learning. On the BIRD benchmark, SEG-SQL improved execution accuracy by 2.5% compared to CHESS and by 3.4% compared to OpenSearch-SQL. Under the most challenging difficulty setting, these gains increased to 30.0% and 62.2%, respectively. These results show that SEG-SQL consistently enhances the accuracy of in-context learning-based Text-to-SQL methods, even in complex environments.
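
A toy rendering of the hint-vector idea (the feature list and the keyword test are our guesses at the flavor of the encoding, not the paper's scheme):

    FEATURES = ["JOIN", "GROUP BY", "ORDER BY", "HAVING", "UNION", "LIMIT"]

    def hint_vector(sql):
        # One bit per structural element the query uses (crude keyword test).
        s = sql.upper()
        return [int(f in s) for f in FEATURES]

    def structural_neighbors(vec):
        # Flip one bit at a time to enumerate structurally similar queries,
        # which SEG-SQL would then realize via SQL-to-Text as few-shot examples.
        for i in range(len(vec)):
            v = list(vec)
            v[i] ^= 1
            yield v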

Improving Compaction in LSM Tree by Applying Simple Copy to Key-Value Separated LSM-Tree

Chihyun Lee, Sungho Moon, Sangeun Chae, Beomsuk Nam

http://doi.org/10.5626/JOK.2025.52.11.1002

The LSM-Tree (Log-Structured Merge-Tree) is a widely adopted indexing structure known for its superior write performance across various application domains. However, a persistent performance bottleneck arises from write stalls, which occur when the compaction process cannot keep pace with the high rate of incoming data. Numerous approaches, such as the Key-Value Separated LSM-Tree and ZenFS, have been proposed to address this issue. In this paper, we introduce a method that runs the LSM-Tree's underlying file system on a ZNS SSD (Zoned Namespace SSD) and leverages the Simple Copy command to separate keys and values. This approach minimizes value-reading I/O from the disk while avoiding side effects such as garbage collection, thereby accelerating the compaction process. We apply this technique to L0–L1 compaction and L0 compaction, effectively mitigating the write stall problem.

