Search : [ keyword: 언어 모델 (language model) ] (33)

Hallucination Detection and Explanation Model for Enhancing the Reliability of LLM Responses

Sujeong Lee, Hayoung Lee, Seongsoo Heo, Wonik Choi

http://doi.org/10.5626/JOK.2025.52.5.404

Recent advancements in large language models (LLMs) have achieved remarkable progress in natural language processing. However, reliability issues persist due to hallucination, which remains a significant challenge. Existing hallucination research primarily focuses on detection, lacking the capability to explain the causes and context of hallucinations. In response, this study proposes a hallucination-specialized model that goes beyond mere detection by providing explanations for identified hallucinations. The proposed model was designed to classify hallucinations while simultaneously generating explanations, allowing users to better trust and understand the model’s responses. Experimental results demonstrated that the proposed model surpassed large-scale models such as Llama3 70B and GPT-4 in hallucination detection accuracy while consistently generating high-quality explanations. Notably, the model maintained stable detection and explanation performance across diverse datasets, showcasing its adaptability. By integrating hallucination detection with explanation generation, this study introduces a novel approach to evaluating hallucinations in language models.
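
The abstract above describes a model that jointly classifies a response as hallucinated and generates an explanation for that judgment. As a rough illustration of that interface (not the authors' released code), the sketch below runs a fine-tuned sequence-to-sequence detector and splits its output into a label and an explanation; the checkpoint name and the "label | explanation" output format are hypothetical placeholders.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "my-org/hallucination-detector"  # hypothetical fine-tuned checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def detect_and_explain(question: str, response: str) -> dict:
    """Return a hallucination label plus a natural-language explanation."""
    prompt = (
        f"Question: {question}\n"
        f"Response: {response}\n"
        "Decide whether the response contains a hallucination and explain why."
    )
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=128)
    text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    # Assumed output format: "<label> | <explanation>"
    label, _, explanation = text.partition("|")
    return {"label": label.strip(), "explanation": explanation.strip()}
```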

Enhancing Passage Selection and Answer Generation in FiD Systems Using Relevance Gating

Seung-ho Choi, Shihyun Park, Minsang Kim, Chansol Park, Junho Wang, Ji-Yoon Kim, Bong-Su Kim

http://doi.org/10.5626/JOK.2025.52.5.385

In this paper, we proposed a novel approach to enhance the performance of the Fusion-in-Decoder (FiD) model in open-domain question answering systems. The FiD model operates by independently encoding multiple passages and then combining them during the decoding stage to generate answers. However, this method has the drawback of not filtering out passages containing unnecessary information, thereby placing an excessive burden on the decoder. To address this issue, we introduced a Relevance Gate inspired by the forget gate of Long Short-Term Memory (LSTM). This gate can evaluate the relevance of each passage in parallel, selectively transmitting information to the decoder, thereby significantly improving the accuracy and efficiency of answer generation. Additionally, we applied a new activation function suitable for open-domain question answering systems instead of the sigmoid function to ensure the model's stability.
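
To make the gating idea concrete, here is a minimal sketch, assuming a FiD-style setup in which each passage has already been encoded independently into hidden states of shape (num_passages, seq_len, hidden_dim). The mean pooling, the sigmoid activation (which the paper replaces with a different activation), and the module name are simplifications rather than the authors' exact formulation.

```python
import torch
import torch.nn as nn

class RelevanceGate(nn.Module):
    """Scores each encoded passage and down-weights irrelevant ones before decoding."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.scorer = nn.Linear(hidden_dim, 1)

    def forward(self, passage_states: torch.Tensor) -> torch.Tensor:
        # passage_states: (num_passages, seq_len, hidden_dim), encoded independently
        pooled = passage_states.mean(dim=1)           # (num_passages, hidden_dim)
        gate = torch.sigmoid(self.scorer(pooled))     # relevance score in [0, 1] per passage
        gated = passage_states * gate.unsqueeze(-1)   # suppress low-relevance passages
        # Concatenate along the sequence axis so the decoder attends over all passages at once.
        return gated.reshape(1, -1, passage_states.size(-1))
```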

Enhancing Molecular Understanding in LLMs through Multimodal Graph-SMILES Representations

http://doi.org/10.5626/JOK.2025.52.5.379

Recent advancements in large language models (LLMs) have shown remarkable performance across various tasks, with increasing focus on multimodal research. Notably, BLIP-2 can enhance performance by efficiently aligning images and text using a Q-Former, aided by an image encoder pre-trained on multimodal data. Inspired by this, the MolCA model extends BLIP-2 to the molecular domain to improve performance. However, the graph encoder in MolCA is pre-trained on unimodal data and must be updated during model training, which is a limitation. Therefore, this paper replaces it with a graph encoder that is pre-trained on multimodal data and kept frozen while training the model. Experimental results showed that using the graph encoder pre-trained on multimodal data generally enhanced performance. Additionally, unlike the graph encoder pre-trained on unimodal data, which performed better when updated, the graph encoder pre-trained on multimodal data achieved superior results across all metrics when frozen.
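
The key training choice described above is that the multimodal graph encoder stays frozen while the rest of the stack is trained. Below is a minimal sketch of that choice, assuming hypothetical module names `graph_encoder` and `q_former`.

```python
import torch

def freeze(module: torch.nn.Module) -> None:
    """Exclude a module from gradient updates and fix its dropout/normalization behavior."""
    for param in module.parameters():
        param.requires_grad = False
    module.eval()

# freeze(graph_encoder)  # pre-trained on multimodal data, kept frozen during training
# optimizer = torch.optim.AdamW(
#     (p for p in q_former.parameters() if p.requires_grad), lr=1e-4
# )
```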

Efficiently Lightweight Korean Language Model with Post-layer Pruning and Multi-stage Fine-tuning

Jae Seong Kim, Suan Lee

http://doi.org/10.5626/JOK.2025.52.3.260

The increasing size of large-scale language models has created a need for lightweight models in practical applications. This study presents a method that reduces an existing 8B model to 5B through late-layer pruning while maintaining and improving its performance through two phases of fine-tuning. In the broad fine-tuning phase, we expanded the model's ability to understand and generate Korean by utilizing English-Korean parallel data and a large Korean corpus, and in the refined fine-tuning phase, we enhanced its expressive and inferential capabilities with high-quality datasets. In addition, we integrated the strengths of individual models through model merging techniques. In the LogicKor leaderboard evaluation, the proposed model performed well in reasoning, writing, and comprehension, with an overall score of 4.36, outperforming the original Llama-3.1-8B-Instruct model (4.35). This demonstrates a 37.5% reduction in model size while still improving performance.
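
As a rough sketch of the late-layer pruning step, the snippet below drops a block of decoder layers from a LLaMA-style model before fine-tuning. The number of retained layers is a placeholder; the actual 8B-to-5B configuration and the two-stage fine-tuning recipe are specific to the paper.

```python
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

keep_layers = 20  # placeholder: keep the first 20 of 32 decoder layers
model.model.layers = nn.ModuleList(model.model.layers[:keep_layers])
model.config.num_hidden_layers = keep_layers

# The pruned model is then fine-tuned in two stages (broad Korean-corpus tuning,
# followed by refined tuning on high-quality data), as described in the abstract.
```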

An Experimental Study on the Text Generation Capability for Chart Image Descriptions in Korean SLLM

Hyojun An, Sungpil Choi

http://doi.org/10.5626/JOK.2025.52.2.132

This study explores the capability of Small Large Language Models (SLLMs) to automatically generate and interpret information from chart images. To achieve this goal, we built an instruction dataset for SLLM training by extracting text data from chart images and adding descriptive information. We conducted instruction tuning on a Korean SLLM and evaluated its ability to generate information from chart images. The experimental results demonstrated that the SLLM fine-tuned with the constructed instruction dataset was capable of generating descriptive text comparable to OpenAI's GPT-4o-mini API. This study suggests that Korean SLLMs may in the future be effectively used for generating descriptive text and providing information across a broader range of visual data.
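
For illustration only, an instruction-tuning record of the kind the abstract describes might look like the example below, pairing text extracted from a chart image with an instruction and a reference description; the field names and contents are invented, not the authors' schema.

```python
example = {
    "instruction": "Describe the trend shown in the following chart data.",
    "input": "Title: Monthly active users\nX: Jan, Feb, Mar\nY: 1200, 1500, 2100",
    "output": (
        "Monthly active users increased steadily from January to March, "
        "rising from 1,200 to 2,100."
    ),
}
```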

Political Bias in Large Language Models and its Implications on Downstream Tasks

Jeong yeon Seo, Sukmin Cho, Jong C. Park

http://doi.org/10.5626/JOK.2025.52.1.18

This paper contains examples of political leaning bias that may be offensive. As the performance of Large Language Models (LLMs) improves, direct interaction with users becomes possible, raising ethical issues. In this study, we design two experiments to explore the diverse spectrum of political stances that an LLM exhibits and how these stances affect downstream tasks. We first define the inherent political stances of the LLM as the baseline and compare results from three different inputs (jailbreak, political persona, and jailbreak persona). The results of the experiments show that the political stances of the LLM changed the most under the jailbreak attack, while smaller changes were observed with the other two inputs. Moreover, an experiment involving downstream tasks demonstrated that the distribution of altered inherent political stances can affect the outcome of these tasks. These results suggest that the model generates responses that align more closely with its inherent stance than with the user's intention to personalize responses. We conclude that the intrinsic political bias of the model and its judgments can be explicitly communicated to users.

Adversarial Training with Contrastive Learning in NLP

Daniela N. Rim, DongNyeong Heo, Heeyoul Choi

http://doi.org/10.5626/JOK.2025.52.1.52

Adversarial training has been extensively studied in natural language processing (NLP) to make models robust so that semantically similar inputs yield similar outcomes. However, since language has no objective measure of semantic similarity, previous works use an external pre-trained NLP model to ensure this similarity, introducing an extra training stage with huge memory consumption. This work proposes adversarial training with contrastive learning (ATCL), which trains a language processing model adversarially using the benefits of contrastive learning. The core idea is to make linear perturbations in the embedding space of the input via the fast gradient method (FGM) and train the model to keep the original and perturbed representations close via contrastive learning. We apply ATCL to language modeling and neural machine translation tasks, showing an improvement in quantitative (perplexity and BLEU) scores. Furthermore, ATCL achieves good qualitative results at the semantic level for both tasks without using a pre-trained model, as shown through simulation.
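
A minimal sketch of the two ingredients named above, under simplifying assumptions: an FGM perturbation applied to the input embeddings, and an InfoNCE-style contrastive term that pulls the clean and perturbed sentence representations together while other sentences in the batch serve as negatives. The pooling, temperature, and epsilon values are illustrative.

```python
import torch
import torch.nn.functional as F

def fgm_perturb(embeddings: torch.Tensor, loss: torch.Tensor, epsilon: float = 1.0) -> torch.Tensor:
    """Fast gradient method: shift embeddings along the normalized gradient of the task loss.

    `embeddings` must require grad and `loss` must be computed from them.
    """
    grad, = torch.autograd.grad(loss, embeddings, retain_graph=True)
    return embeddings + epsilon * grad / (grad.norm(dim=-1, keepdim=True) + 1e-12)

def contrastive_loss(clean: torch.Tensor, perturbed: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """InfoNCE-style loss over pooled (batch, dim) representations; positives lie on the diagonal."""
    clean = F.normalize(clean, dim=-1)
    perturbed = F.normalize(perturbed, dim=-1)
    logits = clean @ perturbed.t() / tau
    targets = torch.arange(clean.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)
```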

Recommendation Technique for Bug Fixers by Fine-tuning Language Models

Dae-Sung Wang, Hoon Seong, Chan-Gun Lee

http://doi.org/10.5626/JOK.2022.49.11.987

The scale and complexity of software continue to increase, contributing to the occurrence of diverse bugs and raising the need for systematic bug management. A few studies have proposed automating the assignment of bug fixers using word-based deep learning models. However, their accuracy is not satisfactory because the context of each word is ignored and the number of classes is excessive. In this paper, top-10 accuracy was improved by about 27%p by fine-tuning pre-trained language models based on BERT, RoBERTa, DeBERTa, and CodeBERT. Experiments confirmed that the accuracy was about 70%. Through this, we showed that fine-tuned pre-trained language models can be effectively applied to automated bug-fixer assignment.
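
A minimal sketch of the fine-tuning setup described above: a bug report's text is classified into one of the candidate fixers with a pre-trained encoder (here CodeBERT as an example backbone). The number of developer classes and the sample report are placeholders, and dataset handling and training are omitted.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

num_fixers = 200  # placeholder: number of candidate developers in the bug tracker
tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/codebert-base", num_labels=num_fixers
)

report = "NullPointerException when saving user profile in SettingsActivity"
inputs = tokenizer(report, return_tensors="pt", truncation=True)
logits = model(**inputs).logits
top10_fixers = logits.topk(10, dim=-1).indices  # top-10 candidates, as evaluated in the paper
```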

Structuralized External Knowledge and Multi-task Learning for Knowledge Selection

Junhee Cho, Youngjoong Ko

http://doi.org/10.5626/JOK.2022.49.10.884

Typically, task-oriented dialogue systems use well-structured knowledge, such as databases, to generate the most appropriate responses to users' questions. However, to generate more appropriate and fluent responses, external knowledge, i.e., unstructured text data such as web data or FAQs, is necessary. In this paper, we propose a novel multi-task learning method with a pre-trained language model and a graph neural network. The proposed method enables the system to select external knowledge effectively by not only understanding linguistic information but also capturing the structural information latent in the external knowledge, which is converted into structured graph data using a dependency parser. Experimental results show that our proposed method achieves higher performance than traditional bi-encoder or cross-encoder methods that use pre-trained language models.
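
Under heavy simplification, a selector of the kind described above might score a knowledge snippet by combining a pre-trained language model encoding of the text with a GNN encoding of the snippet's dependency graph, as sketched below. The specific layers, the pooling, the single scoring head, and the assumption that node features share the PLM's hidden size are illustrative, and dependency parsing is assumed to happen elsewhere.

```python
import torch
import torch.nn as nn
from transformers import AutoModel
from torch_geometric.nn import GCNConv, global_mean_pool

class KnowledgeSelector(nn.Module):
    """Scores external-knowledge snippets from both a text view and a graph view."""

    def __init__(self, plm_name: str = "bert-base-uncased", hidden: int = 768):
        super().__init__()
        self.plm = AutoModel.from_pretrained(plm_name)
        self.gnn1 = GCNConv(hidden, hidden)
        self.gnn2 = GCNConv(hidden, hidden)
        self.scorer = nn.Linear(hidden * 2, 1)

    def forward(self, input_ids, attention_mask, node_feats, edge_index, batch):
        # Text view: [CLS] representation of the dialogue context plus knowledge text.
        text_vec = self.plm(input_ids=input_ids,
                            attention_mask=attention_mask).last_hidden_state[:, 0]
        # Graph view: message passing over the dependency graph of the knowledge text.
        h = torch.relu(self.gnn1(node_feats, edge_index))
        h = self.gnn2(h, edge_index)
        graph_vec = global_mean_pool(h, batch)  # one vector per knowledge graph
        return self.scorer(torch.cat([text_vec, graph_vec], dim=-1))  # relevance score
```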

Entity Graph Based Dialogue State Tracking Model with Data Collection and Augmentation for Spoken Conversation

Haeun Yu, Youngjoong Ko

http://doi.org/10.5626/JOK.2022.49.10.891

As part of a task-oriented dialogue system, dialogue state tracking is the task of understanding the dialogue and extracting the user's needs in slot-value form. Recently, the Dialog System Technology Challenge (DSTC) 10 Track 2 initiated a challenge to measure the robustness of dialogue state tracking models in a spoken conversation setting. The released evaluation dataset has three characteristics: a new multiple-value scenario, three times more entities, and utterances produced by an automatic speech recognition module. In this paper, to ensure the model's robust performance, we introduce an extraction-based dialogue state tracking model with an entity graph. We also propose a data collection and template-based data augmentation method. Evaluation results show that our proposed method improves the performance of the extraction-based dialogue state tracking model by 1.7% in JGA and 0.57% in slot accuracy compared to the baseline model.
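
The template-based augmentation mentioned above can be illustrated with a toy sketch: slot values are inserted into spoken-style utterance templates to synthesize multi-value training examples. The templates, slots, and values below are invented for illustration.

```python
import random

templates = [
    "uh can you find me a {price} hotel or maybe a {price2} one in {area}",
    "i'd like either {food} or {food2} food somewhere around {area}",
]
slot_values = {
    "price": ["cheap", "moderate", "expensive"],
    "food": ["korean", "italian", "thai"],
    "area": ["the centre", "the north side"],
}

def augment() -> str:
    """Fill a random template with random slot values (extra keys are ignored by format)."""
    template = random.choice(templates)
    return template.format(
        price=random.choice(slot_values["price"]),
        price2=random.choice(slot_values["price"]),
        food=random.choice(slot_values["food"]),
        food2=random.choice(slot_values["food"]),
        area=random.choice(slot_values["area"]),
    )

print(augment())
```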

