Digital Library[ Search Result ]
Aspect-Based Comparative Summarization with Large Language Model
http://doi.org/10.5626/JOK.2025.52.7.579
This paper proposes an aspect-based comparative summarization method to generate summary comparisons between two items based on their reviews, aiming to assist users in making informed decisions. Given the reviews of two items, aspects are dynamically generated from each review using a large language model. To identify common aspects for comparison, the generated aspect lists of both items are merged. The review sentences of each item are classified into the most relevant aspects, and then the summarization process removes redundant and unnecessary information. Subsequently, an abstract summary is generated for each common aspect to capture the overall content of the reviews. Experiments were conducted in the domains of hotels, electronic devices, and furniture, comparing human-written summaries with system-generated ones. The proposed method demonstrated superior summarization performance compared to existing comparison models.
Enhancing LLM-based Zero-Shot Conversational Recommendation via Reasoning Path
Heejin Kook, Seongmin Park, Jongwuk Lee
http://doi.org/10.5626/JOK.2025.52.7.617
Conversational recommender systems provide personalized recommendations through bi-directional interactions with users. Traditional conversational recommender systems rely on external knowledge, such as knowledge graphs, to effectively capture user preferences. While recent rapid advancement of large language models has enabled zero-shot recommendations, challenges remain in understanding users' implicit preferences and designing optimal reasoning paths. To address these limitations, this study investigates the importance of appropriate reasoning path construction in zero-shot based conversational recommender systems and explores the potential of using a new approach based on this foundation. The proposed framework consists of two stages: (1) comprehensively extracting both explicit and implicit preferences from conversational context, and (2) constructing reasoning trees to select optimal reasoning paths based on these preferences. Experimental results on benchmark datasets INSPIRED and ReDial show that our proposed method achieves up to 11.77% improvement in Recall@10 compared to existing zero-shot methods, It even outperforms some learning-based models.
Pretrained Large Language Model-based Drug-Target Binding Affinity Prediction for Mutated Proteins
Taeung Song, Jin Hyuk Kim, Hyeon Jun Park, Jonghwan Choi
http://doi.org/10.5626/JOK.2025.52.6.539
Drug development is a costly and time-consuming process. Accurately predicting the impact of protein mutations on drug-target binding affinity remains a major challenge. Previous studies have utilized long short-term memory (LSTM) and transformer models for amino acid sequence processing. However, LSTMs suffer from long-sequence dependency issues, while transformers face high computational costs. In contrast, pretrained large language models (pLLMs) excel in handling long sequences, yet prompt-based approaches alone are insufficient for accurate binding affinity prediction. This study proposed a method that could leverage pLLMs to analyze protein structural data, transform it into embedding vectors, and use a separate machine learning model for numerical binding affinity prediction. Experimental results demonstrated that the proposed approach outperformed conventional LSTM and prompt-based methods, achieving lower root mean square error (RMSE) and higher Pearson correlation coefficient (PCC), particularly in mutation-specific predictions. Additionally, performance analysis of pLLM quantization confirmed that the method maintained sufficient accuracy with reduced computational cost.
Safety Evaluation of Large Language Models Using Risky Humor
JoEun Kang, GaYeon Jung, HanSaem Kim
http://doi.org/10.5626/JOK.2025.52.6.508
This study evaluated the safety of generative language models through the lens of Korean humor that included socially risky content. Recently, concerns regarding the misuse of generative language models have intensified, as these models can generate plausible responses to inputs and prompts that may deviate from social norms, ethical standards, and common sense. In this context, this study aimed to identify and mitigate potential risks associated with artificial intelligence (AI) by analyzing risks inherent in humor and developing a benchmark for their evaluation. The socially risky humor examined in this study differs from conventional harmful content, as the playful and entertaining nature of humor can easily obscure unethical or risky elements. This characteristic closely resembles subtle and indirect input patterns, which are critical in AI safety assessments. The experiment involved binary classification of generated results from input requests related to unethical humor as safe or unsafe. Subsequently, the safety level of the experimental model was evaluated across four levels. Consequently, this study evaluated the safety of prominent generative language models, including GPT-4o, Gemini, and Claude. Findings indicated that these models demonstrated vulnerabilities in ethical judgment when faced with risky humor.
A Retrieval Augmented Generation(RAG) System Using Query Rewritting Based on Large Langauge Model(LLM)
Minsu Han, Seokyoung Hong, Myoung-Wan Koo
http://doi.org/10.5626/JOK.2025.52.6.474
This paper proposes a retrieval pipeline that can be effectively utilized in fields requiring expert knowledge without requiring fine-tuning. To achieve high accuracy, we introduce a query rewriting retrieval method that leverages large language models to generate examples similar to the given question, achieving higher similarity than existing retrieval models. The proposed method demonstrates excellent performance in both automated evaluations and expert qualitative assessments, while also providing explainability in retrieval results through generated examples. Additionally, we suggest prompts that can be utilized in various domains requiring specialized knowledge during the application of this method. Furthermore, we propose a pipeline method that incorporates a Top-1 retrieval model, which chooses the most relevant document from the three returned by the query rewriting retrieval model. This aims to prevent the hallucination issue caused by the input of unnecessary documents into the large language model.
Hallucination Detection and Explanation Model for Enhancing the Reliability of LLM Responses
Sujeong Lee, Hayoung Lee, Seongsoo Heo, Wonik Choi
http://doi.org/10.5626/JOK.2025.52.5.404
Recent advancements in large language models (LLMs) have achieved remarkable progress in natural language processing. However, reliability issues persist due to hallucination, which remains a significant challenge. Existing hallucination research primarily focuses on detection, lacking the capability to explain the causes and context of hallucinations. In response, this study proposes a hallucination-specialized model that goes beyond mere detection by providing explanations for identified hallucinations. The proposed model was designed to classify hallucinations while simultaneously generating explanations, allowing users to better trust and understand the model’s responses. Experimental results demonstrated that the proposed model surpassed large-scale models such as Llama3 70B and GPT-4 in hallucination detection accuracy while consistently generating high-quality explanations. Notably, the model maintained stable detection and explanation performance across diverse datasets, showcasing its adaptability. By integrating hallucination detection with explanation generation, this study introduces a novel approach to evaluating hallucinations in language models.
Enhancing Molecular Understanding in LLMs through Multimodal Graph-SMILES Representations
http://doi.org/10.5626/JOK.2025.52.5.379
Recent advancements in large language models (LLMs) have shown remarkable performace across various tasks, with increasing focus on multimodal research. Notably, BLIP-2 can enhance performance by efficiently aligning images and text using a Q-Former, aided by an image encoder pre-trained on multimodal data. Inspired by this, the MolCA model extends BLIP-2 to the molecular domain to improve performance. However, the graph encoder in MolCA is pre-trained on unimodal data, necessitating updates during model training, which is a limitation. Therefore, this paper replaced it with a graph encoder pre-trained on multimodal data and frozen while training the model. Experimental results showed that using the graph encoder pre-trained on multimodal data generally enhanced performance. Additionally, unlike the graph encoder pre-trained on unimodal data, which performed better when updated, the graph encoder pre-trained on multimodal data achieved superior results across all metrics when frozen.
Enhancing Retrieval-Augmented Generation Through Zero-Shot Sentence-Level Passage Refinement with LLMs
Taeho Hwang, Soyeong Jeong, Sukmin Cho, Jong C. Park
http://doi.org/10.5626/JOK.2025.52.4.304
This study presents a novel methodology designed to enhance the performance and effectiveness of Retrieval-Augmented Generation (RAG) by utilizing Large Language Models (LLMs) to eliminate irrelevant content at the sentence level from retrieved documents. This approach refines the content of passages exclusively through LLMs, avoiding the need for additional training or data, with the goal of improving the performance in knowledge-intensive tasks. The proposed method was tested in an open-domain question answering (QA) environment, where it demonstrated its ability to effectively remove unnecessary content and outperform over traditional RAG methods. Overall, our approach has proven effective in enhancing performance compared to conventional RAG techniques and has shown the capability to improve RAG's accuracy in a zero-shot setting without requiring additional training data.
An Experimental Study on the Text Generation Capability for Chart Image Descriptions in Korean SLLM
http://doi.org/10.5626/JOK.2025.52.2.132
This study explores the capability of using Small Large Language Models(SLLMs) for automatically generating and interpreting information from chart images. To achieve this goal, we built an instruction dataset for SLLM training by extracting text data from chart images and adding descriptive information. We conducted instruction tuning on a Korean SLLM and evaluated its ability to generate information from chart images. The experimental results demonstrated that the SLLM, which was fine-tuned with the constructed instruction dataset, was capable of generating descriptive text comparable to OpenAI's GPT-4o-mini API. This study suggests that, in the future, Korean SLLMs may be effectively used for generating descriptive text and providing information across a broader range of visual data.
Political Bias in Large Language Models and its Implications on Downstream Tasks
Jeong yeon Seo, Sukmin Cho, Jong C. Park
http://doi.org/10.5626/JOK.2025.52.1.18
This paper contains examples of political leaning bias that can be offensive. Abstract As the performance of the Large Language Models (LLMs) improves, direct interaction with users becomes possible, raising ethical issues. In this study, we design two experiments to explore the diverse spectrum of political stances that an LLM exhibits and how these stances affect downstream tasks. We first define the inherent political stances of the LLM as the baseline and compare results from three different inputs (jailbreak, political persona, and jailbreak persona). The results of the experiments show that the political stances of the LLM changed the most with the jailbreak attack, while lesser changes were observed with the other two inputs. Moreover, an experiment involving downstream tasks demonstrated that the distribution of altered inherent political stances can affect the outcome of these tasks. These results suggest that the model generates responses that align more closely like its inherent stance rather than the user’s intention to personalize responses. We conclude that the intrinsic political bias of the model and its judgments can be explicitly communicated to users.
Search

Journal of KIISE
- ISSN : 2383-630X(Print)
- ISSN : 2383-6296(Electronic)
- KCI Accredited Journal
Editorial Office
- Tel. +82-2-588-9240
- Fax. +82-2-521-1352
- E-mail. chwoo@kiise.or.kr