Vol. 52, No. 6,
Jun. 2025
Phishing Webpage Detection using URL and HTML Graphs based on a Multimodal AutoEncoder Ensemble
Jun-Ho Yoon, Seok-Hun Choi, Hae-Jung Kim, Seok-Jun Buu
http://doi.org/10.5626/JOK.2025.52.6.461
As the internet continues to evolve, phishing attacks are increasingly targeting users, highlighting the need for effective detection methods. Traditional approaches focus on analyzing URL character sequences; however, phishing URLs often mimic legitimate patterns and have a short lifespan, limiting detection accuracy. To address this, we propose a multimodal ensemble-based phishing detection method that leverages both URL strings and HTML graph data. Character-level URL sequences are processed using a Convolutional AutoEncoder (CAE), while HTML DOM structures are converted into graph formats and analyzed with a Graph Convolutional AutoEncoder (GCAE). The extracted latent vectors are integrated via a Transformer layer to classify phishing webpages. The proposed model improves detection performance by up to 18.91 percentage points in F1 Score compared to existing methods, and case analysis reveals the interrelationship between URL and HTML features.
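The two input modalities described above can be illustrated with a minimal sketch. It assumes a simple parent-child edge definition for the DOM graph and a fixed-length code-point encoding for the URL; the abstract does not specify either, so `html_to_graph`, `encode_url`, and `max_len` are hypothetical names and choices, not the authors' preprocessing.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not stay on the open-element stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DomGraphBuilder(HTMLParser):
    """Collects DOM elements as nodes and parent-child links as edges."""

    def __init__(self):
        super().__init__()
        self.nodes = []   # node index -> tag name
        self.edges = []   # (parent index, child index)
        self._stack = []  # indices of currently open elements

    def handle_starttag(self, tag, attrs):
        idx = len(self.nodes)
        self.nodes.append(tag)
        if self._stack:
            self.edges.append((self._stack[-1], idx))
        if tag not in VOID_TAGS:
            self._stack.append(idx)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element, tolerating sloppy markup.
        for pos in range(len(self._stack) - 1, -1, -1):
            if self.nodes[self._stack[pos]] == tag:
                del self._stack[pos:]
                break


def html_to_graph(html):
    """Return (nodes, edges) for an HTML string (assumed graph definition)."""
    builder = DomGraphBuilder()
    builder.feed(html)
    builder.close()
    return builder.nodes, builder.edges


def encode_url(url, max_len=32):
    """Character-level encoding: code points clipped to 0..127, zero-padded."""
    codes = [min(ord(c), 127) for c in url[:max_len]]
    return codes + [0] * (max_len - len(codes))
```

In a full pipeline, the integer sequence would feed a convolutional autoencoder and the node and edge lists a graph autoencoder; both models are outside the scope of this sketch.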
Analysis of GEMV Kernel Computations by Address Mapping Approaches Based on GPU and PIM Architectures
http://doi.org/10.5626/JOK.2025.52.6.469
Processing-in-Memory (PIM) is an architectural approach that can overcome bandwidth limitations between host processors and off-chip memory. PIM can improve data computation performance by exploiting high internal bandwidth and parallel computation within a memory module. PIM is therefore expected to be paired with high-performance processors such as GPUs to achieve overall performance improvements. However, because PIM and GPU architectures use different memory address mapping schemes, applying PIM's address mapping method directly to GPUs may decrease overall performance. In this paper, we analyze the performance impact of PIM's address mapping schemes on GPUs using memory-intensive general matrix-vector product (GEMV) kernels. Our evaluation results show that PIM's address mapping schemes degrade performance and memory bandwidth on GPUs, indicating that differences in mapping schemes could cause performance degradation in GPU-PIM architectures.
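To make the notion of an address mapping scheme concrete, the sketch below decodes a physical address into DRAM fields under two bit layouts. Both layouts are hypothetical and chosen only for illustration (one interleaves channels at fine granularity, the other keeps a contiguous region in one channel); neither is taken from the paper, and the field widths are assumptions.

```python
def decode(addr, layout):
    """Split an address into named fields; layout lists (name, bits) from the LSB up."""
    fields = {}
    for name, bits in layout:
        fields[name] = addr & ((1 << bits) - 1)
        addr >>= bits
    return fields


# Hypothetical layouts, least significant field first.
# Interleaved: channel bits sit low, so consecutive 32-byte blocks rotate across channels.
INTERLEAVED = [("offset", 5), ("channel", 2), ("column", 5), ("bank", 2), ("row", 10)]
# Localized: channel bits sit high, so a contiguous region stays within one channel.
LOCALIZED = [("offset", 5), ("column", 5), ("bank", 2), ("row", 10), ("channel", 2)]


def channels_touched(start, nbytes, layout, step=32):
    """Set of channels hit by a sequential read of nbytes beginning at start."""
    return {decode(a, layout)["channel"] for a in range(start, start + nbytes, step)}
```

Under these assumed layouts, a sequential 128-byte read, such as one segment of a matrix row in GEMV, spreads over four channels in the interleaved case and stays in a single channel in the localized case, which is the kind of difference that changes achievable bandwidth.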
A Retrieval Augmented Generation (RAG) System Using Query Rewriting Based on Large Language Model (LLM)
Minsu Han, Seokyoung Hong, Myoung-Wan Koo
http://doi.org/10.5626/JOK.2025.52.6.474
This paper proposes a retrieval pipeline that can be used effectively in fields requiring expert knowledge, without fine-tuning. To achieve high accuracy, we introduce a query rewriting retrieval method that leverages large language models to generate examples similar to the given question, achieving higher similarity than existing retrieval models. The proposed method demonstrates excellent performance in both automated evaluations and expert qualitative assessments, while also providing explainability in retrieval results through the generated examples. Additionally, we suggest prompts for applying this method in various domains that require specialized knowledge. Furthermore, we propose a pipeline that incorporates a Top-1 retrieval model, which chooses the most relevant document from the three returned by the query rewriting retrieval model. This aims to prevent the hallucination caused by feeding unnecessary documents into the large language model.
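The two-stage flow described above can be sketched as follows. This is a toy illustration under stated assumptions: `rewrite` is a hypothetical callable standing in for the LLM that generates an example passage, `score` is a hypothetical relevance function standing in for the Top-1 model, and bag-of-words cosine similarity replaces whatever retriever the authors actually use.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words term counts (toy stand-in for a dense encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query, docs, rewrite, k=3):
    """Rank docs by similarity to the generated example rather than the raw query."""
    example = bow(rewrite(query))
    ranked = sorted(docs, key=lambda d: cosine(example, bow(d)), reverse=True)
    return ranked[:k]


def top1(query, candidates, score):
    """Keep only the candidate that the (assumed) relevance model scores highest."""
    return max(candidates, key=lambda d: score(query, d))
```

Passing only the single selected document to the generator mirrors the stated goal of keeping unnecessary documents out of the prompt.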
A Graph Neural Network Approach for Predicting the Lung Carcinogenicity of Single Molecular Compounds
http://doi.org/10.5626/JOK.2025.52.6.482
Cancer causes millions of deaths worldwide every year, and lung cancer was recorded as the leading cause of cancer-related deaths in Korea in 2022. Research on lung cancer-causing compounds is therefore essential. This study proposes and evaluates a novel approach that uses graph neural networks to predict the lung cancer-causing potential of compounds, overcoming the limitations of existing machine learning and deep learning methods. Based on SMILES (Simplified Molecular Input Line Entry System) information from the compound carcinogenicity databases CPDB, CCRIS, IRIS, and T3DB, the structure and chemical properties of molecules were converted into graph data for training, and the proposed model showed superior prediction performance compared to other models. This demonstrates the potential of graph neural networks as an effective tool for lung cancer prediction and suggests that they can make important contributions to future cancer research and treatment development.
PFD Simulator based Deep Reinforcement Learning for Energy Consumption Minimization of Electric RTO
http://doi.org/10.5626/JOK.2025.52.6.490
This study proposes a method that generates data through a simulator in situations where data collection is difficult. A deep reinforcement learning agent is then trained on the generated data to maintain stable operation of an electric regenerative thermal oxidizer (RTO) and minimize its energy consumption. First, data were generated from a simulator built from the Process Flow Diagrams (PFDs) of actual equipment and field operation methods. An environment incorporating states, actions, and rewards was then established for agent training. Performance evaluation showed that control by the deep reinforcement learning agent trained with this method enabled more stable operation of the electric RTO system while reducing power consumption by up to approximately 9% compared to the conventional operation strategy.
Hierarchical Semantic Prompt Design for Robust Open-Vocabulary Object Detection
http://doi.org/10.5626/JOK.2025.52.6.499
Open-Vocabulary Object Detection (OVOD) has been proposed to overcome the limitation of traditional object detection methods, which are restricted to recognizing only categories seen during training. While conventional OVOD approaches generate classifiers using simple prompts like “a {category}”, this paper incorporates the hierarchical structure of object categories into prompts to enhance detection performance. Specifically, we applied prompt engineering techniques that reduce the use of lengthy connectives and place important keywords at the beginning of the sentence. This resulted in more effective prompts that capture the intrinsic meaning of hierarchical information. Our method allows classifiers to be generated without additional computational resources or retraining. Furthermore, it demonstrates strong generalizability and can be applied to other tasks such as image captioning and medical image analysis. By leveraging hierarchical expressions familiar to humans, our approach also contributes to improving the interpretability of model outputs.
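As a rough illustration of the prompt design described above, the sketch below contrasts the baseline prompt with a hierarchy-aware one that puts the category keyword first and joins ancestors with commas instead of long connectives. The exact template is an assumption made for illustration; the abstract does not give the authors' wording, and `hierarchical_prompt` is a hypothetical helper.

```python
def baseline_prompt(category):
    """Simple prompt of the form used by conventional OVOD approaches."""
    return f"a {category}"


def hierarchical_prompt(path):
    """Build a prompt from a category path ordered general -> specific.

    Example path: ["animal", "mammal", "dog"].
    The template (keyword first, comma-separated ancestors) is an assumed design.
    """
    *ancestors, category = path
    if not ancestors:
        return baseline_prompt(category)
    # Keyword first, then ancestors from nearest to most general.
    return f"{category}, " + ", ".join(reversed(ancestors))
```

Because the prompt is only a string passed to a text encoder, changing the template requires no retraining, which is consistent with the claim of no additional computational cost.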
Safety Evaluation of Large Language Models Using Risky Humor
JoEun Kang, GaYeon Jung, HanSaem Kim
http://doi.org/10.5626/JOK.2025.52.6.508
This study evaluated the safety of generative language models through the lens of Korean humor that included socially risky content. Recently, concerns regarding the misuse of generative language models have intensified, as these models can generate plausible responses to inputs and prompts that may deviate from social norms, ethical standards, and common sense. In this context, this study aimed to identify and mitigate potential risks associated with artificial intelligence (AI) by analyzing risks inherent in humor and developing a benchmark for their evaluation. The socially risky humor examined in this study differs from conventional harmful content, as the playful and entertaining nature of humor can easily obscure unethical or risky elements. This characteristic closely resembles subtle and indirect input patterns, which are critical in AI safety assessments. The experiment involved binary classification of generated results from input requests related to unethical humor as safe or unsafe. Subsequently, the safety level of the experimental model was evaluated across four levels. Consequently, this study evaluated the safety of prominent generative language models, including GPT-4o, Gemini, and Claude. Findings indicated that these models demonstrated vulnerabilities in ethical judgment when faced with risky humor.
A Reinforcement Learning-Based Path Optimization for Autonomous Underwater Vehicle Mission Execution in Dynamic Marine Environments
Hyojun Ahn, Shincheon Ahn, Emily Jimin Roh, Ilseok Song, Jooeun Kwon, Sei Kwon, Youngdae Kim, Soohyun Park, Joongheon Kim
http://doi.org/10.5626/JOK.2025.52.6.519
This paper proposes an AOPF (Autonomous Underwater Vehicle Optimal Path Finder) algorithm for AUV mission execution and path optimization in dynamic marine environments. The proposed algorithm utilizes a PPO (Proximal Policy Optimization)-based reinforcement learning method in combination with a 3-degree-of-freedom (DOF) model, enabling a balanced approach between obstacle avoidance and effective target approach. This method is designed to achieve faster convergence and higher mission performance compared to the DDPG (Deep Deterministic Policy Gradient) algorithm. Experimental results demonstrated that the algorithm enabled stable learning and generated efficient paths. Furthermore, the proposed approach shows strong potential for real-world deployment in complex marine environments. It offers scalability to multi-AUV cooperative control scenarios.
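For readers unfamiliar with PPO, the sketch below computes its standard clipped surrogate objective for a batch of samples. It shows only the generic PPO objective; the AOPF reward design, the 3-DOF dynamics, and the clipping range used by the authors are not given in the abstract, so `eps=0.2` is an assumed default.

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """PPO clipped term: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped * advantage)


def ppo_objective(ratios, advantages, eps=0.2):
    """Mean clipped surrogate over a batch (the quantity PPO maximizes)."""
    terms = [clipped_surrogate(r, a, eps) for r, a in zip(ratios, advantages)]
    return sum(terms) / len(terms)
```

Here `ratio` is the probability of the taken action under the new policy divided by its probability under the old policy. Clipping removes the incentive to move the ratio far from 1 in a single update, which is the usual explanation for PPO's stable learning.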
Linear Sequential Recommendation Models using Textual Side Information
Dongcheol Lee, Minjin Choi, Jongwuk Lee
http://doi.org/10.5626/JOK.2025.52.6.529
Research on leveraging auxiliary information in sequential recommendation systems has recently been active. Most approaches have focused on combining language models with deep neural networks. However, they often lead to high computational costs and latency issues. While linear recommendation models can serve as an efficient alternative, research on how to effectively incorporate auxiliary information into them is lacking. This study proposes a framework that effectively utilizes auxiliary information within a linear model. Since textual data cannot be directly used in linear model training, we transformed item texts into dense vectors using a pre-trained text encoder. Although these vectors contained rich information, they failed to capture relationships between items. To address this, we applied graph convolution to obtain enhanced item representations. These representations were then used alongside the user-item interaction matrix for linear model training. Extensive experiments showed that the proposed method improved overall performance, particularly in recommending less popular items.
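The graph convolution step described above can be sketched in a few lines. The example assumes an item-item graph built from co-occurrence in user histories and a single mean-aggregation step with self-loops; the abstract does not state how the graph is built or normalized, so both choices are assumptions.

```python
def cooccurrence(interactions, n_items):
    """Item-item adjacency (with self-loops) from a binary user-item matrix."""
    adj = [[0.0] * n_items for _ in range(n_items)]
    for row in interactions:
        items = [i for i, v in enumerate(row) if v]
        for i in items:
            for j in items:
                adj[i][j] = 1.0
    for i in range(n_items):
        adj[i][i] = 1.0  # self-loop keeps each item's own text embedding in the mix
    return adj


def graph_conv(adj, emb):
    """One propagation step: row-normalized adjacency times item embeddings."""
    dim = len(emb[0])
    out = []
    for row in adj:
        deg = sum(row)  # never zero because of the self-loops
        out.append([sum(w * e[d] for w, e in zip(row, emb)) / deg
                    for d in range(dim)])
    return out
```

After propagation, items that co-occur receive similar representations even if their texts differ, which is the relational signal the raw text vectors lack.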
Pretrained Large Language Model-based Drug-Target Binding Affinity Prediction for Mutated Proteins
Taeung Song, Jin Hyuk Kim, Hyeon Jun Park, Jonghwan Choi
http://doi.org/10.5626/JOK.2025.52.6.539
Drug development is a costly and time-consuming process. Accurately predicting the impact of protein mutations on drug-target binding affinity remains a major challenge. Previous studies have utilized long short-term memory (LSTM) and transformer models for amino acid sequence processing. However, LSTMs suffer from long-sequence dependency issues, while transformers face high computational costs. In contrast, pretrained large language models (pLLMs) excel in handling long sequences, yet prompt-based approaches alone are insufficient for accurate binding affinity prediction. This study proposes a method that leverages pLLMs to analyze protein structural data, transforms it into embedding vectors, and uses a separate machine learning model for numerical binding affinity prediction. Experimental results demonstrated that the proposed approach outperformed conventional LSTM and prompt-based methods, achieving lower root mean square error (RMSE) and higher Pearson correlation coefficient (PCC), particularly in mutation-specific predictions. Additionally, performance analysis of pLLM quantization confirmed that the method maintained sufficient accuracy with reduced computational cost.
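The two evaluation metrics named above have standard definitions, shown here as a small self-contained sketch. The function names are illustrative; the sample values in any usage are made up and are not results from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted affinities."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson(y_true, y_pred):
    """Pearson correlation coefficient (undefined if either input is constant)."""
    n = len(y_true)
    mean_t, mean_p = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    std_t = math.sqrt(sum((t - mean_t) ** 2 for t in y_true))
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in y_pred))
    return cov / (std_t * std_p)
```

Lower RMSE means predictions are closer in absolute value, while higher PCC means the model ranks and scales affinities consistently with the measurements; a model can improve one without the other, which is why both are reported.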
A Large Language Model-based Multi-domain Recommender System using Model Merging
http://doi.org/10.5626/JOK.2025.52.6.548
Recent research in recommender systems has increasingly focused on leveraging pre-trained large language models (LLMs) to effectively understand the natural language information associated with recommendation items. While these LLM-based recommender systems achieve high accuracy, they have a limitation in that they require training separate recommendation models for each domain. This increases the costs of storing and inferring multiple models and makes it difficult to share knowledge across domains. To address this issue, we propose an LLM-based recommendation model that effectively operates across diverse recommendation domains by applying task vector-based model merging. During the merging process, knowledge distillation is utilized from individually trained domain-specific recommendation models to learn optimal merging weights. Experimental results show that our proposed method improves recommendation accuracy by an average of 2.75% across eight domains compared to recommender models utilizing existing model merging methods, while also demonstrating strong generalization performance in previously unseen domains.
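Task vector-based merging, as referred to above, can be sketched with plain dictionaries of parameters. A task vector is the difference between a fine-tuned model and the shared base model, and the merged model adds a weighted sum of task vectors back to the base. The merging weights are fixed by hand here as an assumption; in the paper they are learned via knowledge distillation, which this sketch does not implement.

```python
def task_vector(finetuned, base):
    """Per-parameter difference between a domain-specific model and the base model."""
    return {k: [f - b for f, b in zip(finetuned[k], base[k])] for k in base}


def merge(base, finetuned_models, weights):
    """Return base + sum_i weights[i] * (finetuned_i - base), parameter by parameter."""
    merged = {k: list(v) for k, v in base.items()}
    for model, w in zip(finetuned_models, weights):
        tv = task_vector(model, base)
        for k in merged:
            merged[k] = [m + w * d for m, d in zip(merged[k], tv[k])]
    return merged
```

Storing one merged model instead of one model per domain is what reduces the storage and inference costs mentioned in the abstract.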

Journal of KIISE
- ISSN : 2383-630X(Print)
- ISSN : 2383-6296(Electronic)
- KCI Accredited Journal
Editorial Office
- Tel. +82-2-588-9240
- Fax. +82-2-521-1352
- E-mail. chwoo@kiise.or.kr