Search : [ keyword: knowledge distillation ] (8)

A Large Language Model-based Multi-domain Recommender System using Model Merging

Hyunsoo Kim, Jongwuk Lee

http://doi.org/10.5626/JOK.2025.52.6.548

Recent research in recommender systems has increasingly focused on leveraging pre-trained large language models (LLMs) to effectively understand the natural language information associated with recommendation items. While these LLM-based recommender systems achieve high accuracy, they require training a separate recommendation model for each domain, which increases the cost of storing multiple models and running inference with them, and makes it difficult to share knowledge across domains. To address this issue, we propose an LLM-based recommendation model that operates effectively across diverse recommendation domains by applying task vector-based model merging. During the merging process, knowledge is distilled from individually trained domain-specific recommendation models to learn optimal merging weights. Experimental results show that the proposed method improves recommendation accuracy by an average of 2.75% across eight domains compared to recommender models using existing model merging methods, while also demonstrating strong generalization performance on previously unseen domains.
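
Below is a minimal sketch, not the authors' implementation, of the general idea: task vectors (fine-tuned minus base weights) are combined with learnable coefficients, and those coefficients are fit by distilling from the domain-specific teachers. A tiny linear scorer stands in for the LLM backbone, and all names, shapes, and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.func import functional_call

torch.manual_seed(0)
num_domains, dim, num_items = 3, 16, 50

base = nn.Linear(dim, num_items)                       # stands in for the shared pre-trained backbone
base_state = {k: v.detach() for k, v in base.state_dict().items()}

# Domain-specific models fine-tuned from the base act as teachers.
teachers = [nn.Linear(dim, num_items) for _ in range(num_domains)]
task_vecs = []
for t in teachers:
    ts = t.state_dict()
    task_vecs.append({k: ts[k] - base_state[k] for k in base_state})  # task vector per domain

# Learnable per-domain merging coefficients.
alphas = nn.Parameter(torch.full((num_domains,), 1.0 / num_domains))
optim = torch.optim.Adam([alphas], lr=1e-2)

def merged_params():
    # theta_merged = theta_base + sum_i alpha_i * (theta_i - theta_base)
    return {
        k: base_state[k] + sum(a * tv[k] for a, tv in zip(alphas, task_vecs))
        for k in base_state
    }

tau = 2.0
for step in range(100):
    d = step % num_domains                             # cycle over domains
    x = torch.randn(8, dim)                            # toy user/context features
    with torch.no_grad():
        t_logits = teachers[d](x)                      # teacher item scores
    s_logits = functional_call(base, merged_params(), (x,))
    # Soft-label KD loss drives the merging weights toward the teachers' behavior.
    loss = F.kl_div(F.log_softmax(s_logits / tau, dim=-1),
                    F.softmax(t_logits / tau, dim=-1),
                    reduction="batchmean") * tau ** 2
    optim.zero_grad(); loss.backward(); optim.step()

print("learned merging weights:", alphas.detach().tolist())
```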

Efficient Large Language Model Based Passage Re-Ranking Using Single Token Representations

Jeongwoo Na, Jun Kwon, Eunseong Choi, Jongwuk Lee

http://doi.org/10.5626/JOK.2025.52.5.395

In information retrieval systems, document re-ranking reorders a set of candidate documents according to their relevance to a given query. Leveraging the extensive natural language understanding capabilities of large language models (LLMs), numerous studies on document re-ranking have demonstrated groundbreaking performance. However, these studies focus solely on improving re-ranking accuracy, resulting in degraded efficiency due to excessively long input sequences and the need for repeated inference. To address these limitations, we propose ListT5++, a novel model that represents the relevance between a query and a passage with a single token embedding and significantly improves the efficiency of LLM-based re-ranking through a single-step decoding strategy that minimizes the decoding process. Experimental results show that ListT5++ maintains accuracy comparable to existing methods while reducing inference latency by a factor of 29.4 relative to the baseline. Moreover, our approach is insensitive to the initial ordering of candidate documents, ensuring high practicality in real-time retrieval environments.
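
The following is a generic sketch of single-token, single-step relevance scoring with a seq2seq model, in the spirit of the approach described above; it is not the ListT5++ implementation, and the checkpoint name, prompt format, and scoring token are placeholders.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("t5-small")          # placeholder checkpoint
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small").eval()

TRUE_ID = tok.encode("true", add_special_tokens=False)[0]  # designated relevance token

@torch.no_grad()
def relevance(query: str, passage: str) -> float:
    """One encoder pass and one decoded position: the relevance score is the
    logit of a single designated token, so no multi-token generation is needed."""
    enc = tok(f"Query: {query} Document: {passage} Relevant:",
              return_tensors="pt", truncation=True)
    dec = torch.tensor([[model.config.decoder_start_token_id]])  # single-step decoding
    logits = model(**enc, decoder_input_ids=dec).logits[0, -1]
    return logits[TRUE_ID].item()

def rerank(query, passages):
    scores = [(relevance(query, p), p) for p in passages]
    return [p for _, p in sorted(scores, key=lambda s: s[0], reverse=True)]

print(rerank("what is knowledge distillation",
             ["KD transfers knowledge from a teacher to a student model.",
              "The weather in Seoul is sunny today."]))
```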

A Survey of Advantages of Self-Supervised Learning Models in Visual Recognition Tasks

Euihyun Yoon, Hyunjong Lee, Donggeon Kim, Joochan Park, Jinkyu Kim, Jaekoo Lee

http://doi.org/10.5626/JOK.2024.51.7.609

Recently, the field of supervised artificial intelligence (AI) has been advancing rapidly. However, supervised learning relies on datasets with annotated labels, and obtaining these labels can be costly. To address this issue, self-supervised learning, which can learn general features of images without labels, is being actively researched. In this paper, various self-supervised learning models were classified based on their learning methods and backbone networks, and their strengths, weaknesses, and performance were compared and analyzed. Image classification tasks were used for the performance comparison, and fine-grained prediction tasks were also compared and analyzed to assess transfer learning performance. As a result, models that used only positive pairs achieved higher performance than models that used both positive and negative pairs by minimizing noise. Furthermore, for fine-grained prediction, methods that mask images during training or utilize multi-stage models achieved higher performance by additionally learning regional information.
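
As a rough illustration of the two method families compared in the survey, the sketch below contrasts a positive-pair-only objective (SimSiam-style negative cosine similarity with a stop-gradient) with a contrastive objective that also uses in-batch negatives (InfoNCE); shapes and hyperparameters are illustrative only.

```python
import torch
import torch.nn.functional as F

def simsiam_loss(p1, z2, p2, z1):
    """Positive pairs only: pull two augmented views together; the
    stop-gradient (detach) on the target branch prevents collapse."""
    def d(p, z):
        return -F.cosine_similarity(p, z.detach(), dim=-1).mean()
    return 0.5 * d(p1, z2) + 0.5 * d(p2, z1)

def info_nce_loss(z1, z2, temperature=0.1):
    """Positive and negative pairs: the other samples in the batch act as
    negatives for each anchor."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature           # (N, N) similarity matrix
    labels = torch.arange(z1.size(0))            # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

# toy usage with random projections/predictions of two augmented views
z1, z2, p1, p2 = (torch.randn(32, 128) for _ in range(4))
print(simsiam_loss(p1, z2, p2, z1).item(), info_nce_loss(z1, z2).item())
```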

Model Architecture Analysis and Extension for Improving RF-based Multi-Person Pose Estimation Performance

SeungHwan Shin, Yusung Kim

http://doi.org/10.5626/JOK.2024.51.3.262

An RF-based multi-person pose estimation system can estimate human posture even when clear visibility is hard to obtain due to obstacles or lighting conditions. Traditionally, a cross-modal teacher-student learning approach has been employed, in which pseudo-label data are obtained by feeding images captured concurrently with the RF signals into a pre-trained image-based pose estimation model. In a previous study, the research team applied cross-modal knowledge distillation to mimic the feature maps of the image-based model, referred to as "visual cues," which enhanced the performance of RF signal-based pose estimation. In this paper, performance is compared according to the ratio at which the learned visual cues are concatenated, and the impact of segmentation mask learning and multi-frame inputs on multi-person pose estimation performance is analyzed. It is demonstrated that the best performance is achieved when visual cues and multi-frame inputs are used in combination.
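
The sketch below illustrates one way distilled "visual cues" might be concatenated with RF features at a configurable channel ratio before a pose head; all layer names, input shapes, channel sizes, and the fusion scheme are assumptions for illustration and do not reproduce the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RFPoseStudent(nn.Module):
    def __init__(self, rf_channels=64, cue_channels=64, cue_ratio=0.5, keypoints=17):
        super().__init__()
        cue_out = int(cue_channels * cue_ratio)        # how many cue channels to concatenate
        self.rf_encoder = nn.Conv2d(2, rf_channels, 3, padding=1)        # toy RF-signal encoder
        self.cue_head = nn.Conv2d(rf_channels, cue_channels, 1)          # predicts "visual cues"
        self.cue_proj = nn.Conv2d(cue_channels, cue_out, 1)              # controls the fusion ratio
        self.pose_head = nn.Conv2d(rf_channels + cue_out, keypoints, 1)  # keypoint heatmaps

    def forward(self, rf):
        feat = F.relu(self.rf_encoder(rf))
        cues = self.cue_head(feat)                     # distilled toward the image teacher's features
        fused = torch.cat([feat, self.cue_proj(cues)], dim=1)
        return self.pose_head(fused), cues

def distill_loss(student_cues, teacher_feat):
    """Cross-modal KD: match the RF model's visual cues to the image
    teacher's feature maps on synchronized frames."""
    return F.mse_loss(student_cues, teacher_feat)

# toy usage: a 2-channel RF "heatmap" input and a matching teacher feature map
model = RFPoseStudent()
heatmaps, cues = model(torch.randn(1, 2, 32, 32))
print(heatmaps.shape, distill_loss(cues, torch.randn(1, 64, 32, 32)).item())
```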

Fair Feature Distillation Using Teacher Models of Larger Architecture

Sangwon Jung, Taesup Moon

http://doi.org/10.5626/JOK.2021.48.11.1176

Achieving algorithmic fairness is becoming increasingly essential for various vision applications. Although a state-of-the-art fairness method, dubbed MMD-based Fair feature Distillation (MFD), significantly improved both accuracy and fairness over previous works via feature distillation based on Maximum Mean Discrepancy (MMD), MFD can only be applied when the teacher model has the same architecture as the student model. In this paper, building on MFD, we propose a systematic approach that mitigates unfair biases via feature distillation from a teacher model of larger architecture, dubbed MMD-based Fair feature Distillation with a regressor (MFD-R). Through extensive experiments, we show that MFD-R benefits from the use of the larger teacher compared to MFD as well as other baseline methods.
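
The sketch below illustrates, under our own assumptions, the two ingredients the abstract describes: a Gaussian-kernel MMD feature-matching loss and a regressor that maps student features into the larger teacher's feature space. The exact class/group conditioning of MFD-R is simplified here, and the dimensions are placeholders.

```python
import torch
import torch.nn as nn

def mmd(x, y, sigma=1.0):
    """Gaussian-kernel MMD^2 between two feature batches (biased estimate)."""
    def k(a, b):
        d = torch.cdist(a, b) ** 2
        return torch.exp(-d / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

student_dim, teacher_dim = 512, 2048
regressor = nn.Linear(student_dim, teacher_dim)      # bridges the architecture/dimension gap

def mfd_r_loss(student_feat, teacher_feat, groups):
    """Feature distillation through the regressor, applied per sensitive
    group so that all groups are aligned with the teacher's representation."""
    loss = torch.zeros(())
    for g in groups.unique():
        m = groups == g
        loss = loss + mmd(regressor(student_feat[m]), teacher_feat[m])
    return loss

# toy usage: a batch of student/teacher features with a binary group attribute
s, t = torch.randn(16, student_dim), torch.randn(16, teacher_dim)
print(mfd_r_loss(s, t, torch.randint(0, 2, (16,))).item())
```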

Combining Sentiment-Combined Model with Pre-Trained BERT Models for Sentiment Analysis

Sangah Lee, Hyopil Shin

http://doi.org/10.5626/JOK.2021.48.7.815

It is known that BERT can capture various kinds of linguistic knowledge from raw text via language modeling, without any additional hand-crafted features. However, some studies have shown that BERT-based models that additionally use specific linguistic knowledge achieve higher performance on natural language processing problems associated with that knowledge. Based on this finding, we trained a sentiment-combined model by adding sentiment features to the BERT architecture. We constructed sentiment feature embeddings using the sentiment polarity and intensity values annotated in a Korean sentiment lexicon and proposed two methods, external fusing and knowledge distillation, to combine the sentiment-combined model with a general-purpose pre-trained BERT model. The external fusing method achieved higher performance on Korean sentiment analysis tasks with movie review and hate speech datasets than baselines built on other pre-trained models without sentiment-combined models. We also observed that adding sentiment features to the BERT architecture improved the model's language modeling and sentiment analysis performance. Furthermore, when implementing sentiment-combined models, training time and cost can be reduced by using a small-scale BERT model with fewer layers, dimensions, and training steps.
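
A hedged sketch of the two ideas follows: adding lexicon-based sentiment polarity and intensity embeddings to BERT's input embeddings, and "external fusing" by concatenating the [CLS] vectors of the sentiment-combined and general-purpose models. The checkpoint name, embedding sizes, lexicon lookup, and fusion details are assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class SentimentCombinedEncoder(nn.Module):
    def __init__(self, name="klue/bert-base", n_polarity=3, n_intensity=5):
        super().__init__()
        self.bert = AutoModel.from_pretrained(name)    # placeholder Korean BERT checkpoint
        h = self.bert.config.hidden_size
        self.polarity_emb = nn.Embedding(n_polarity, h)    # e.g., neg/neutral/pos from the lexicon
        self.intensity_emb = nn.Embedding(n_intensity, h)  # lexicon intensity buckets

    def forward(self, input_ids, attention_mask, polarity_ids, intensity_ids):
        # Sentiment features are added to the word embeddings before encoding.
        word_emb = self.bert.embeddings.word_embeddings(input_ids)
        inputs_embeds = word_emb + self.polarity_emb(polarity_ids) + self.intensity_emb(intensity_ids)
        out = self.bert(inputs_embeds=inputs_embeds, attention_mask=attention_mask)
        return out.last_hidden_state[:, 0]             # [CLS] representation

class ExternallyFusedClassifier(nn.Module):
    def __init__(self, sentiment_encoder, general_name="klue/bert-base", num_labels=2):
        super().__init__()
        self.sentiment = sentiment_encoder
        self.general = AutoModel.from_pretrained(general_name)
        h = self.general.config.hidden_size
        self.classifier = nn.Linear(2 * h, num_labels)

    def forward(self, input_ids, attention_mask, polarity_ids, intensity_ids):
        # External fusing: concatenate [CLS] vectors from both encoders.
        s_cls = self.sentiment(input_ids, attention_mask, polarity_ids, intensity_ids)
        g_cls = self.general(input_ids=input_ids,
                             attention_mask=attention_mask).last_hidden_state[:, 0]
        return self.classifier(torch.cat([s_cls, g_cls], dim=-1))
```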

Conditional Knowledge Distillation for Model Specialization

Hakbin Kim, Dong-Wan Choi

http://doi.org/10.5626/JOK.2021.48.4.369

Many recent works on model compression in neural networks are based on knowledge distillation (KD). However, since the basic goal of KD is to transfer the entire knowledge set of a teacher model to a student model, the standard KD may not represent the best use of the model’s capacity when a user wishes to classify only a small subset of classes. Also, it is necessary to possess the original teacher model dataset for KD, but for various practical reasons, such as privacy issues, the entire dataset may not be available. Thus, this paper proposes conditional knowledge distillation (CKD), which only distills specialized knowledge corresponding to a given subset of classes, as well as data-free CKD (DF-CKD), which does not require the original data. As a major extension, we devise Joint-CKD, which jointly performs DF-CKD and CKD with only a small additional dataset collected by a client. Our experimental results show that the CKD and DF-CKD methods are superior to standard KD, and also confirm that joint use of CKD and DF-CKD is effective at further improving the overall accuracy of a specialized model.
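As one plausible reading of the core idea, the sketch below restricts the teacher's logits to the chosen class subset and re-normalizes them before applying the usual soft-label distillation loss; it is an illustration rather than the paper's exact CKD/DF-CKD formulation, and the temperature and class indices are arbitrary.

```python
import torch
import torch.nn.functional as F

def conditional_kd_loss(student_logits, teacher_logits, class_subset, tau=4.0):
    """student_logits: (N, |subset|) from the specialized student head.
    teacher_logits: (N, C) over all original classes.
    class_subset:   LongTensor of class indices to specialize on."""
    t_sub = teacher_logits[:, class_subset]            # keep only the target classes
    return F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.softmax(t_sub / tau, dim=-1),                # teacher re-normalized over the subset
        reduction="batchmean",
    ) * tau ** 2

# toy usage: specialize on 3 classes out of a 100-class teacher
subset = torch.tensor([3, 5, 8])
loss = conditional_kd_loss(torch.randn(4, 3), torch.randn(4, 100), subset)
print(loss.item())
```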

Low-Resolution Image Classification Using Knowledge Distillation From High-Resolution Image Via Self-Attention Map

Sungho Shin, Joosoon Lee, Junseok Lee, Seungjun Choi, Kyoobin Lee

http://doi.org/10.5626/JOK.2020.47.11.1027

Traditional deep-learning models have been developed using high-quality images; however, when they are given low-resolution images, their performance drops drastically. To develop a deep-learning model that responds effectively to low-resolution images, we extracted information from a model that takes high-resolution images as input, in the form of an attention map. Distilling this attention-map knowledge from the high-resolution model to the low-resolution model reduced the error rate by 2.94% when classifying low-resolution 16×16 CIFAR images. This amounts to 38.43% of the error increase incurred when the image resolution is lowered from 32×32 to 16×16, demonstrating the effectiveness of the proposed network.
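
A hedged sketch of attention-map distillation from a high-resolution teacher to a low-resolution student follows; the attention map here is a simple channel-pooled activation map, which may differ from the paper's self-attention formulation, and the feature shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def attention_map(feat):
    """Collapse a (N, C, H, W) feature map into a normalized (N, H*W)
    spatial attention map."""
    a = feat.abs().mean(dim=1)                 # (N, H, W)
    return F.normalize(a.flatten(1), dim=1)

def attention_kd_loss(student_feat, teacher_feat):
    """Match the student's attention (from 16x16 inputs) to the teacher's
    (from 32x32 inputs) after pooling to a common spatial size."""
    t = teacher_feat
    if t.shape[-2:] != student_feat.shape[-2:]:
        t = F.adaptive_avg_pool2d(t, student_feat.shape[-2:])
    return F.mse_loss(attention_map(student_feat), attention_map(t))

# toy usage: teacher features from a 32x32 input, student features from 16x16
loss = attention_kd_loss(torch.randn(4, 64, 4, 4), torch.randn(4, 64, 8, 8))
print(loss.item())
```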

