Efficient Large Language Model Based Passage Re-Ranking Using Single Token Representations
Jeongwoo Na, Jun Kwon, Eunseong Choi, Jongwuk Lee
http://doi.org/10.5626/JOK.2025.52.5.395
In information retrieval systems, document re-ranking reorders a set of candidate documents according to their estimated relevance to a given query. Leveraging the extensive natural language understanding capabilities of large language models (LLMs), numerous studies on document re-ranking have been conducted, demonstrating groundbreaking performance. However, studies utilizing large language models focus solely on improving re-ranking performance, resulting in degraded efficiency due to excessively long input sequences and the need for repeated inference. To address these limitations, we propose ListT5++, a novel model that represents the relevance between a query and a passage using a single token embedding and significantly improves the efficiency of LLM-based re-ranking through a single-step decoding strategy that minimizes the decoding process. Experimental results showed that ListT5++ maintains accuracy comparable to existing methods while reducing inference latency by a factor of 29.4 relative to the baseline. Moreover, our approach is insensitive to the initial ordering of candidate documents, which makes it practical for real-time retrieval environments.
Multi-task Learning Based Re-ranker for External Knowledge Retrieval in Document-grounded Dialogue Systems
http://doi.org/10.5626/JOK.2023.50.7.606
Document-grounded dialogue systems retrieve external passages related to the dialogue and use them to generate an appropriate response to the user's utterance. However, retrievers based on the dual-encoder architecture perform poorly at finding relevant passages, and the re-ranker that complements the retriever is not sufficiently optimized. In this paper, to solve these problems and perform effective external passage retrieval, we propose a re-ranker based on multi-task learning. The proposed model is a cross-encoder that, in the fine-tuning stage, jointly learns contrastive learning-based ranking, Masked Language Modeling (MLM), and Posterior Differential Regularization (PDR); the auxiliary MLM and PDR tasks enhance the model's language understanding ability and robustness. Evaluation results on the MultiDoc2Dial dataset show that the proposed model outperforms the baseline model in Recall@1, Recall@5, and Recall@10.
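The Recall@k metrics named in the abstract above can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `recall_at_k`, the passage IDs, and the assumption that each dialogue turn has a known set of relevant passage IDs are all hypothetical.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant passages that appear in the top-k of a ranking.

    ranked_ids: passage IDs ordered from most to least relevant (hypothetical).
    relevant_ids: IDs of the gold passages for this query (hypothetical).
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    top_k = set(ranked_ids[:k])
    return len(top_k & relevant) / len(relevant)


# Illustrative ranking with a single gold passage at position 3.
ranked = ["p3", "p7", "p1", "p9", "p4"]
gold = {"p1"}

print(recall_at_k(ranked, gold, 1))
print(recall_at_k(ranked, gold, 5))
```

With a single gold passage per query, Recall@k reduces to a hit-or-miss indicator, so the average over queries is the share of queries whose gold passage is ranked within the top k.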
Performance Improvement of a Korean Open Domain Q&A System by Applying the Trainable Re-ranking and Response Filtering Model
Hyeonho Shin, Myunghoon Lee, Hong-Woo Chun, Jae-Min Lee, Sung-Pil Choi
http://doi.org/10.5626/JOK.2023.50.3.273
Research on open domain Q&A, which identifies answers to user questions without requiring the target paragraph to be prepared in advance, has been active as deep learning technology is applied to natural language processing. However, existing studies that rely on keyword-based information retrieval are limited in semantic matching. To address this, research on deep learning-based information retrieval is in progress, but few domestic studies have been empirically applied to real systems. In this paper, we propose a two-step method to improve the performance of a Korean open domain Q&A system. The proposed method sequentially applies a machine learning-based re-ranking model and a response filtering model to a baseline system in which a search engine and an MRC model were combined. The baseline system achieved an F1 score of 74.43 and an EM score of 60.79; with the proposed method, performance improved to an F1 score of 82.5 and an EM score of 68.82.
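For readers unfamiliar with the EM and F1 scores reported above, the following is a minimal sketch of how such scores are commonly computed for extractive question answering. It assumes whitespace tokenization and no text normalization, which may differ from the tokenization the paper used for Korean; the function names and example strings are hypothetical.

```python
from collections import Counter


def exact_match(prediction, gold):
    """1.0 if the predicted answer string equals the gold answer, else 0.0."""
    return float(prediction.strip() == gold.strip())


def token_f1(prediction, gold):
    """Harmonic mean of token-level precision and recall.

    Assumption: tokens are whitespace-separated; real evaluations often
    normalize text or use characters or morphemes instead.
    """
    pred_tokens = prediction.split()
    gold_tokens = gold.split()
    # Multiset intersection counts each shared token at most as often
    # as it occurs in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("Seoul", "Seoul"))
print(token_f1("Seoul South Korea", "Seoul"))
```

System-level scores are then the averages of these per-question values, usually reported as percentages.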
2-Phase Passage Re-ranking Model based on Neural-Symbolic Ranking Models
Yongjin Bae, Hyun Kim, Joon-Ho Lim, Hyun-ki Kim, Kong Joo Lee
http://doi.org/10.5626/JOK.2021.48.5.501
Previous research on QA systems has focused on extracting exact answers from given questions and passages. However, when the problem is expanded from machine reading comprehension to open domain question answering, finding the passage that contains the correct answer is as important as machine reading comprehension itself. DrQA reported that Exact Match@Top1 performance decreased from 69.5 to 27.1 when the QA system included the initial search step. In the present work, we propose a 2-phase passage re-ranking model to improve the performance of the question answering system. The proposed model integrates the results of a symbolic ranking model and a neural ranking model and re-ranks them. The symbolic ranking model was trained with the CatBoost algorithm on manually designed features of the question and passage. The neural model was trained by fine-tuning the KorBERT model. The second-stage model was trained as a neural regression model. We maximized performance by combining ranking models with different characteristics. Finally, the proposed model achieved 85.8% MRR and 82.2% BinaryRecall@Top1 on an evaluation of 1,000 questions, improvements of 17.3% (MRR) and 22.3% (BR@Top1) over the baseline model.
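The two metrics reported above can be sketched as follows. MRR is the standard mean reciprocal rank; the reading of BinaryRecall@Top1 as "the top-ranked passage contains the answer" is an assumption, since the abstract does not define it. The function names and the boolean-flag input format are hypothetical.

```python
def mean_reciprocal_rank(rankings):
    """Average of 1/rank of the first answer-bearing passage per question.

    rankings: one list per question; each entry is True if the passage at
    that rank contains the correct answer. Questions with no hit add 0.
    """
    total = 0.0
    for flags in rankings:
        for rank, hit in enumerate(flags, start=1):
            if hit:
                total += 1.0 / rank
                break
    return total / len(rankings)


def binary_recall_at_top1(rankings):
    """Share of questions whose top-ranked passage contains the answer
    (assumed definition of BinaryRecall@Top1)."""
    hits = sum(1 for flags in rankings if flags and flags[0])
    return hits / len(rankings)


# Three illustrative questions: first hit at rank 1, rank 2, and rank 4.
rankings = [
    [True, False],
    [False, True],
    [False, False, False, True],
]
print(mean_reciprocal_rank(rankings))
print(binary_recall_at_top1(rankings))
```

For these three questions the reciprocal ranks are 1, 1/2, and 1/4, so MRR is their mean, and only the first question counts toward the top-1 measure.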
Passage Re-ranking Method Based on Sentence Similarity Through Multitask Learning
Youngjin Jang, Hyeon-gu Lee, Jihyun Wang, Chunghee Lee, Harksoo Kim
http://doi.org/10.5626/JOK.2020.47.4.416
A machine reading comprehension (MRC) system is a question answering system in which a computer understands a given passage and responds to questions. Recently, with the development of deep neural networks, research on machine reading systems has been actively conducted, and work is in progress on open domain machine reading systems, which identify the correct answer from the results of an information retrieval (IR) model rather than from a given passage. However, if the IR model fails to retrieve a passage containing the correct answer, the MRC system cannot answer the question. That is, the performance of an open domain MRC system depends on the performance of the IR model, so a high-performance IR model is a prerequisite for a high-performance open domain MRC system. Previous IR models have been improved through query expansion and re-ranking. In this paper, we propose a re-ranking method using deep neural networks. The proposed model re-ranks the retrieval results (passages) using sentence similarity learned through multi-task learning, and improves performance by approximately 8% over the existing IR model in experiments on 58,980 pairs of MRC data.
Journal of KIISE
- ISSN : 2383-630X(Print)
- ISSN : 2383-6296(Electronic)
- KCI Accredited Journal
Editorial Office
- Tel. +82-2-588-9240
- Fax. +82-2-521-1352
- E-mail. chwoo@kiise.or.kr