Search : [ keyword: language model ] (33)

LLMEE: Enhancing Explainability and Evaluation of Large Language Models through Visual Token Attribution

Yunsu Kim, Minchan Kim, Jinwoo Choi, Youngseok Hwang, Hyunwoo Park

http://doi.org/10.5626/JOK.2024.51.12.1104

Large Language Models (LLMs) have made significant advancements in Natural Language Processing (NLP) and generative AI. However, their complex structure poses challenges for interpretability and reliability. To address this issue, this study proposes LLMEE, a tool designed to visually explain and evaluate the prediction process of LLMs. LLMEE visually represents the impact of each input token on the output, enhancing model transparency and providing insights into various NLP tasks such as summarization, question answering, and text generation. Additionally, it integrates evaluation metrics such as ROUGE, BLEU, and BLEURTScore, offering both quantitative and qualitative assessments of LLM outputs. LLMEE is expected to contribute to more reliable evaluation and improvement of LLMs in both academic and industrial contexts by facilitating a better understanding of their complex workings and by providing enhanced output quality assessments.
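The overlap metrics LLMEE integrates (ROUGE, BLEU) are built on n-gram matching. As a minimal, pure-Python illustration of the core quantity behind BLEU — this is a simplified sketch, not LLMEE's actual implementation — clipped n-gram precision can be computed as follows:

```python
from collections import Counter

def ngram_precision(candidate, reference, n=2):
    """Fraction of candidate n-grams that also appear in the reference,
    with counts clipped to the reference counts (the core of BLEU)."""
    cand = candidate.split()
    ref = reference.split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    if not cand_ngrams:
        return 0.0
    clipped = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
    return clipped / sum(cand_ngrams.values())
```

Full BLEU additionally combines precisions over several n-gram orders and applies a brevity penalty; BLEURT, by contrast, is a learned metric and cannot be reduced to counting.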

Improving Conversational Query Rewriting through Generative Coreference Resolution

Heejae Yu, Sang-goo Lee

http://doi.org/10.5626/JOK.2024.51.11.1028

Conversational search enables retrieval of relevant passages for the current turn query by understanding its contextual meaning within a multi-turn dialogue. In conversational search, Conversational Query Reformulation enables the use of off-the-shelf retrievers by transforming context-dependent queries into self-contained forms. Existing approaches primarily fine-tune pre-trained language models using human-rewritten queries as labels or prompt large language models (LLMs) to address ambiguity inherent in the current turn query, such as ellipsis and coreference. However, our preliminary experimental results indicate that existing models continue to face challenges with coreference resolution. This paper addresses two main research questions: 1) Can a model be trained to distinguish anaphoric mentions that need further clarification? and 2) Can a model be trained to clarify detected coreference mentions into more specific phrases? To investigate these questions, we devised two main components: the detector and the decoder. Our experiments demonstrated that our fine-tuned detector could identify diverse anaphoric phrases within questions, while our fine-tuned decoder could successfully clarify them, ultimately enabling effective coreference resolution for query rewriting. Therefore, we present a novel paradigm, Coreference Aware Conversational Query Reformulation, utilizing these main components.
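The detector/decoder split can be sketched with rule-based stand-ins. Note that both stages below are toy substitutes for the fine-tuned models described in the abstract: the pronoun list and simple substitution logic are illustrative assumptions, not the paper's method.

```python
# Toy two-stage pipeline: the detector flags anaphoric mentions, the
# decoder rewrites them into a self-contained query using an antecedent
# drawn from the dialogue history.
PRONOUNS = {"it", "they", "this", "that"}

def detect_anaphora(query):
    """Detector stage: return anaphoric tokens found in the query."""
    return [tok for tok in query.lower().split() if tok in PRONOUNS]

def rewrite_query(query, antecedent):
    """Decoder stage: replace each detected mention with the antecedent."""
    out = []
    for tok in query.split():
        out.append(antecedent if tok.lower() in PRONOUNS else tok)
    return " ".join(out)
```

For example, given the history entity "the album", the follow-up "When was it released" would be rewritten to "When was the album released", which an off-the-shelf retriever can handle without dialogue context.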

Improving Retrieval Models through Reinforcement Learning with Feedback

Min-Taek Seo, Joon-Ho Lim, Tae-Hyeong Kim, Hwi-Jung Ryu, Du-Seong Chang, Seung-Hoon Na

http://doi.org/10.5626/JOK.2024.51.10.900

Open-domain question answering involves retrieving clues through search in order to solve problems. In such tasks, it is crucial that the search model provide appropriate clues, as this directly impacts final performance. Moreover, information retrieval is an important function frequently used in everyday life. Recognizing the significance of these challenges, this paper aims to improve the performance of search models. Just as the recent trend adjusts the outputs of decoder models using Reinforcement Learning from Human Feedback (RLHF), this study seeks to enhance search models through reinforcement learning. Specifically, we defined two rewards: the loss of the answer model and the similarity between the retrieved documents and the correct document. Based on these, we applied reinforcement learning to adjust the probability score of the top-ranked document in the search model's document probability distribution. Through this approach, we confirmed the generality of the reinforcement learning method and its potential for further performance improvements.
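The adjustment of the retrieval distribution by reward can be sketched as a single REINFORCE-style policy-gradient step. The reward values here are placeholders for the two signals the abstract defines (answer-model loss and similarity to the gold document); the update rule is a generic sketch, not the authors' exact training procedure.

```python
import math

def softmax(logits):
    """Convert retrieval scores into a document probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def reinforce_step(logits, rewards, lr=0.1):
    """One policy-gradient update: raise the score of documents whose
    reward exceeds the expected reward (baseline), lower the rest."""
    probs = softmax(logits)
    baseline = sum(p * r for p, r in zip(probs, rewards))
    # d E[r] / d logit_i under a softmax policy is p_i * (r_i - baseline)
    return [l + lr * p * (r - baseline)
            for l, p, r in zip(logits, probs, rewards)]
```

After one step with rewards favoring the first document, its probability under the softmax increases relative to the others.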

Generating Relation Descriptions with Large Language Model for Link Prediction

Hyunmook Cha, Youngjoong Ko

http://doi.org/10.5626/JOK.2024.51.10.908

The Knowledge Graph is a network consisting of entities and the relations between them. It is used for various natural language processing tasks. One specific task related to the Knowledge Graph is Knowledge Graph Completion, which involves reasoning over known facts in the graph and automatically inferring missing links. To tackle this task, studies have been conducted on both link prediction and relation prediction. Recently, there has been significant interest in a dual-encoder architecture that utilizes textual information. However, the datasets for link prediction only provide descriptions for entities, not for relations. As a result, the model relies heavily on entity descriptions. To address this issue, we utilized a large language model, GPT-3.5-turbo, to generate relation descriptions. This allows the baseline model to be trained with more comprehensive relation information. Moreover, the relation descriptions generated by our proposed method are expected to improve the performance of other language model-based link prediction models. The evaluation results for link prediction demonstrate that our proposed method outperforms the baseline model on various datasets, including Korean ConceptNet, WN18RR, FB15k-237, and YAGO3-10. Specifically, we observed improvements of 0.34%p, 0.11%p, 0.12%p, and 0.41%p in terms of Mean Reciprocal Rank (MRR), respectively.
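Generating a relation description amounts to prompting the LLM with the relation name and some example triples. The template below is a hypothetical stand-in — the authors' actual prompt is not given in the abstract — but it shows the shape of the input such a method would construct:

```python
def relation_description_prompt(relation, examples):
    """Build a prompt asking an LLM (GPT-3.5-turbo in the paper) to
    describe a knowledge-graph relation, illustrated with example
    (head, relation, tail) triples."""
    lines = [f"Describe the knowledge-graph relation '{relation}'."]
    lines.append("Example triples:")
    lines += [f"- ({h}, {relation}, {t})" for h, t in examples]
    lines.append("Description:")
    return "\n".join(lines)
```

The generated description would then be encoded alongside entity descriptions in the dual-encoder, so the relation is no longer represented by its name alone.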

A Comparative Study on Server Allocation Optimization Algorithms for Accelerating Parallel Training of Large Language Models

Jinkyu Yim, Yerim Choi, Jinho Lee

http://doi.org/10.5626/JOK.2024.51.9.783

As large-scale language models (LLMs) come to be utilized in ever more fields, demand is growing for models with higher performance. Significant computational power and memory capacity are needed to train such models. Therefore, researchers have used 3D parallelization to train large-scale language models across numerous GPU-equipped servers. However, 3D parallelization requires frequent large-scale data transfers between servers, which bottlenecks the overall training time. To address this, prior studies have proposed a methodology that identifies non-uniform cluster network conditions in advance and arranges servers and GPUs in an optimized parallel configuration. Existing methods of this type use the classical optimization algorithm SA (Simulated Annealing) for the mapping. In this paper, we apply genetic algorithms as well as SAT (satisfiability) algorithms to the problem, and compare and analyze the performance of each algorithm under various experimental environments.
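The SA baseline the paper compares against can be sketched generically: treat a server-to-slot mapping as a permutation, propose random swaps, and accept worse moves with a temperature-decayed probability. The cost function and cooling schedule below are illustrative assumptions, not those of the prior studies.

```python
import math
import random

def simulated_annealing(cost, perm, steps=2000, t0=1.0, seed=0):
    """Minimize cost(perm) over permutations (e.g. server-to-slot
    mappings) by swapping two positions per step; worse moves are
    accepted with probability exp(-delta / temperature)."""
    rng = random.Random(seed)
    best = cur = list(perm)
    best_c = cur_c = cost(cur)
    for step in range(steps):
        t = t0 * (1 - step / steps) + 1e-9          # linear cooling
        i, j = rng.sample(range(len(cur)), 2)
        cand = list(cur)
        cand[i], cand[j] = cand[j], cand[i]          # propose a swap
        c = cost(cand)
        if c < cur_c or rng.random() < math.exp((cur_c - c) / t):
            cur, cur_c = cand, c
            if c < best_c:
                best, best_c = cand, c
    return best, best_c
```

In the real setting, `cost` would estimate inter-server communication time under the measured network conditions; here any permutation cost works, e.g. total displacement from a target layout.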

In-Depth Evaluations of the Primality Testing Capabilities of Large Language Models: with a Focus on ChatGPT and PaLM 2

Hyeonwoo Jung, Kunwoo Park

http://doi.org/10.5626/JOK.2024.51.8.699

This study aims to thoroughly evaluate the primality testing capabilities of two large language models, ChatGPT and PaLM 2. We pose two different yes/no questions for a given number, asking whether it is prime and whether it is composite. To be deemed successful, a model must correctly answer both questions while also avoiding any division errors in its generated output. Analyzing the inference results on a dataset consisting of 664 prime and 1458 composite numbers, we discovered that testing accuracy decreases as the difficulty of the target numbers increases. When calculation errors are taken into account, both models showed a further decrease in testing accuracy, with PaLM 2 failing primality testing for all high-difficulty four-digit composite numbers. These findings highlight the potential for misleading evaluations of language models' reasoning abilities based on simple questions, emphasizing the need for comprehensive assessments.
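The dual-question success criterion can be made precise with a small judging routine: a model passes on a number only if both its "is it prime?" and "is it composite?" answers agree with the ground truth. The trial-division oracle and the function names below are illustrative, not the authors' evaluation code.

```python
def is_prime(n):
    """Trial-division ground truth for the evaluation."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def judge(n, answer_is_prime, answer_is_composite):
    """Pass only if BOTH yes/no answers are consistent with the truth
    (the dual-question criterion described in the abstract)."""
    truth = is_prime(n)
    return answer_is_prime == truth and answer_is_composite == (not truth)
```

A model that answers "yes" to both questions for the same number, a contradiction LLMs do sometimes produce, fails under this criterion even if one of the two answers happens to be correct.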

Adaptation of A Hierarchical Cumulative Prompting with Generative Large-scale Language Models in the Legal Domain

Yeenheui Yeen, HaeIn Jung, MinJu Kim, Jeong Yang, Minhye Kim, Hyunji Jang, Myoung-Wan Koo

http://doi.org/10.5626/JOK.2024.51.7.592

This study introduces a stepwise hierarchical prompting method suitable for large-scale generative language models in complex legal reasoning tasks. Complex logical problems are decomposed into multiple steps, accumulating results from each step to set prompts for subsequent ones. It was confirmed that when this method was applied to the evaluation process of the Korean bar exam's essay-type questions, it achieved better results than fine-tuning with original data. Notably, in the final evaluation by legal experts, both tasks showed a human precision of over 0.70, indicating its capability to produce interpretations based on accurate evidence. This prompting technique suggests a solution to the hallucination issue in large language models and demonstrates its effective application. Future research will consider the introduction of a specialized retriever to reflect more accurate legal knowledge in the large language model, aiming to incorporate more precise evidence into prompts. While the current research applied the prompting method only to the legal field, it is expected to be applicable to other complex logical reasoning tasks that rely on specialized knowledge.
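The accumulation mechanism — each step's result is appended to the context used to prompt the next step — can be sketched in a few lines. The step names and the `answer_step` callable below are hypothetical stand-ins for the legal-reasoning stages and the LLM call; this is a sketch of the prompting structure, not the paper's actual prompts.

```python
def cumulative_prompt(question, steps, answer_step):
    """Run reasoning steps in order, appending each step's result to the
    context for the next prompt (stepwise hierarchical cumulative
    prompting; answer_step stands in for the LLM call)."""
    context = question
    results = []
    for name in steps:
        prompt = f"{context}\n[{name}]"
        result = answer_step(name, prompt)   # would be the LLM call
        results.append(result)
        context = f"{prompt}\n{result}"      # accumulate for the next step
    return context, results
```

By the final step, the prompt contains the question plus every intermediate conclusion, which is what lets the model ground its final evaluation in explicit, accumulated evidence rather than a single end-to-end generation.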

Pseudo-label Correction using Large Vision-Language Models for Enhanced Domain-adaptive Semantic Segmentation

Jeongkee Lim, Yusung Kim

http://doi.org/10.5626/JOK.2024.51.5.464

It is very expensive to create semantic segmentation labels for real-world images. Unsupervised domain adaptation addresses this problem by training the model on data generated in a virtual environment, where labels can be collected easily, or on data that has already been collected, together with unlabeled real-world images. A common problem in unsupervised domain adaptation is that thing classes with similar appearances are easily confused. In this paper, we propose a method of correcting the pseudo-labels of target data using large vision-language models. Making the pseudo-labels generated for target images more accurate reduces confusion among thing classes. The proposed method improves the performance of DAFormer by +1.1 mIoU in adaptation from game to reality and +1.1 mIoU in adaptation from day to night. For thing classes, the proposed method improves the performance of MIC by +0.6 mIoU in adaptation from game to reality and +0.7 mIoU in adaptation from day to night.

New Transformer Model to Generate Molecules for Drug Discovery

Yu-Bin Hong, Kyungjun Lee, DongNyenog Heo, Heeyoul Choi

http://doi.org/10.5626/JOK.2023.50.11.976

Among various generative models, recurrent neural network (RNN)-based models have achieved state-of-the-art performance in the drug generation task. To overcome the long-term dependency problem that RNNs suffer from, Transformer-based models were proposed for the task. However, the Transformer models performed worse than the RNN models in the drug generation task, and we believe this was because the Transformer models were over-parameterized and suffered from over-fitting. To avoid this problem, in this paper, we propose a new Transformer model that replaces the large decoder with simple feed-forward layers. Experiments confirmed that our proposed model outperformed the previous state-of-the-art baseline on major evaluation metrics while preserving a similar level of performance on the other, minor metrics. Furthermore, when we applied our model to generate candidate molecules against the SARS-CoV-2 (COVID-19) virus, the generated molecules were more effective than commercial drugs such as Paxlovid, Molnupiravir, and Remdesivir.

Multi-task Learning Based Re-ranker for External Knowledge Retrieval in Document-grounded Dialogue Systems

Honghee Lee, Youngjoong Ko

http://doi.org/10.5626/JOK.2023.50.7.606

Document-grounded dialogue systems retrieve external passages related to the dialogue and use them to generate an appropriate response to the user's utterance. However, the retriever based on the dual-encoder architecture records low performance in finding relevant passages, and the re-ranker to complement the retriever is not sufficiently optimized. In this paper, to solve these problems and perform effective external passage retrieval, we propose a re-ranker based on multi-task learning. The proposed model is a cross-encoder structure that simultaneously learns contrastive learning-based ranking, Masked Language Model (MLM), and Posterior Differential Regularization (PDR) in the fine-tuning stage, enhancing language understanding ability and robustness of the model through auxiliary tasks of MLM and PDR. Evaluation results on the Multidoc2dial dataset show that the proposed model outperforms the baseline model in Recall@1, Recall@5, and Recall@10.


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr