Search : [ keyword: similar text analysis ] (1)

Study on the Evaluation of Embedding Models in the Natural Language Processing

Hanhoon Kang

http://doi.org/10.5626/JOK.2025.52.2.141

This paper applies embedding techniques to key tasks in Natural Language Processing (NLP), including semantic textual search, text classification, question answering, and clustering, and evaluates their performance. With the recent advancement of large-scale language models, embedding technologies have come to play a crucial role in a wide range of NLP applications. Several embedding models have been publicly released, and this paper compares their performance. For the evaluation, the vector representations produced by each embedding model were used as an intermediate step in each selected task. The experiments defined five NLP tasks and used publicly available Korean and English datasets. A key focus of the study is the BGE-M3 model, which has demonstrated exceptional performance in multilingual, cross-lingual, and long-document retrieval tasks. The experimental results show that BGE-M3 outperforms the other models on three of the evaluated NLP tasks. The findings are expected to guide the selection of embedding models for identifying similar sentences or documents in recent Retrieval-Augmented Generation (RAG) applications.
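As a rough illustration of how embedding vectors can serve as an intermediate step for a task such as semantic textual search, the sketch below ranks documents by cosine similarity to a query. It is not the paper's evaluation code: `toy_embed` is a hypothetical hashed bag-of-words stand-in for a real embedding model such as BGE-M3, and the function names and the `dim` parameter are assumptions made for this example.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def toy_embed(text: str, dim: int = 64) -> Vector:
    """Hypothetical stand-in for an embedding model: hashed bag-of-words counts."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # Deterministic bucket index derived from the token's characters.
        bucket = sum(ord(ch) * (i + 1) for i, ch in enumerate(token)) % dim
        vec[bucket] += 1.0
    return vec


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: str,
    documents: Sequence[str],
    embed: Callable[[str], Sequence[float]] = toy_embed,
) -> List[Tuple[str, float]]:
    """Embed the query and each document, then sort documents by similarity."""
    query_vec = embed(query)
    scored = [(doc, cosine_similarity(query_vec, embed(doc))) for doc in documents]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

To compare embedding models in this setup, one would pass a different `embed` callable for each model and keep the ranking logic fixed, so that differences in the results come from the vectors alone.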


Journal of KIISE

  • ISSN: 2383-630X (Print)
  • ISSN: 2383-6296 (Electronic)
  • KCI Accredited Journal
