Fine-Tuning BGE-M3 for Defense Language Embedding Model: The Impact of Negative Sample Selection in Contrastive Learning


Vol. 53, No. 2, pp. 117-123, Feb. 2026
10.5626/JOK.2026.53.2.117



  Abstract

Korean language models specifically designed for the defense sector are still limited, even with the rapid advancements in text embeddings. In this study, we fine-tune the multilingual BGE-M3 model to better understand military terminology and investigate how negative sampling in contrastive learning impacts downstream performance. We evaluate three strategies: Easy (random negatives), Hard (lexicographic adjacency), and Harder (similarity-mined negatives). Our analysis, based on clustering metrics such as Accuracy, NMI, and ARI using a defense news dataset, reveals that the similarity-based Harder strategy consistently outperforms the others. Further evaluations on the KorSTS dataset demonstrate that the Harder approach maintains strong Spearman and Pearson correlations, indicating successful domain adaptation without compromising overall semantic competence. Interestingly, the three Harder variants—negatives mined with BGE-M3, ko-sroberta, and multilingual-e5—produce nearly identical similarity distributions and comparable improvements, while the Easy strategy plateaus and the Hard strategy shows only moderate performance. These findings suggest that mining sufficiently similar negatives, as opposed to using random or adjacent ones, is crucial for effective, domain-specific fine-tuning of multilingual embedding models.
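
To make the negative-mining strategies concrete, the sketch below illustrates the similarity-based "Harder" selection described above: rank candidate sentences by embedding similarity to the anchor and keep the most similar non-positives as negatives. This is a minimal illustration, assuming the sentence-transformers library; the toy corpus, the cutoff of two negatives, and all variable names are illustrative and not taken from the paper.

# A minimal sketch of the "Harder" strategy: for each anchor, rank candidate
# sentences by embedding similarity and keep the most similar non-positive
# sentences as negatives. Assumes the sentence-transformers library; the toy
# corpus and the cutoff of 2 are illustrative, not taken from the paper.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("BAAI/bge-m3")

corpus = [
    "The brigade completed a joint air-defense exercise.",
    "New coastal radar systems entered service this month.",
    "The ministry announced next year's procurement budget.",
    "Pilots trained on the upgraded fighter simulator.",
]
anchor = "Air-defense units held a combined training drill."
positive = corpus[0]

# Embed once with normalized vectors so cosine similarity is a dot product.
corpus_emb = model.encode(corpus, normalize_embeddings=True)
anchor_emb = model.encode(anchor, normalize_embeddings=True)

# Rank the corpus by similarity to the anchor, drop the positive, and take
# the top-ranked remainder as similarity-mined ("Harder") negatives.
scores = util.cos_sim(anchor_emb, corpus_emb)[0]
ranked = sorted(zip(corpus, scores.tolist()), key=lambda pair: -pair[1])
hard_negatives = [sent for sent, _ in ranked if sent != positive][:2]
print(hard_negatives)

By contrast, the Easy strategy would sample negatives uniformly at random from the corpus, and the Hard strategy would take lexicographically adjacent sentences; the paper's finding is that only similarity-ranked selection of this kind yields consistently stronger clustering and KorSTS results.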




  Cite this article

[IEEE Style]

J. Kim, D. Choi, S. G. Kim, D. H. Kim, "Fine-Tuning BGE-M3 for Defense Language Embedding Model: The Impact of Negative Sample Selection in Contrastive Learning," Journal of KIISE, JOK, vol. 53, no. 2, pp. 117-123, 2026. DOI: 10.5626/JOK.2026.53.2.117.


[ACM Style]

Junsub Kim, Dongnyeok Choi, Sung Gu Kim, and Deuk Hwa Kim. 2026. Fine-Tuning BGE-M3 for Defense Language Embedding Model: The Impact of Negative Sample Selection in Contrastive Learning. Journal of KIISE, JOK, 53, 2, (2026), 117-123. DOI: 10.5626/JOK.2026.53.2.117.


[KCI Style]

Junsub Kim, Dongnyeok Choi, Sung Gu Kim, and Deuk Hwa Kim, "Fine-Tuning BGE-M3 for Defense Language Embedding Model: The Impact of Negative Sample Selection in Contrastive Learning," Journal of KIISE, vol. 53, no. 2, pp. 117-123, 2026. DOI: 10.5626/JOK.2026.53.2.117.








