Search : [ keyword: text similarity ] (2)

Information Retrieval-based Bug Localization for Korean Bug Reports using Translation

Misoo Kim

http://doi.org/10.5626/JOK.2024.51.9.827

Information retrieval-based bug localization uses bug reports as queries to automatically identify faulty source files, significantly reducing the time developers spend locating bugs. The core of the technique is calculating the text similarity between bug reports and source files. However, for bug reports written in Korean, text similarity may be ineffective because their words are difficult to match against source code, which is written primarily in English. This study proposes an information retrieval-based bug localization technique for Korean bug reports that uses translation, enabling Korean developers to apply the technique effectively. We also apply a soft voting method to leverage the outputs of multiple translators. To validate the proposed technique, we collected 269 Korean bug reports and conducted experiments using three translators and two ranking models. The experimental results showed that the proposed method improved bug localization performance by 44% compared to the baselines.
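The idea in this abstract, scoring source files against several translations of one bug report and combining the scores by soft voting, can be illustrated with a minimal sketch. It is not the paper's implementation: the term-frequency cosine similarity, the averaging rule, and the names `tokenize`, `cosine`, and `rank_files` are all assumptions made for illustration, and the translations are assumed to have been produced beforehand by external translators.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Split camelCase identifiers (e.g. checkPassword -> check Password),
    # then lowercase and keep alphanumeric runs only.
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two term-frequency Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_files(translations, files):
    # translations: English renderings of one bug report, one per translator.
    # files: mapping of file name -> source text.
    # Soft voting (assumed here to be a plain average): each translation
    # contributes its similarity score, and the scores are averaged per file.
    file_vecs = {name: Counter(tokenize(src)) for name, src in files.items()}
    totals = {name: 0.0 for name in files}
    for query in translations:
        query_vec = Counter(tokenize(query))
        for name, file_vec in file_vecs.items():
            totals[name] += cosine(query_vec, file_vec)
    averaged = {name: s / len(translations) for name, s in totals.items()}
    return sorted(averaged.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `rank_files` with two translations of the same report and a dictionary of source files returns the files ordered by averaged similarity, so a file that only one translator's wording happens to match still receives partial credit.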

Ensemble of Sentence Interaction and Graph Based Models for Document Pair Similarity Estimation

Seonghwan Choi, Donghyun Son, Hochang Lee

http://doi.org/10.5626/JOK.2021.48.11.1184

Deriving the similarity between two documents, such as news articles, is one of the most important factors in document clustering. Sequence similarity models, one of the existing deep learning-based approaches to document clustering, do not reflect the entire context of documents. To address this issue, this paper uses interaction-based and graph-based approaches to construct document pair similarity models suitable for news clustering. It proposes four interaction-based models that measure the similarity between two documents by aggregating the similarity information from sentence interactions. The experimental results demonstrated that two of the four proposed models outperformed SVM and HAN. Ablation studies were conducted on the graph-based model, varying the depth of the model's neural network and its input features. Through error analysis and an ensemble of the interaction-based and graph-based models, this paper showed that the two approaches can be complementary because of differences in their prediction tendencies.
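The notion of aggregating sentence-level interactions into one document-pair score can be shown with a small sketch. This is only an illustration under stated assumptions, not one of the four models the paper proposes: Jaccard word overlap stands in for learned sentence representations, the aggregation is a simple symmetric mean of best matches, and the function names are hypothetical.

```python
import re


def sentences(doc):
    # Naive sentence splitter: break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", doc.strip()) if s]


def jaccard(a, b):
    # Word-overlap similarity between two sentences (assumed stand-in
    # for a learned sentence encoder).
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def interaction_matrix(doc_a, doc_b):
    # Entry [i][j] is the similarity of sentence i in doc_a
    # to sentence j in doc_b.
    return [[jaccard(x, y) for y in sentences(doc_b)] for x in sentences(doc_a)]


def doc_similarity(doc_a, doc_b):
    # Aggregate the interaction matrix: take each sentence's best match
    # in the other document, average in both directions, then average
    # the two directional scores.
    matrix = interaction_matrix(doc_a, doc_b)
    if not matrix or not matrix[0]:
        return 0.0
    row_best = [max(row) for row in matrix]
    col_best = [max(col) for col in zip(*matrix)]
    return (sum(row_best) / len(row_best) + sum(col_best) / len(col_best)) / 2
```

Because the score is built from sentence-to-sentence comparisons rather than a single pass over the whole text, two articles that cover the same event in a different sentence order still score highly.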


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr