Search : [ keyword: information retrieval ] (4)

Information Retrieval-based Bug Localization for Korean Bug Reports using Translation

Misoo Kim

http://doi.org/10.5626/JOK.2024.51.9.827

Information retrieval-based bug localization techniques use bug reports as queries to automatically identify faulty source files, significantly reducing the time developers spend locating bugs. The core of these techniques lies in calculating the text similarity between bug reports and source files. However, for bug reports written in Korean, text similarity may be ineffective because it is difficult to match Korean words against source code written primarily in English. This study proposes an information retrieval-based bug localization technique for Korean bug reports that uses translation, enabling Korean developers to use such techniques effectively. We also apply a soft voting method to leverage the outputs of multiple translators. To validate the performance of the proposed technique, we collected 269 Korean bug reports and conducted experiments using three translators and two ranking models. Experimental results showed that the proposed method improved bug localization performance by 44% compared to baselines.
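The abstract does not spell out how the soft voting is computed. The following is a minimal sketch under stated assumptions: each translator yields one translated query, a ranking model assigns every source file a similarity score for that query, the scores are min-max normalized per translator, and the normalized scores are averaged. The function names and the normalization choice are hypothetical, not taken from the paper.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale one translator's file scores to [0, 1] (assumed normalization)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {name: 0.0 for name in scores}
    return {name: (s - lo) / (hi - lo) for name, s in scores.items()}


def soft_vote(score_lists):
    """Average normalized scores across translators and rank files.

    score_lists: one dict per translator, mapping file name -> similarity score.
    A file missing from a translator's dict contributes 0 for that translator.
    Returns file names sorted by averaged score, highest first.
    """
    totals = defaultdict(float)
    for scores in score_lists:
        for name, s in min_max_normalize(scores).items():
            totals[name] += s
    n = len(score_lists)
    averaged = {name: t / n for name, t in totals.items()}
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(averaged, key=lambda name: (-averaged[name], name))


# Hypothetical scores from three translators for the same Korean bug report.
translator_a = {"A.java": 0.9, "B.java": 0.1, "C.java": 0.5}
translator_b = {"A.java": 0.2, "B.java": 0.3, "C.java": 0.8}
translator_c = {"A.java": 0.6, "B.java": 0.1, "C.java": 0.7}
ranking = soft_vote([translator_a, translator_b, translator_c])
```

In this toy input, `C.java` ranks first even though only two of the three translators place it on top, which is the kind of smoothing a score-level (soft) vote provides over a majority (hard) vote.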

2-Phase Passage Re-ranking Model based on Neural-Symbolic Ranking Models

Yongjin Bae, Hyun Kim, Joon-Ho Lim, Hyun-ki Kim, Kong Joo Lee

http://doi.org/10.5626/JOK.2021.48.5.501

Previous research on QA systems has focused on extracting exact answers for given questions and passages. However, when the problem is expanded from machine reading comprehension to open-domain question answering, finding the passage that contains the correct answer is as important as reading comprehension itself. DrQA reported that Exact Match@Top1 performance decreased from 69.5 to 27.1 when the QA system included an initial search step. In this work, we propose a 2-phase passage re-ranking model to improve the performance of question answering systems. The proposed model integrates the results of a symbolic ranking model and a neural ranking model and re-ranks them. The symbolic ranking model was trained with the CatBoost algorithm on manually designed features between the question and passage. The neural model was trained by fine-tuning the KorBERT model. The second-stage model was trained as a neural regression model. We maximized performance by combining ranking models with different characteristics. Finally, the proposed model achieved 85.8% MRR and 82.2% BinaryRecall@Top1 in an evaluation on 1,000 questions, improvements of 17.3% (MRR) and 22.3% (BR@Top1) over the baseline model.
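For readers unfamiliar with the two reported measures, the sketch below shows how they are commonly computed. MRR follows its standard definition. BinaryRecall@Top1 is not defined in the abstract, so the sketch assumes it means the fraction of questions whose top-ranked passage contains a correct answer. The function names are illustrative.

```python
def reciprocal_rank(ranked_relevance):
    """Return 1/rank of the first relevant passage, or 0.0 if none is relevant."""
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(all_rankings):
    """Average the reciprocal rank over all questions."""
    return sum(reciprocal_rank(r) for r in all_rankings) / len(all_rankings)


def binary_recall_at_k(all_rankings, k=1):
    """Fraction of questions with at least one relevant passage in the top k.

    Assumed reading of BinaryRecall@Top1 (k=1); the abstract does not define it.
    """
    hits = sum(1 for r in all_rankings if any(r[:k]))
    return hits / len(all_rankings)


# Each inner list marks, in ranked order, whether a passage contains the answer.
rankings = [
    [True, False],
    [False, True],
    [False, False, True],
    [False, False],
]
mrr = mean_reciprocal_rank(rankings)
br_at_1 = binary_recall_at_k(rankings, k=1)
```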

Bug Report Quality Prediction for Enhancing Performance of Information Retrieval-based Bug Localization

Misoo Kim, June Ahn, Eunseok Lee

http://doi.org/10.5626/JOK.2017.44.8.832

Bug reports are essential documents for developers to localize and fix bugs. These reports contain information about software bugs or failures that occur during the software operation and maintenance phase. Information Retrieval-based Bug Localization (IR-BL) techniques have been proposed to reduce the time and cost required for developers to resolve bug reports. However, if a low-quality bug report is submitted, the performance of such techniques can be significantly degraded. To address this problem, we propose a quality prediction method that identifies low-quality bug reports. This method defines a Quality property of a Bug report as a Query (Q4BaQ) and predicts the quality of bug reports using machine learning. We evaluated the proposed method on three open-source projects. The experimental results show that the proposed method achieved an average F-measure of 87.31% and outperformed previous prediction techniques by up to 6.62% in F-measure. Finally, combining the proposed method with a traditional automatic query reformulation method improved MRR and MAP by 0.9% and 1.3%, respectively.
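MAP, one of the measures reported above, rewards rankings that place all buggy files near the top, not just the first one. A minimal sketch of the standard definition follows, assuming each bug report comes with a ranked list of files and a set of files that were actually fixed. The names and sample data are hypothetical.

```python
def average_precision(ranked_files, buggy_files):
    """Average of precision values at each rank where a buggy file appears."""
    if not buggy_files:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, name in enumerate(ranked_files, start=1):
        if name in buggy_files:
            hits += 1
            precision_sum += hits / rank
    # Dividing by the number of buggy files penalizes files never retrieved.
    return precision_sum / len(buggy_files)


def mean_average_precision(results):
    """results: list of (ranked_files, buggy_files) pairs, one per bug report."""
    return sum(average_precision(r, b) for r, b in results) / len(results)


# Two hypothetical bug reports with their ranked files and ground truth.
results = [
    (["a.java", "b.java", "c.java", "d.java"], {"a.java", "c.java"}),
    (["x.java", "y.java"], {"y.java"}),
]
map_score = mean_average_precision(results)
```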

A Semi-automatic Construction method of a Named Entity Dictionary Based on Wikipedia

Yeongkil Song, Seokwon Jeong, Harksoo Kim

http://doi.org/

A named entity (NE) dictionary is an important resource for the performance of NE recognition. However, it is not easy to construct an NE dictionary manually since human annotation is time-consuming and labor-intensive. To save construction time and reduce human labor, we propose a semi-automatic system for the construction of an NE dictionary. The proposed system constructs a pseudo-document with Wiki-categories per NE class by using an active learning technique. Then, it calculates similarities between Wiki entries and pseudo-documents using the BM25 model, a well-known information retrieval model. Finally, it classifies each Wiki entry into NE classes based on these similarities. In experiments with three different types of NE class sets, the proposed system showed high performance (macro-average F1-score of 0.9028 and micro-average F1-score of 0.9554).
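The similarity step can be illustrated with a small Okapi BM25 sketch. It assumes the categories of a Wiki entry act as the query and each NE class's pseudo-document is a list of category tokens. The parameter values (`k1=1.5`, `b=0.75`), the smoothed IDF variant, and the sample categories are assumptions chosen for illustration, not details from the paper.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, documents, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25.

    documents: non-empty list of token lists. Returns one score per document.
    """
    n_docs = len(documents)
    avg_len = sum(len(d) for d in documents) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for d in documents:
        doc_freq.update(set(d))
    scores = []
    for d in documents:
        tf = Counter(d)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Smoothed IDF (the "+1 inside the log" variant keeps it non-negative).
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Hypothetical pseudo-documents built from Wiki categories, one per NE class.
ne_classes = ["PERSON", "LOCATION"]
pseudo_docs = [
    ["births", "living", "people", "actors"],
    ["cities", "countries", "capitals"],
]
entry_categories = ["actors", "births"]
scores = bm25_scores(entry_categories, pseudo_docs)
predicted = ne_classes[scores.index(max(scores))]
```

Assigning the entry to the class with the highest score is the simplest decision rule; the paper may apply thresholds or other criteria that the abstract does not describe.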


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr