Improved Software Defect Prediction with Gated Tab Transformer
Saranya Manikandan, Duksan Ryu
http://doi.org/10.5626/JOK.2025.52.3.196
Software Defect Prediction (SDP) plays a crucial role in ensuring software quality and reliability. Although traditional machine learning and deep learning models are widely used for SDP, recent advances in natural language processing have paved the way for applying transformer-based models to software engineering tasks. This paper investigated a transformer-based model as a potential approach to improving SDP model quality, ultimately aiming to enhance software quality and optimize testing resource allocation. Inspired by the Gated Tab Transformer’s (GTT) ability to effectively model relationships among features, we evaluated its effectiveness in SDP. We conducted experiments using 15 software defect datasets and compared the results with those of other state-of-the-art machine learning and deep learning models. Our experiments showed that GTT outperformed state-of-the-art machine learning models in terms of recall, balance, and AUC (increases of 42.1%, 10.93%, and 7.1%, respectively). Cohen's d confirmed this advantage with large and medium effect sizes for GTT on these metrics. Additionally, an ablation study assessed the impact of hyperparameter variations on performance. Thus, GTT's effectiveness addresses the challenges of SDP, potentially leading to more effective testing resource allocation and improved software quality.
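As an illustration of the effect-size check mentioned in the abstract above, the following is a minimal sketch of Cohen's d with a pooled standard deviation. The function names, the sample scores, and the 0.2 / 0.5 / 0.8 cut-offs (the conventional small / medium / large thresholds) are assumptions made for this example; the abstract does not state which variant or thresholds the authors used.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def effect_label(d):
    """Map |d| to a conventional label (thresholds are an assumption of this sketch)."""
    magnitude = abs(d)
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "medium"
    if magnitude >= 0.2:
        return "small"
    return "negligible"


# Hypothetical per-dataset scores for two models; not data from the paper.
model_a_scores = [2.0, 4.0, 6.0]
model_b_scores = [1.0, 3.0, 5.0]
d = cohens_d(model_a_scores, model_b_scores)
print(d, effect_label(d))
```

In practice the two lists would hold one metric value (for example recall) per dataset for each of the two models being compared.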
Cross-Project Defect Prediction for Ansible Projects
Sungu Lee, Sunjae Kwon, Duksan Ryu, Jongmoon Baik
http://doi.org/10.5626/JOK.2024.51.3.229
Infrastructure-as-Code (IaC) refers to automating overall infrastructure management, such as creating and deploying infrastructure, through code. IaC is used by many companies because of its efficiency, and many within-project defect prediction techniques have been proposed for Ansible, one of the IaC tools. Recently, a study on the applicability of cross-project defect prediction to Ansible was proposed. In this study, we therefore applied a cross-project defect prediction technique to Ansible and analyzed its effectiveness. Experimental results showed that cross-project defect prediction achieved F1 scores of 0.3 to 0.5 and could serve as an alternative to within-project defect prediction. We anticipate that it will be used to support software quality assurance activities for Ansible projects.
An Empirical Study on Defects in Open Source Artificial Intelligence Applications
Yoon Ho Choi, Changgong Lee, Jaechang Nam
http://doi.org/10.5626/JOK.2022.49.8.633
Because the programming paradigm of applications that use artificial intelligence (AI) differs from that of traditional applications, detecting, understanding, analyzing, and fixing their defects may yield different results. In this study, we collect defects that have been reported in open source AI applications and identify their common causes in order to understand and analyze defects in AI-based systems. To this end, we analyze the defects of ten open-source AI applications archived on GitHub by inspecting 1,205 issues and defect-fixing code changes that had been reported, found, and fixed. We classified the defects into 20 categories based on their causes, each of which is found in at least five of the ten projects. We expect that the results of this study will provide useful information for software quality assurance approaches such as fault localization and patch suggestion.
Identification of Generative Adversarial Network Models Suitable for Software Defect Prediction
Jiwon Choi, Jaewook Lee, Duksan Ryu, Suntae Kim
http://doi.org/10.5626/JOK.2022.49.1.52
Software Defect Prediction (SDP) helps allocate limited quality assurance resources effectively by identifying modules that are likely to contain defects. Software defect data suffer from a class imbalance problem in which there are more non-defective instances than defective instances. In most machine learning methods, defect prediction performance is degraded when a disproportionate number of instances belong to a particular class. Therefore, this research aimed to solve the class imbalance problem and improve defect prediction performance by using a Generative Adversarial Network (GAN) model. To this end, we compared different kinds of GAN models for their suitability for SDP and checked the applicability of GAN models that had not been applied in related work. In our study, the Vanilla GAN (GAN), Conditional GAN (cGAN), and Wasserstein GAN (WGAN) models, which were initially proposed for image generation, were adapted for software defect prediction. These modified models were then compared with Tabular GAN (TGAN) and Modeling Tabular data using Conditional GAN (CTGAN). Our experimental results showed that the CTGAN model is suitable for SDP data. We also conducted a sensitivity analysis examining which hyper-parameter values of CTGAN increase the recall rate and lower the probability of false alarm (PF). Our experimental results indicated that the hyper-parameters should be adjusted according to the dataset. We expect that our proposed approach can help allocate limited resources effectively by improving the performance of SDP.
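The two measures named in the sensitivity analysis above can be computed from a confusion matrix. The sketch below assumes the usual definitions, recall = TP / (TP + FN) and PF = FP / (FP + TN), with label 1 meaning "defective"; the function name and the sample labels are hypothetical and are not taken from the paper.

```python
def recall_and_pf(y_true, y_pred):
    """Return (recall, probability of false alarm) for binary labels, 1 = defective."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    # Guard against empty classes so the sketch never divides by zero.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, pf


# Hypothetical labels for six modules; not data from the paper.
actual = [1, 1, 0, 0, 0, 1]
predicted = [1, 0, 0, 1, 0, 1]
recall, pf = recall_and_pf(actual, predicted)
print(f"recall={recall:.3f}, PF={pf:.3f}")
```

A higher recall with a lower PF is the direction the abstract describes as desirable when tuning the CTGAN hyper-parameters.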
A Selection Technique of Source Project in Heterogeneous Defect Prediction based on Correlation Coefficients
Eunseob Kim, Jongmoon Baik, Duksan Ryu
http://doi.org/10.5626/JOK.2021.48.8.920
Software defect prediction techniques use previous defect data to predict defect-prone modules and ensure the quality of software under development. Recently, heterogeneous defect prediction (HDP) techniques have made it possible to apply defect prediction even when the metrics of the source and target projects differ. Previous HDP techniques focused on improving prediction performance when the source and target projects were given. However, in a real development environment, more than one source project exists for a given target project, so identifying a project that is suitable as source data is challenging. This paper suggests a correlation-based selection technique for source projects in HDP. After the metric matching process, correlation coefficients are calculated for each pair of corresponding metrics, and the project with the highest score is selected as source data. The experiment shows that the proposed selection method performs better than random selection, and that removing projects with fewer than 100 instances from the source candidates improves performance. Therefore, using the proposed selection technique could improve prediction accuracy in HDP.
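The selection step described above can be sketched roughly as follows. This is not the authors' implementation: the abstract does not say which correlation coefficient is used or how per-metric values are aggregated into a project score, so the sketch assumes Pearson correlation averaged over matched metric pairs, and assumes each pair has already been reduced to two equal-length value lists by the metric matching step. The data layout and all names (`select_source`, `n_instances`, `matched_pairs`) are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y) if norm_x and norm_y else 0.0


def select_source(candidates, min_instances=100):
    """Pick the candidate with the highest mean correlation over matched metrics.

    Candidates with fewer than min_instances instances are skipped, mirroring
    the filtering reported in the abstract.
    """
    best_name, best_score = None, float("-inf")
    for name, info in candidates.items():
        pairs = info["matched_pairs"]
        if info["n_instances"] < min_instances or not pairs:
            continue
        score = sum(pearson(src, tgt) for src, tgt in pairs) / len(pairs)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical candidates: (source metric values, target metric values) per matched pair.
candidates = {
    "A": {"n_instances": 250, "matched_pairs": [([1, 2, 3, 4], [2, 4, 6, 8])]},
    "B": {"n_instances": 50, "matched_pairs": [([1, 2, 3, 4], [1, 2, 3, 4])]},
    "C": {"n_instances": 300, "matched_pairs": [([1, 2, 3, 4], [4, 3, 2, 1])]},
}
print(select_source(candidates))
```

Project B is excluded here only because of its size, which is the case the 100-instance filter is meant to handle.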
A Case Study of Industrial Software Defect Prediction in Maritime and Ocean Transportation Industries
Jonggu Kang, Duksan Ryu, Jongmoon Baik
http://doi.org/10.5626/JOK.2020.47.8.769
Software defect prediction is a field of study that predicts defects in newly developed software before it is used, based on models trained on past software defects and software update information with the latest machine learning techniques. It can provide guidance for effectively operating and deploying software quality assurance (SQA) resources in industry practice. Recently, several papers have investigated the industrial application of software defect prediction, but more active research is needed to analyze how it can be applied across diverse domains with different characteristics. In this paper, we present the possibility of applying software defect prediction in the maritime and ocean transportation industries. These industries face challenges in building and deploying emerging types of transportation such as high-efficiency eco-friendly ships, connected ships, smart ships, unmanned ships, and autonomous ships. In our experiments using actual data collected from the domain, software defect prediction showed high performance, with 0.91 accuracy and 0.831 f-measure. This suggests that software defect prediction can be a useful tool for allocating SQA resources effectively in this field.
An Effective Comparative Framework for Cross-Project Defect Prediction Based on the Feature Selection Technique
http://doi.org/10.5626/JOK.2018.45.7.635
Software defect prediction (SDP) can help optimally allocate software testing resources to fault-prone modules. Typically, local data within a company are used to build classifiers. Unlike such Within-Project Defect Prediction (WPDP), there may be cases, e.g., pilot projects, in which no data have been collected from historical projects. Cross-project defect prediction (CPDP), which uses data from other projects, can be employed in such cases. Defect prediction performance may be degraded in the presence of irrelevant or redundant information. To address this issue, various feature selection techniques have been suggested. Until now, there has been no research on identifying effective feature selection techniques for CPDP. We present a comparative framework that uses feature selection to achieve high performance for CPDP. We compare eight existing feature selection techniques, based on feature subset evaluators and feature ranking methods, for three CPDP models and one WPDP model. After the best-performing features are chosen, classifiers are built, tested, and evaluated using statistical significance and effect size tests. Hybrid Instance Selection using Nearest-Neighbor (HISNN) is better than the other CPDP models and comparable to the WPDP model. Results from the comparison show that distribution differences, class imbalance, and feature selection should be considered to obtain a high-performance CPDP model.
A Method to Establish Severity Weight of Defect Factors for Application Software using ANP
To improve software quality, it is necessary to remove software defects from source code efficiently and effectively. In the development field, defects are removed according to their removal ratio or severity. Several studies have addressed defect removal based on software quality attributes, and several others have sought to improve software quality in projects by classifying the severity of defects. However, these studies have been insufficient in identifying whether relationships exist between defects or whether any type of defect is more important than others. Therefore, in this study, we collected various types of software defects from standards organizations, companies, and researchers. We modeled the defect types using an ANP model and used it to derive weighted severities of the defect types for general application software. When general application software is developed, the severity weight of each defect type can be used, and we expect this to allow defects to be removed efficiently and effectively.
Journal of KIISE
- ISSN : 2383-630X(Print)
- ISSN : 2383-6296(Electronic)
- KCI Accredited Journal