Journal of KIISE

Search: [ keyword: class imbalance ] (5)

Semi-Supervised Object Detection for Small Imbalanced Drama Dataset

Images in drama footage are typically zoomed in on people. As a result, people-oriented images predominate in drama data, and class imbalance naturally occurs. This paper addresses the issue of class imbalance in drama data for object detection tasks and proposes several sampling methods to tackle it within the framework of semi-supervised learning. Experimental evaluations demonstrated that the proposed semi-supervised learning approach with specialized sampling methods outperformed traditional supervised and semi-supervised methods. This study underscores the importance of selecting appropriate training data and sampling methods to optimize object detection performance on specialized datasets with unique characteristics.

Identification of Generative Adversarial Network Models Suitable for Software Defect Prediction

Jiwon Choi, Jaewook Lee, Duksan Ryu, Suntae Kim

http://doi.org/10.5626/JOK.2022.49.1.52

Software Defect Prediction (SDP) helps allocate limited quality assurance resources effectively by identifying modules that are likely to cause defects. Software defect data suffer from a class imbalance problem: there are more non-defective instances than defective instances. In most machine learning methods, defect prediction performance degrades when a disproportionate number of instances belong to one class. Therefore, this research aimed to solve the class imbalance problem and improve defect prediction performance by using a Generative Adversarial Network (GAN) model. To this end, we compared different kinds of GAN models for their suitability for SDP and checked the applicability of GAN models that had not been applied in related work. In our study, the Vanilla GAN (GAN), Conditional GAN (cGAN), and Wasserstein GAN (WGAN) models, which were initially proposed for image generation, were adapted for software defect prediction. These modified models were then compared with Tabular GAN (TGAN) and Modeling Tabular data using Conditional GAN (CTGAN). Our experimental results showed that the CTGAN model is suitable for SDP data. We also conducted a sensitivity analysis examining which hyper-parameter values of CTGAN increase recall and lower the probability of false alarm (PF). The results indicated that the hyper-parameters should be adjusted according to the dataset. We expect that our proposed approach can help allocate limited resources effectively by improving the performance of SDP.
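The two measures named in the sensitivity analysis, recall and probability of false alarm (PF), can be computed from confusion-matrix counts. The following is a minimal sketch using the standard definitions; the function name and the counts are hypothetical illustrations, not values from the paper.

```python
def recall_and_pf(tp, fn, fp, tn):
    """Return (recall, PF) from confusion-matrix counts.

    recall = TP / (TP + FN): share of defective modules that were caught.
    PF     = FP / (FP + TN): share of clean modules wrongly flagged.
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, pf


# Hypothetical counts for an imbalanced defect dataset (not from the paper):
# 40 defective modules and 300 non-defective modules.
recall, pf = recall_and_pf(tp=30, fn=10, fp=45, tn=255)
print(f"recall={recall:.2f}, PF={pf:.2f}")  # prints "recall=0.75, PF=0.15"
```

A good defect predictor aims for high recall and low PF at the same time, which is why the two are reported together.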

Improved Prediction for Configuration Bug Report Using Text Mining and Dimensionality Reduction

Jeongwhan Choi, Jiwon Choi, Duksan Ryu, Suntae Kim

http://doi.org/10.5626/JOK.2021.48.1.35

Configuration bugs are one of the main causes of software failure. Software organizations collect and manage bug reports using an issue tracking system. The bug assignor can spend an excessive amount of time determining whether a bug is a configuration bug. Configuration bug prediction can reduce the bug assignor's classification effort and aid decision making. In this paper, we propose an improved classification model using text mining and dimensionality reduction. We extract 4,457 bug reports from six open-source software projects, train a model to classify configuration bug reports, and evaluate its prediction performance. The best performance is obtained with the k-Nearest Neighbors model and the SMOTEENN sampling technique, after extracting features with Bag of Words and reducing their dimensionality with Linear Discriminant Analysis. The results show a ROC-AUC of 0.9812 and an MCC of 0.942. This is better than Xia et al.'s method and resolves the class imbalance problem of our previous study. By predicting configuration bug reports with this improved model, our proposed approach can provide bug assignors with the information they need to make informed decisions.
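The Matthews correlation coefficient (MCC) reported above is often preferred on imbalanced data because it uses all four confusion-matrix cells. The sketch below applies the standard formula; the function name and the counts are hypothetical and are not taken from the paper.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when a row or column of the confusion matrix is empty,
    a common convention for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical imbalanced test set: 95 configuration bugs, 905 other reports.
good = mcc(tp=90, tn=900, fp=5, fn=5)

# A classifier that always predicts the majority class reaches 90.5% accuracy
# on the same data, yet its MCC is 0.0 because it never finds a positive.
trivial = mcc(tp=0, tn=905, fp=0, fn=95)

print(f"good classifier MCC={good:.3f}, majority-only MCC={trivial:.3f}")
```

The second call illustrates why accuracy alone can be misleading when one class dominates.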

CNN-based Reduced Complexity Decision Confidence Estimation for Imbalanced Web Application Attack Detection

Seungyoung Park, Hansung Kim, Taejoon Jung

http://doi.org/10.5626/JOK.2020.47.9.842

As web application attacks have rapidly increased and diversified, existing schemes have become limited in their ability to detect them. To resolve this problem, detection techniques using machine learning, such as the convolutional neural network (CNN), have been proposed. However, the confidence these techniques assign to misclassified samples has been unreliable. To estimate decision confidence more reliably, the Monte-Carlo batch normalization (MCBN) technique combined with the CNN has been proposed. In this technique, the CNN makes multiple decisions on a given evaluation sample using multiple mini-batches that contain it, and the decision confidence estimate is obtained by averaging the multiple decision results. However, its computational load is excessive, because each mini-batch comprises (M-1) randomly selected training samples and only one evaluation sample when the mini-batch size is M. In this paper, we propose a reduced complexity decision confidence estimation scheme for imbalanced web application attack detection. Specifically, the proposed scheme reduces the computational load by up to M times compared to the MCBN scheme. In addition, during estimation, the ratio of normal to attack samples in the mini-batch should be kept the same as in the training process. To achieve this, we identified the smaller class by making a tentative decision on the evaluation samples. The smaller class was then over-sampled using the training samples to maintain the ratio. Our experimental results showed that performance improved, while the reliability estimation performance was not significantly degraded compared to the MCBN scheme.
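The MCBN baseline described above can be illustrated with a toy sketch. It assumes a stand-in model (one batch-normalization step followed by a sigmoid on a 1-D feature) in place of the CNN; all names, data, and parameter values are hypothetical. It shows only the baseline averaging and its cost, not the proposed reduced-complexity scheme.

```python
import math
import random


def bn_sigmoid_decision(x, batch, weight=1.0, bias=0.0, eps=1e-5):
    """Normalize x with the statistics of `batch` (which contains x),
    then apply a sigmoid. Toy stand-in for a CNN with batch normalization."""
    mean = sum(batch) / len(batch)
    var = sum((v - mean) ** 2 for v in batch) / len(batch)
    x_hat = (x - mean) / math.sqrt(var + eps)
    return 1.0 / (1.0 + math.exp(-(weight * x_hat + bias)))


def mcbn_confidence(x, train, batch_size, num_passes, rng):
    """Average the decisions over `num_passes` mini-batches, each made of
    (batch_size - 1) randomly selected training samples plus x itself."""
    decisions = []
    for _ in range(num_passes):
        batch = rng.sample(train, batch_size - 1) + [x]
        decisions.append(bn_sigmoid_decision(x, batch))
    return sum(decisions) / len(decisions)


rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(200)]  # hypothetical training features

M, T = 16, 20  # mini-batch size and number of Monte-Carlo passes (assumed values)
confidence = mcbn_confidence(x=2.5, train=train, batch_size=M, num_passes=T, rng=rng)

# Every pass pushes M samples through the model to score a single evaluation
# sample, so one confidence estimate costs M * T sample-level forward passes.
forward_samples = M * T
print(f"confidence={confidence:.3f}, forward passes per evaluation sample={forward_samples}")
```

Because only one of the M samples in each mini-batch is an evaluation sample, the remaining (M-1) forward passes are overhead, which is the cost the abstract says the proposed scheme reduces by up to M times.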

A Transfer Learning Method for Solving Imbalance Data of Abusive Sentence Classification

Suin Seo, Sung-Bae Cho

http://doi.org/10.5626/JOK.2017.44.12.1275

The supervised learning approach is suitable for classifying insulting sentences, but it requires training sentences that are labeled in advance. A character-level convolutional neural network is robust at the character level and is therefore appropriate for classifying abusive sentences, but it has the drawback of demanding a large number of training sentences. In this paper, we propose a transfer learning method that reuses trained filters in the actual classification process, after the filters have learned the characteristics of offensive words from generated pairs of abusive and normal sentences. This improves classifier performance by reducing the effects of data shortage and class imbalance. In experiments on three datasets, applying transfer learning yielded a higher F1-score for the character-level CNN classifier on every dataset.

ISSN : 2383-630X(Print)
ISSN : 2383-6296(Electronic)
KCI Accredited Journal

Editorial Office

Tel. +82-2-588-9240
Fax. +82-2-521-1352
E-mail. chwoo@kiise.or.kr

Digital Library [ Search Result ]

Semi-Supervised Object Detection for Small Imbalanced Drama Dataset

Identification of Generative Adversarial Network Models Suitable for Software Defect Prediction

Improved Prediction for Configuration Bug Report Using Text Mining and Dimensionality Reduction

CNN-based Reduced Complexity Decision Confidence Estimation for Imbalanced Web Application Attack Detection

A Transfer Learning Method for Solving Imbalance Data of Abusive Sentence Classification
