Digital Library: Search Results
Improved Prediction for Configuration Bug Report Using Text Mining and Dimensionality Reduction
Jeongwhan Choi, Jiwon Choi, Duksan Ryu, Suntae Kim
http://doi.org/10.5626/JOK.2021.48.1.35
Configuration bugs are one of the main causes of software failure. Software organizations collect and manage bug reports using an issue tracking system, and the bug assignor can spend excessive amounts of time determining whether a given bug is a configuration bug. Predicting configuration bugs can reduce this classification effort and support the assignor's decision making. In this paper, we propose an improved classification model based on text mining and dimensionality reduction. We extract 4,457 bug reports from six open-source software projects, train a model to classify configuration bug reports, and evaluate its prediction performance. The best-performing configuration extracts features with Bag of Words, reduces their dimensionality with Linear Discriminant Analysis, and then classifies them with a k-Nearest Neighbors model combined with the SMOTEENN sampling technique. This model achieves a ROC-AUC of 0.9812 and an MCC of 0.942, outperforming the method of Xia et al. and solving the class imbalance problem of our previous study. By predicting configuration bug reports more accurately, the proposed approach can provide bug assignors with the information they need to make informed decisions.
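A minimal, standard-library-only sketch of two pieces of the pipeline described above: Bag of Words feature extraction feeding a k-Nearest Neighbors classifier, plus the Matthews correlation coefficient (MCC) used for evaluation. The LDA projection and SMOTEENN resampling are deliberately omitted (in practice scikit-learn's `LinearDiscriminantAnalysis` and imbalanced-learn's `SMOTEENN` provide them). The toy reports, whitespace tokenizer, `k=3`, and all function names are illustrative assumptions, not the authors' code.

```python
import math
from collections import Counter

# Hypothetical toy reports: label 1 = configuration bug, 0 = other bug.
TOY_REPORTS = [
    ("wrong default value in config file", 1),
    ("setting option ignored in configuration", 1),
    ("config parameter not loaded at startup", 1),
    ("null pointer crash when clicking button", 0),
    ("memory leak in rendering loop", 0),
    ("crash on null input in parser", 0),
]

def build_vocab(docs):
    """Sorted set of lowercase whitespace-separated tokens across all documents."""
    return sorted({tok for doc in docs for tok in doc.lower().split()})

def bag_of_words(doc, vocab):
    """Term-count vector of one document over a fixed vocabulary (unknown tokens dropped)."""
    counts = Counter(doc.lower().split())
    return [counts[tok] for tok in vocab]

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training vectors nearest to x (Euclidean distance)."""
    nearest = sorted((math.dist(v, x), y) for v, y in zip(train_X, train_y))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]

def mcc(y_true, y_pred, positive=1):
    """Matthews correlation coefficient for binary labels; 0.0 when undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

if __name__ == "__main__":
    docs = [d for d, _ in TOY_REPORTS]
    labels = [y for _, y in TOY_REPORTS]
    vocab = build_vocab(docs)
    X = [bag_of_words(d, vocab) for d in docs]
    print(knn_predict(X, labels, bag_of_words("config option value ignored", vocab)))  # 1
```

MCC is shown because, unlike accuracy, it stays informative when configuration bugs are a small minority of reports, which is the class imbalance the abstract refers to.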
A Method for Training Data Selection based on LSTRf
Myunggwon Hwang, Yuna Jeong, Wonkyung Sung
http://doi.org/10.5626/JOK.2020.47.12.1192
This paper presents a training data selection method that benefits learning in an efficient human-in-the-loop (HITL) process, as required for automated and intelligent artificial intelligence (AI) development. Our method first maps the training data onto a 2D distribution based on similarity and then lays out a grid with a fixed ratio. Applying the Least Slack Time Rate first (LSTRf) technique, it selects data according to how consistently data of the same class are distributed within each grid cell. The selected data are then used to train convolutional neural network (CNN)-based classifiers, whose performance is evaluated. We carried out experiments on the CIFAR-10 dataset and evaluated the effect of the grid size and of the number of data points selected in a single operation. The selected training data were compared with randomly selected data of the same size. The results show that smaller grid sizes (0.008 and 0.005) and a larger number of data points selected per operation yield better learning performance.
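The grid step can be illustrated with a small sketch. It assumes the 2D similarity map has already been computed and normalized to [0, 1] (the abstract does not name the projection), buckets points into square cells of side `cell_size`, and scores each cell by the share of its points that belong to the cell's majority class. The batch rule in `select_batch`, which simply takes points from the most class-consistent cells first, is a hypothetical placeholder and is not the LSTRf rule used in the paper.

```python
import math
from collections import Counter, defaultdict

def assign_cells(points, cell_size):
    """Map each 2D point (assumed normalized to [0, 1]) to an integer grid cell."""
    return [(math.floor(x / cell_size), math.floor(y / cell_size)) for x, y in points]

def cell_consistency(cells, labels):
    """For each cell, the fraction of its points that belong to its majority class."""
    by_cell = defaultdict(list)
    for cell, label in zip(cells, labels):
        by_cell[cell].append(label)
    return {cell: Counter(ys).most_common(1)[0][1] / len(ys) for cell, ys in by_cell.items()}

def select_batch(points, labels, cell_size, batch_size):
    """Hypothetical rule (not LSTRf): most class-consistent cells first, ties by index."""
    cells = assign_cells(points, cell_size)
    score = cell_consistency(cells, labels)
    order = sorted(range(len(points)), key=lambda i: (-score[cells[i]], i))
    return order[:batch_size]

if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.2, 0.2), (0.3, 0.1), (0.9, 0.9), (0.8, 0.7)]
    lbl = [0, 0, 0, 1, 0]
    print(select_batch(pts, lbl, cell_size=0.5, batch_size=3))  # [0, 1, 2]
```

A smaller `cell_size` yields more, finer cells, so the consistency score reflects more local class structure; this is one intuition for why the paper reports better results at grid sizes of 0.008 and 0.005.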
Sensor Selection Strategies for Activity Recognition in a Smart Environment
The recent emergence of smartphones, wearable devices, and the Internet of Things (IoT) concept has made it possible for various objects to interact with one another anytime and anywhere. Among such smart services, a smart home service typically requires a large number of sensors to recognize the residents’ activities. For this reason, activity recognition using the data obtained from these sensors is an active research topic, and many sensors are installed to recognize activities and analyze their patterns via data mining techniques. However, installing many sensors for an IoT smart home service raises issues of cost and energy consumption. In this paper, we propose a new method that reduces the number of sensors needed for activity recognition in a smart environment using principal component analysis and clustering techniques, and we show the improvement in activity recognition achieved by the proposed method.
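One way the clustering idea could be applied is sketched below, under loudly stated assumptions: each sensor is described by a hypothetical activation-count profile over activities, sensors with similar profiles are grouped with plain k-means, and one representative sensor per cluster is kept. The PCA step the paper mentions is omitted here (scikit-learn's `PCA` could project the profiles before clustering), and the sensor names, profiles, and `select_sensors` helper are illustrative, not the authors' method.

```python
import math

# Hypothetical activation counts per sensor over four activities
# (e.g. cooking, eating, sleeping, leaving home).
TOY_PROFILES = {
    "kitchen_motion": [9, 1, 0, 0],
    "stove": [8, 2, 0, 0],
    "bed_pressure": [0, 0, 9, 1],
    "bedroom_motion": [0, 1, 8, 1],
    "front_door": [0, 0, 1, 9],
}

def kmeans(vectors, k, iters=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(far))
    assign = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: math.dist(v, centroids[j])) for v in vectors]
        if new == assign:  # converged: assignments unchanged
            break
        assign = new
        for j in range(k):  # move each centroid to the mean of its members
            members = [v for v, a in zip(vectors, assign) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids

def select_sensors(profiles, k):
    """Keep the sensor closest to each cluster centroid; return names sorted."""
    names = list(profiles)
    vectors = [profiles[n] for n in names]
    assign, centroids = kmeans(vectors, k)
    keep = []
    for j in range(k):
        members = [i for i, a in enumerate(assign) if a == j]
        if members:
            best = min(members, key=lambda i: math.dist(vectors[i], centroids[j]))
            keep.append(names[best])
    return sorted(keep)

if __name__ == "__main__":
    print(select_sensors(TOY_PROFILES, k=3))
```

Keeping one sensor per group of redundant sensors is what lowers installation cost and energy use; whether recognition accuracy holds up would still have to be checked on real activity data.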
Journal of KIISE
- ISSN : 2383-630X(Print)
- ISSN : 2383-6296(Electronic)
- KCI Accredited Journal
Editorial Office
- Tel. +82-2-588-9240
- Fax. +82-2-521-1352
- E-mail. chwoo@kiise.or.kr