Search : [ keyword: prediction ] (66)

Identification of Generative Adversarial Network Models Suitable for Software Defect Prediction

Jiwon Choi, Jaewook Lee, Duksan Ryu, Suntae Kim

http://doi.org/10.5626/JOK.2022.49.1.52

Software Defect Prediction (SDP) helps allocate limited quality assurance resources effectively by identifying modules that are likely to cause defects. Software defect data suffer from a class imbalance problem in which there are more non-defective instances than defective instances. In most machine learning methods, defect prediction performance degrades when a disproportionate number of instances belong to a particular class. Therefore, this research aimed to solve the class imbalance problem and improve defect prediction performance by using a Generative Adversarial Network (GAN) model. To this end, we compared different kinds of GAN models for their suitability for SDP and checked the applicability of GAN models that had not been applied in related work. In our study, Vanilla-GAN (GAN), Conditional GAN (cGAN), and Wasserstein GAN (WGAN) models, which were initially proposed for image generation, were adapted for software defect prediction. These modified models were then compared with Tabular GAN (TGAN) and Modeling Tabular data using Conditional GAN (CTGAN). Our experimental results showed that the CTGAN model is suitable for SDP data. We also conducted a sensitivity analysis examining which hyper-parameter values of CTGAN increase the recall rate and lower the probability of false alarm (PF). Our experimental results indicated that the hyper-parameters should be adjusted according to the dataset. We expect that our proposed approach can help allocate limited resources effectively by improving the performance of SDP.
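
The two metrics named in this abstract, recall and probability of false alarm (PF), have standard confusion-matrix definitions: recall = TP / (TP + FN) and PF = FP / (FP + TN). A minimal sketch follows; the function name and the sample labels are illustrative assumptions and are not taken from the paper.

```python
from typing import Sequence, Tuple


def recall_and_pf(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (recall, PF) for binary labels: 1 = defective, 0 = non-defective."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defects caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # defects missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # clean modules passed
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, pf


# Hypothetical imbalanced sample: 2 defective modules, 6 non-defective.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0]
recall, pf = recall_and_pf(y_true, y_pred)
```

A higher recall together with a lower PF is the direction the sensitivity analysis in the abstract aims for.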

Layer-wise Relevance Propagation (LRP) Based Technical and Macroeconomic Indicator Impact Analysis for an Explainable Deep Learning Model to Predict an Increase and Decrease in KOSPI

Jae-Eung Lee, Ji-Hyeong Han

http://doi.org/10.5626/JOK.2021.48.12.1289

Most research on stock prediction using artificial intelligence has focused on improving accuracy. However, in the field of finance, the reliability, transparency, and equity of decision-making should also be secured. This study proposes a layer-wise relevance propagation (LRP) approach to create an explainable deep learning model for stock prediction, which is trained using macroeconomic and technical indicators as the input features. The problem definition is also simplified by predicting whether the KOSPI closing price increases or decreases from the previous day instead of predicting the KOSPI value itself. Experiments are conducted to show how the proposed method works. The results show that the model trained on the features selected via LRP is more accurate than the vanilla model. Moreover, we show that the LRP results are meaningful by analyzing the tendency of each feature's positive effect on the prediction results.
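
The simplified target described in this abstract (increase or decrease from the previous day's close) can be sketched as a labeling step. This is only an illustration: the function name, the sample prices, and the choice to label an unchanged close as 0 are assumptions, not details from the paper.

```python
from typing import List, Sequence


def up_down_labels(closes: Sequence[float]) -> List[int]:
    """Label each day after the first: 1 if the close rose versus the previous day, else 0."""
    # Assumption: an unchanged close is treated as "not an increase" (label 0).
    return [1 if today > prev else 0 for prev, today in zip(closes, closes[1:])]


# Hypothetical closing prices, not real KOSPI data.
closes = [100.0, 101.5, 101.5, 99.0]
labels = up_down_labels(closes)
```

The label list is one element shorter than the price list because the first day has no previous close to compare against.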

Knowledge Graph Completion using Hyper-class Information and Pre-trained Language Model

Daesik Jang, Youngjoong Ko

http://doi.org/10.5626/JOK.2021.48.11.1228

Link prediction is a task that aims to predict missing links in knowledge graphs. Recently, several link prediction models have been proposed to complete knowledge graphs and have achieved meaningful results. However, the previous models used only the triples' internal information in the training data, which may lead to an overfitting problem. To address this problem, we propose Hyper-class Information and Pre-trained Language Model (HIP), which performs hyper-class prediction and link prediction through multi-task learning. HIP learns not only the contextual relationships of triples but also the abstract meanings of entities. As a result, it learns general information about the entities and forces the entities connected to the same hyper-class to have similar embeddings. Experimental results show significant improvement in Hits@10 and Mean Rank (MR) compared to KG-BERT and MTL-KGC.
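
Hits@10 and Mean Rank (MR), the two metrics reported in this abstract, are both computed from the rank that a model assigns to the correct entity for each test triple. A minimal sketch under that standard definition; the function names and the sample ranks are illustrative assumptions.

```python
from typing import Sequence


def hits_at_k(ranks: Sequence[int], k: int = 10) -> float:
    """Fraction of test triples whose correct entity is ranked in the top k (higher is better)."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_rank(ranks: Sequence[int]) -> float:
    """Average rank of the correct entity (lower is better)."""
    return sum(ranks) / len(ranks)


# Hypothetical 1-based ranks of the correct entity for five test triples.
ranks = [1, 3, 12, 50, 7]
h10 = hits_at_k(ranks, k=10)
mr = mean_rank(ranks)
```

Because the two metrics move in opposite directions, an improvement means a higher Hits@10 and a lower MR.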

Denoising Multivariate Time Series Modeling for Multi-step Time Series Prediction

Jungsoo Hong, Jinuk Park, Jieun Lee, Kyeonghun Kim, Seung-Kyun Hong, Sanghyun Park

http://doi.org/10.5626/JOK.2021.48.8.892

Time series forecasting predicts future time points using seasonality in time series. In industrial environments, decision-making depends on continuous prediction of the future, so multi-step time series forecasting is necessary. However, multi-step prediction is highly unstable because each step depends on the values predicted at previous time points. For this reason, traditional time series forecasting makes a statistical prediction for a single time point. To address this limitation, we propose a novel encoder-decoder based neural network named ‘DTSNet’, which predicts multi-step time points for multivariate time series. To stabilize multi-step prediction, we exploit positional encoding to enhance the representation of time points and propose a novel denoising training method. Moreover, we propose dual attention to resolve long-term dependencies and model complex patterns in time series, and we adopt a multi-head strategy at the linear projection layer for variable-specific modeling. To verify the performance improvement of our approach, we compare and analyze it against baseline models, and we evaluate the proposed methods through comparison tests such as a component ablation study and a denoising degree experiment.

A Selection Technique of Source Project in Heterogeneous Defect Prediction based on Correlation Coefficients

Eunseob Kim, Jongmoon Baik, Duksan Ryu

http://doi.org/10.5626/JOK.2021.48.8.920

Software defect prediction techniques try to predict defect-prone modules and ensure the quality of the software under development using previous defect data. Recently, heterogeneous defect prediction (HDP) techniques have made it possible to apply defect prediction even when the metrics of the source and target projects differ. Previous HDP techniques focused on improving prediction performance when the source and target projects were given. However, in a real development environment, more than one source project exists for one target project, so identifying a project that is suitable as source data is challenging. This paper suggests a correlation-based selection technique for source projects in HDP. After the metric matching process, correlation coefficients are calculated for each corresponding metric, and the project with the highest score is selected as source data. The experiment shows that the proposed selection method performs better than random selection, and that removing projects with fewer than 100 instances from the source candidates improves performance. Therefore, using the proposed selection technique could improve prediction accuracy in HDP.

Prediction of Fine Dust in Gyeonggi-do Industrial Complex using Machine Learning Methods

Dong-Jun Won, Sun-Kyum Kim, Yeonghun Kim, Gyuwon Song

http://doi.org/10.5626/JOK.2021.48.7.764

Recently, research on fine dust has been conducted using various prediction techniques. However, current research has focused on PM10 concentration prediction, so a model capable of predicting PM2.5 concentration needs to be developed. In this paper, we collected air quality, weather, and traffic data for the Banwol Shihwa National Industrial Complex over the most recent two years. The significance of the variables was identified through correlation analysis and regression analysis among PM2.5 and PM10, SO₂, NO₂, CO, O₃, temperature, humidity, wind direction, wind speed, precipitation, and road section vehicle speed for each vehicle. Next, the data were used to predict PM2.5 concentration over time in the industrial complex. Using Random Forest, XGBoost, LightGBM, deep neural network, and Voting models, the PM2.5 concentration in the industrial complex was predicted on an hourly basis, and a comparative analysis was conducted based on RMSE. The resulting RMSE values were 6.27, 6.41, 6.22, 6.64, and 6.12, respectively, and each technique performed considerably better than the Air Korea prediction, whose RMSE was 10.77.
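
The models in this abstract are compared by root mean squared error (RMSE), the square root of the mean squared difference between observed and predicted values. The sketch below shows that definition and how a relative reduction against a baseline can be derived from the reported figures; the function names are illustrative assumptions, and only the values 10.77 and 6.12 come from the abstract.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_reduction(baseline_rmse: float, model_rmse: float) -> float:
    """Fraction by which a model's RMSE is lower than a baseline's RMSE."""
    return (baseline_rmse - model_rmse) / baseline_rmse


# Figures reported in the abstract: Air Korea baseline vs. the Voting model.
voting_vs_baseline = relative_reduction(10.77, 6.12)
```

With the reported figures, the Voting model's RMSE is roughly 43% lower than the baseline's.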

A Method for Cancer Prognosis Prediction Using Gene Embedding

Hyunji Kim, Jaegyoon Ahn

http://doi.org/10.5626/JOK.2021.48.7.842

Identifying prognostic genes and using them to predict the prognosis of cancer patients can help provide them with more effective treatments. Many methods have been proposed to identify prognostic genes and predict cancer prognosis, and recent studies have focused on machine learning methods including deep learning. However, applying gene expression data to machine learning methods has the limitations of a small number of samples and a large number of genes. In this study, we additionally use a gene network to generate many random gene paths, which we used for training the model, thereby compensating for the small sample problem. We identified the prognostic genes and predicted the prognosis of patients using the gene expression data and gene networks for five cancer types and confirmed that the proposed method showed better predictive accuracy compared to other existing methods, and good performance on small sample data.

Predicting the Cache Performance Benefits for In-memory Data Analytics Frameworks

Minseop Jeong, Hwansoo Han

http://doi.org/10.5626/JOK.2021.48.5.479

In-memory data analytics frameworks provide caching facilities for intermediate results to improve performance. For effective caching, the actual performance benefits from cached data should be taken into consideration. As existing frameworks only measure execution times at the distributed task level, they have limitations in predicting the cache performance benefits accurately. In this paper, we propose an operator-level time measurement method, which combines the existing task-level execution time measurement with our cost prediction model based on input data sizes. Based on the proposed model and the execution flow of the application, we propose a prediction method for the performance benefits from data caching. Our proposed model provides opportunities for cache optimization with predicted performance benefits. Our cost model for operators showed a prediction error rate of 7.3% on average when measured with 10x input data. The difference between predicted performance and actual performance was limited to within 24%.

Prediction of Blood Glucose in Diabetic Inpatients Using LSTM Neural Network

Sang Hyeon Kim, Han Beom Lee, Seong Wan Jeon, Dae Yeon Kim, Sang Jeong Lee

http://doi.org/10.5626/JOK.2020.47.12.1120

Diabetes is a chronic disease that causes serious complications. In clinical practice, doctors predict future changes in blood glucose based on patients' past blood glucose trends and treat them accordingly. Recently, CGM (Continuous Glucose Monitoring) devices have been introduced that automatically measure blood glucose every five minutes to monitor continuous changes in blood glucose, and they are widely used in clinical applications. Based on CGM blood glucose results, doctors predict the timing of insulin administration and the high-risk periods of diabetes patients and treat them accordingly. In this paper, a blood glucose prediction model based on a deep learning neural network is proposed. The proposed model is designed with an LSTM (Long Short-Term Memory) based neural network. It is designed to take historical blood glucose data as well as variables such as HbA1c (glycated hemoglobin) and BMI (body mass index). It was applied and tested using CGM blood glucose data from Type 2 Diabetes inpatients at a university hospital. The proposed model, which incorporates patient characteristics, shows up to a 50% improvement in blood glucose prediction accuracy over the LSTM model of a previous study.

An Embedding Method of Emotes for the Detection of Popular Clips on Twitch.tv

Hyeonho Song, Kunwoo Park, Meeyoung Cha

http://doi.org/10.5626/JOK.2020.47.12.1153

This study presents an embedding method that effectively learns the meaning of emotes on Twitch.tv to understand audience reactions in live streaming. The proposed method first trains separate embedding matrices for text and for emotes and then merges the two matrices into one. Using 2,220,761 clips shared on Twitch.tv, this study conducted two experiments: clustering and clip popularity prediction. Results showed that the approach identifies emote clusters that express a similar emotion and detects popular clips. Future studies could utilize the proposed emote embedding method for the highlight prediction of a live stream.


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr