Search : [ keyword: prediction ] (66)

Applying Deep Neural Networks and Random Forests to Predict the Pathogenicity of Single Nucleotide Variants in Hereditary Cancer-associated Genes

Da-Bin Lee, Seonhwa Kim, Moonjong Kang, Changbum Hong, Kyu-Baek Hwang

http://doi.org/10.5626/JOK.2023.50.9.746

The recent proliferation of genetic testing has made it possible to explore an individual's genetic variants and use pathogenicity information to diagnose and prevent genetic diseases. However, the number of identified variants with known pathogenicity is quite small. Machine learning methods for predicting variant pathogenicity have been proposed to address this problem. In this study, we applied deep neural networks to variant pathogenicity prediction and compared them with random forests and logistic regression, which have been widely used in previous studies. The experimental data consisted of 1,068 single-nucleotide variants in genes associated with hereditary cancers. Experiments on 100 random datasets generated for hyperparameter selection showed that random forests performed best in terms of area under the precision-recall curve. On 15 holdout gene datasets, deep neural networks performed best on average, but the difference in performance from the second-best random forests was not significant. Logistic regression performed statistically significantly worse than both of the other models. In conclusion, deep neural networks and random forests were generally better than logistic regression at predicting the pathogenicity of single-nucleotide variants in hereditary cancer-associated genes.
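
A minimal sketch of the three-model comparison described above, assuming scikit-learn; the data, features, and hyperparameters here are placeholders, not the authors' actual pipeline.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import average_precision_score

# X: per-variant features (placeholders), y: 1 = pathogenic, 0 = benign
X, y = np.random.rand(1068, 20), np.random.randint(0, 2, 1068)  # synthetic stand-in data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=500, random_state=0),
    "deep_neural_network": MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    # Area under the precision-recall curve, the paper's evaluation measure
    auprc = average_precision_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUPRC = {auprc:.3f}")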

Analysis of Adversarial Learning-Based Deep Domain Adaptation for Cross-Version Defect Prediction

Jiwon Choi, Jaewook Lee, Duksan Ryu, Suntae Kim

http://doi.org/10.5626/JOK.2023.50.6.460

Software defect prediction is a helpful technique for allocating testing resources effectively. Cross-version defect prediction reflects an environment in which software is developed across successive versions, with modules added or deleted during version updates. Repeating this process can cause differences in data distribution between versions, which can degrade defect prediction performance. Deep domain adaptation (DeepDA) techniques are used in the field of computer vision to reduce the distribution difference between source and target data. This paper aims to reduce the difference in data distribution between versions using various DeepDA techniques and to identify the technique with the best defect prediction performance. We compared the performance of three deep domain adaptation techniques (Domain-Adversarial Neural Network (DANN), Adversarial Discriminative Domain Adaptation (ADDA), and Wasserstein Distance Guided Representation Learning (WDGRL)) and examined performance differences according to the choice of source data pairs, the ratio of target data used in the learning process, and the hyperparameter settings of the DANN model. Experimental results showed that DANN is more suitable for cross-version defect prediction environments. The DANN model performed best when all previous versions of data except the target version were used as the source, and in particular when the number of hidden layers of the DANN model was set to 3. In addition, when applying a DeepDA technique, the more target data used in the learning process, the better the performance. This study suggests that various DeepDA techniques can be used for software cross-version defect prediction in the future.
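
The gradient reversal layer is the core mechanism of DANN; the sketch below shows it in PyTorch. The layer widths, the 3-hidden-layer choice from the paper, and the two-class heads are illustrative assumptions, not the authors' exact architecture.

import torch
from torch import nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse gradients so the feature extractor learns to fool the
        # domain classifier, aligning source and target version distributions
        return -ctx.lam * grad_output, None

class DANN(nn.Module):
    def __init__(self, n_features, n_hidden=3, width=64):
        super().__init__()
        layers, dim = [], n_features
        for _ in range(n_hidden):  # the paper reports 3 hidden layers worked best
            layers += [nn.Linear(dim, width), nn.ReLU()]
            dim = width
        self.feature = nn.Sequential(*layers)
        self.label_clf = nn.Linear(width, 2)   # defective / clean
        self.domain_clf = nn.Linear(width, 2)  # source / target version

    def forward(self, x, lam=1.0):
        f = self.feature(x)
        return self.label_clf(f), self.domain_clf(GradReverse.apply(f, lam))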

Document-level Machine Translation Data Augmentation Using a Cluster Algorithm and NSP

Dokyoung Kim, Changki Lee

http://doi.org/10.5626/JOK.2023.50.5.401

In recent years, research on document-level machine translation, which understands the context of an entire document and produces natural translations, has been actively conducted. As with sentence-level machine translation models, training a document-level machine translation model requires a large amount of training data, but building a large document-level parallel corpus is very difficult. Therefore, in this paper, we propose a data augmentation technique that is effective for document-level machine translation to address the lack of document-level parallel corpora. Experimental results show that applying the proposed data augmentation technique, which uses a cluster algorithm and NSP (next sentence prediction) on a context-free sentence-level parallel corpus, improves document-level machine translation performance by 3.0 S-BLEU and 2.7 D-BLEU compared to the baseline without augmentation.
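
One plausible reading of the augmentation idea, sketched below: cluster sentence pairs into topic groups, then use BERT's NSP head to keep only coherent neighbors when chaining sentences into pseudo-documents. The clustering choice, threshold, and chaining step are assumptions; the paper's exact procedure may differ.

import torch
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from transformers import BertTokenizer, BertForNextSentencePrediction

sentences = ["The contract was signed.", "It takes effect in May.", "Cats sleep a lot."]
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(
    TfidfVectorizer().fit_transform(sentences))

tok = BertTokenizer.from_pretrained("bert-base-uncased")
nsp = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

def is_next(a, b, threshold=0.5):
    """True if BERT's NSP head judges b a plausible continuation of a."""
    enc = tok(a, b, return_tensors="pt")
    prob_next = torch.softmax(nsp(**enc).logits, dim=-1)[0, 0]  # label 0 = IsNext
    return prob_next.item() > threshold

# Within each cluster, sentences that pass the NSP check could be chained
# into a pseudo-document to serve as document-level training data.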

Early Anomaly Detection of LNG-Carrier Main Engine System based on Multivariate Time-Series Boundary Forecasting and Confidence Evaluation Technique

Donghyun Kim, Taigon Kim, Minji An, Yunju Baek

http://doi.org/10.5626/JOK.2023.50.5.429

Recently, a variety of studies have been conducted in the marine and shipbuilding industries to detect abnormal operation of ships and its causes. This study proposes a method for early anomaly detection of the main engine system using multivariate time-series sensor data extracted from LNG carriers built at a shipyard. Early anomaly detection requires predicting future values from current sensor data, and this process produces a prediction residual, the difference between the actual and predicted future values. Since this residual strongly affects the early anomaly detection results, it must be compensated. We propose novel loss functions that let a time-series forecasting model learn the upper or lower prediction boundary. A forecasting model trained with the proposed loss function improves the performance of the early anomaly detection algorithm by compensating the prediction residual. In addition, the real-time confidence of the predicted value is evaluated through a newly proposed confidence model that exploits the similarity between the forecasting residual and the confidence residual. With the proposed early anomaly detection algorithm, a prediction model that learns the upper boundary outputs the upper limit of the values that a baseline model trained with the MSE loss would produce, and can therefore catch abnormal behavior that a threshold-based discriminator misses when the baseline model's forecast falls below the actual future value. In our experiments, the proposed method achieved a recall of 0.9532, compared to 0.4001 for the baseline model. This means that robust early anomaly detection is possible across the various operating styles of actual ship operations.
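
One common way to make a forecaster learn an upper envelope rather than the mean is an asymmetric loss, sketched below in PyTorch. This is a hedged illustration of the idea of a boundary-learning loss; the weights and exact formulation in the paper are not reproduced here.

import torch

def upper_boundary_loss(pred, target, under_weight=1.0, over_weight=0.1):
    """Penalize under-prediction much more heavily than over-prediction,
    pushing the model toward the upper boundary of the signal."""
    diff = target - pred
    return torch.mean(torch.where(diff > 0,
                                  under_weight * diff**2,  # pred below target: heavy
                                  over_weight * diff**2))  # pred above target: light

# Swapping the weights (heavy penalty on over-prediction) would instead
# train the lower boundary.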

Integrating Domain Knowledge with Graph Convolution based on a Semantic Network for Elderly Depression Prediction

Seok-Jun Bu, Kyoung-Won Park, Sung-Bae Cho

http://doi.org/10.5626/JOK.2023.50.3.243

Depression in the elderly is a global problem, affecting 300 million patients and contributing to 800,000 suicides every year, so early detection of the daily activity patterns closely related to mobility is critical. Although graph-convolutional neural networks based on sensing logs have been promising, they must also represent the high-level behaviors hidden in complex sequences of sensing information. In this paper, a semantic network that structures the daily activity patterns of the elderly was constructed using additional domain knowledge, and a graph convolution model was proposed that uses it to complement low-level sensing-log graphs. Cross-validation with 800 hours of data from 69 senior citizens, provided by DNX, Inc., showed improved prediction performance for the proposed approach compared to recent deep learning models. In particular, inference over the semantic network was justified by the graph convolution model, which improved performance by 28.86% compared with the conventional model.
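
A hypothetical two-branch graph convolution combining a low-level sensing-log graph with a knowledge-based semantic network, assuming PyTorch Geometric; the fusion by concatenation and the layer sizes are illustrative assumptions, not the authors' architecture.

import torch
from torch import nn
from torch_geometric.nn import GCNConv, global_mean_pool

class DualGraphDepressionModel(nn.Module):
    def __init__(self, sensor_dim, semantic_dim, hidden=64):
        super().__init__()
        self.sensor_gcn = GCNConv(sensor_dim, hidden)      # raw sensing-log graph
        self.semantic_gcn = GCNConv(semantic_dim, hidden)  # domain-knowledge graph
        self.head = nn.Linear(2 * hidden, 2)               # depressed / not depressed

    def forward(self, x_s, edge_s, batch_s, x_k, edge_k, batch_k):
        # Pool each graph branch to one vector per subject, then fuse
        h_s = global_mean_pool(torch.relu(self.sensor_gcn(x_s, edge_s)), batch_s)
        h_k = global_mean_pool(torch.relu(self.semantic_gcn(x_k, edge_k)), batch_k)
        return self.head(torch.cat([h_s, h_k], dim=-1))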

A Study of Metric and Framework Improving Fairness-utility Trade-off in Link Prediction

Heeyoon Yang, YongHoon Kang, Gahyung Kim, Jiyoung Lim, SuHyun Yoon, Ho Seung Kim, Jee-Hyong Lee

http://doi.org/10.5626/JOK.2023.50.2.179

Artificial intelligence (AI) technology has shown remarkable improvements over the last decade. Sometimes, however, AI makes biased predictions because real-world big data intrinsically contains discriminatory social factors. This problem often arises in friend recommendation in social network services (SNS). Graph Neural Networks (GNNs) are commonly trained on social network datasets, but they have a strong tendency to connect similar nodes (the homophily effect). They are therefore more likely to make biased predictions based on socially sensitive attributes, such as gender or religion, which is ethically problematic. To overcome these problems, various fairness-aware AI models and fairness metrics have been proposed. However, most studies used different metrics to evaluate fairness and did not consider the trade-off between accuracy and fairness. Thus, we propose a novel fairness metric called the Fairβ-metric, which takes both accuracy and fairness into consideration, and a framework called FairU that shows outstanding performance on the proposed metric.
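
By analogy with the Fβ score, a metric balancing utility and fairness could take the shape of a weighted harmonic mean, as in the sketch below. This is only a guess at the general form suggested by the name; the paper's actual definition of the Fairβ-metric may differ.

def fair_beta(utility: float, fairness: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of a utility score and a fairness score.
    beta > 1 weights fairness more heavily; beta < 1 favors utility."""
    if utility == 0 and fairness == 0:
        return 0.0
    return (1 + beta**2) * utility * fairness / (beta**2 * utility + fairness)

print(fair_beta(utility=0.92, fairness=0.70))  # ~0.795: high accuracy alone is not enough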

Developing a Testability Prediction Model for High Complexity Software using Regression Analysis

Hyunjae Choi, Heungseok Chae

http://doi.org/10.5626/JOK.2023.50.2.162

Testability is the degree to which software supports testing in a given test context. Early prediction of testability can help developers identify software components that require substantial effort to ensure quality, plan testing activities, and recognize the need for refactoring to reduce testing effort. Existing studies have predicted testability through regression analysis on software metrics and code coverage, but their training data contained a large proportion of structurally simple software. Prediction models trained on such imbalanced data may predict the testability of highly complex software poorly. We therefore built a prediction model that accounts for high-complexity software, using training data generated according to the metric acceptance criteria of industrial domain standards. Comparing models built with three regression methods, we obtained a prediction model with a branch coverage error of about 4.4% and a coefficient of determination of 0.86.
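
An illustrative-only sketch of regressing branch coverage on code metrics with scikit-learn; the metric set and the synthetic data below are placeholders, not the study's industrial dataset.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Columns might be metrics such as cyclomatic complexity, LOC, fan-out, nesting depth
X = np.random.rand(500, 4)
y = np.clip(1.0 - 0.6 * X[:, 0] + 0.1 * np.random.randn(500), 0, 1)  # synthetic coverage

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_tr, y_tr)
pred = model.predict(X_te)
print(f"coverage error: {mean_absolute_error(y_te, pred):.3f}, "
      f"R^2: {r2_score(y_te, pred):.2f}")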

Dovetail Usage Prediction Model for Resource-Efficient Virtual Machine Placement in Cloud Computing Environment

Hyeongbin Kang, Hyeon-Jin Yu, Jungbin Kim, Heeseok Jeong, Jae-Hyuck Shin, Seo-Young Noh

http://doi.org/10.5626/JOK.2023.50.12.1041

As IT services have migrated to the cloud, efficient resource management in cloud computing environments has become an important issue. Consequently, research has been conducted on virtual machine placement (VMP), which can increase resource efficiency without additional equipment in data centers. This paper proposes a usage prediction model as a method for selecting hosts suitable for virtual machine placement. The dovetail usage prediction model, which addresses the shortcomings of existing usage prediction models, measures indicators such as the CPU, disk, and memory usage of the virtual machines running on each host, converts them into time-series data, and extracts features with a deep learning model. Using this approach for virtual machine placement allows hosts to be used efficiently while maintaining appropriate load balancing across virtual machines.
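
A sketch of how a usage forecaster could drive placement decisions; the function and field names are hypothetical, and the dovetail model itself is treated as a black-box forecaster rather than reimplemented.

from typing import Callable, Optional, Sequence

def place_vm(hosts: Sequence[dict], vm_demand: dict,
             forecast: Callable[[dict, str], float]) -> Optional[dict]:
    """Pick the host whose forecast CPU/memory/disk headroom best fits the VM."""
    def headroom(host):
        # Remaining capacity after predicted usage and the new VM's demand
        return min(host["capacity"][r] - forecast(host, r) - vm_demand[r]
                   for r in ("cpu", "mem", "disk"))
    feasible = [h for h in hosts if headroom(h) >= 0]
    # Best-fit: the smallest non-negative headroom packs hosts tightly
    # while the forecast keeps placements ahead of future load spikes
    return min(feasible, key=headroom) if feasible else None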

Applying Multitopic Analysis of Bug Reports and CNN algorithm to Bug Severity Prediction

Eontae Kim, Geunseok Yang, Inhong Jung

http://doi.org/10.5626/JOK.2023.50.11.954

Bugs are common in software development. Depending on severity, they can be classified as major or minor errors. The severity of a bug is chosen by the bug reporter, but the reporter's subjective judgment can lead to errors in the severity classification. To resolve this problem, we predict bug severity by extracting topic-based Severe and Non-Severe features and training a convolutional neural network (CNN). First, using the properties of the bug report, the prediction process is divided into Global, Product, Component, and Priority topics, and bug reports are extracted from each topic according to their Severe and Non-Severe labels. Severe and Non-Severe features are extracted from the Global topic, and severity features are extracted from the Product, Component, and Priority topics in the same way. The extracted features are combined and fed into the CNN as the input layer, and the model is trained. To evaluate the model's efficiency, we compared it with baselines on the Eclipse, Mozilla, Apache, and KDE open-source projects. Our model showed improved performance: 97% for Eclipse, 96% for Mozilla, 95% for Apache, and 99% for KDE, an average improvement of about 24.59% over the baselines, and the difference was statistically significant.
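
A hypothetical sketch of the combined-feature CNN classifier, assuming PyTorch and treating each topic's feature vector as one input channel; the feature length, layer sizes, and channel layout are assumptions, not the paper's configuration.

import torch
from torch import nn

class SeverityCNN(nn.Module):
    def __init__(self, feat_len, n_topics=4):  # Global, Product, Component, Priority
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_topics, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1))
        self.fc = nn.Linear(32, 2)  # Severe vs. Non-Severe

    def forward(self, x):
        # x: (batch, n_topics, feat_len), one channel per topic's features
        return self.fc(self.conv(x).squeeze(-1))

model = SeverityCNN(feat_len=128)
logits = model(torch.randn(8, 4, 128))  # toy batch of 8 bug reports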

Improvement Study on Active Learning-based Cross-Project Defect Prediction System

Taeyeun Yang, Hakjoo Oh

http://doi.org/10.5626/JOK.2023.50.11.931

This study proposes a practical improvement to an active learning-based system for cross-project defect prediction. A previous study applied active learning techniques to improve the practical performance of cross-project defect prediction, but it used a traditional machine learning model with hand-crafted features as input for both active learning target selection and defect prediction, so feature extraction was expensive and performance was limited. In addition, performance varied with the choice of input project. In this study, we propose the following methods to overcome these limitations. First, we use a deep learning model that takes source code as input, lowering the model-building cost and improving prediction performance. Second, we apply a Bayesian convolutional neural network to select active learning targets with the deep learning model. Third, instead of a single source project, we apply a method that automatically extracts a training dataset from multiple projects. Applying the proposed system to 7 open-source projects improved the average prediction performance by 13.58% over the most recent previous study.
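
Monte Carlo dropout is one common way to approximate a Bayesian CNN for uncertainty-driven target selection, sketched below; the paper's exact acquisition function and model are not specified here, so this is an assumption-labeled illustration.

import torch

def mc_dropout_uncertainty(model, x, n_samples=20):
    """Keep dropout active at inference; the spread across stochastic forward
    passes approximates the Bayesian predictive uncertainty."""
    model.train()  # enables dropout layers during inference
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1)
                             for _ in range(n_samples)])
    return probs.std(dim=0).mean(dim=-1)  # higher = more informative to label

# Query the modules the model is least certain about (names hypothetical):
# scores = mc_dropout_uncertainty(defect_cnn, candidate_batch)
# query_idx = scores.topk(k=10).indices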

