Journal of KIISE

Search: [ keyword: data distribution ] (3)

Analysis of Adversarial Learning-Based Deep Domain Adaptation for Cross-Version Defect Prediction

Software defect prediction helps allocate testing resources effectively. Cross-version defect prediction reflects an environment in which software is developed as a continuous series of versions, with modules added or deleted at each version update. Repeating this process can cause the data distribution to differ between versions, which can degrade defect prediction performance. Deep domain adaptation (DeepDA) techniques are used in computer vision to reduce the distribution difference between source and target data. This paper aims to reduce the difference in data distribution between versions using various DeepDA techniques and to identify the technique with the best defect prediction performance. We compared three DeepDA techniques, Domain-Adversarial Neural Network (DANN), Adversarial Discriminative Domain Adaptation (ADDA), and Wasserstein Distance Guided Representation Learning (WDGRL), and examined how performance varies with the pairing of source data. We also examined how performance varies with the ratio of target data used in the learning process and with the hyperparameter settings of the DANN model. Experimental results showed that DANN was more suitable than the other techniques for cross-version defect prediction. The DANN model performed best when all versions preceding the target version were used as the source, and in particular when the number of hidden layers was set to 3. In addition, when a DeepDA technique was applied, performance improved as more target data was used in the learning process. This study suggests that various DeepDA techniques can be used for cross-version software defect prediction in the future.
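The data setup described in this abstract (all earlier versions pooled as the source, plus a chosen ratio of target-version data made available during learning) can be sketched as follows. This is only an illustration under stated assumptions: the function name `build_cross_version_split`, the version names, and the module lists are hypothetical, and the abstract does not say how the target subset was sampled.

```python
import random


def build_cross_version_split(versions, target_name, target_ratio, seed=0):
    """Split versioned data for cross-version domain adaptation.

    versions: list of (name, modules) ordered from oldest to newest.
    Returns (source, adapt, target). All names here are hypothetical.
    """
    names = [name for name, _ in versions]
    t = names.index(target_name)
    # Source: every version released before the target, pooled together.
    source = [m for _, modules in versions[:t] for m in modules]
    target = list(versions[t][1])
    # Adaptation set: a random fraction of the target modules that the
    # domain-adaptation model may see during learning (assumed sampling).
    rng = random.Random(seed)
    k = round(len(target) * target_ratio)
    adapt = rng.sample(target, k)
    return source, adapt, target


# Toy data: three versions with modules added and deleted between them.
versions = [
    ("1.0", ["a.c", "b.c"]),
    ("1.1", ["a.c", "b.c", "c.c"]),
    ("1.2", ["a.c", "c.c", "d.c", "e.c"]),
]
source, adapt, target = build_cross_version_split(versions, "1.2", 0.5)
print(len(source), len(adapt), len(target))
```

Raising `target_ratio` exposes more of the target version during learning, which is the setting the abstract reports as improving performance.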

Dimensional Sentiment Analysis of Korean Text using Data Balancing

Taehee Jeon, Changhwan Kim

http://doi.org/10.5626/JOK.2021.48.7.790

Most sentiment analysis studies are categorical, representing emotional states as a small set of emotion categories. Dimensional sentiment analysis, which treats sentiment analysis as a regression problem, has received less attention because of a shortage of data. Recently, the National Information Society Agency (NIA) released an open dataset, Multimodal Video Data, through its website, AI Hub. Using this data, we experimented with dimensional sentiment analysis of Korean text. For this purpose, we used a CNN, one of the conventional deep learning models in NLP. We also verified that data balancing can improve model performance. The results show that the model trained on Multimodal Video Data performs well enough to indicate that the data is useful for dimensional sentiment analysis of Korean text, and that with data balancing the model performs better despite having less training data.
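The abstract does not state how the data was balanced. One common approach for regression targets, shown here purely as an assumption, is to bin the continuous scores and undersample every bin to the size of the smallest one, which also explains why a balanced set can be smaller than the original. The function name `balance_by_bins`, the score range, and the sample texts are hypothetical.

```python
import random
from collections import defaultdict


def balance_by_bins(samples, n_bins, lo, hi, seed=0):
    """Undersample (text, score) pairs so each score bin has equal size.

    Assumes lo <= score <= hi. Only non-empty bins are considered.
    """
    width = (hi - lo) / n_bins
    bins = defaultdict(list)
    for text, score in samples:
        # Clamp so that score == hi falls into the last bin.
        idx = min(int((score - lo) / width), n_bins - 1)
        bins[idx].append((text, score))
    smallest = min(len(b) for b in bins.values())
    rng = random.Random(seed)
    balanced = []
    for idx in sorted(bins):
        balanced.extend(rng.sample(bins[idx], smallest))
    return balanced


# Toy data: six low-score samples and two high-score samples.
samples = [
    ("t1", 0.05), ("t2", 0.1), ("t3", 0.2), ("t4", 0.3),
    ("t5", 0.4), ("t6", 0.45), ("t7", 0.7), ("t8", 0.9),
]
balanced = balance_by_bins(samples, n_bins=2, lo=0.0, hi=1.0)
print(len(samples), "->", len(balanced))
```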

Learning Multiple Instance Support Vector Machine through Positive Data Distribution

Joong-Won Hwang, Seong-Bae Park, Sang-Jo Lee

http://doi.org/

This paper proposes a modified MI-SVM algorithm that takes data distribution into account. The previous MI-SVM algorithm seeks the margin by considering the “most positive” instance in each positive bag. Positive instances in positive bags are located in a similar region of the feature space. To reflect this characteristic, the proposed method selects the “most positive” instance by calculating the distance between each instance in the bag and a pivot point, which is the intersection point of all positive instances. This paper suggests two ways to select the “most positive” pivot point in the training data. First, the algorithm seeks the “most positive” pivot point along the current predicted parameter and then selects the instance in the bag nearest to the pivot point as the representative. Second, the algorithm finds the “most positive” pivot point by using a Diverse Density framework. Our experiments on 12 benchmark multi-instance data sets show that the proposed method achieves higher performance than the previous MI-SVM algorithm.
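The instance-selection step described above, choosing the instance in a bag nearest to a pivot point, can be illustrated with a minimal sketch. It assumes Euclidean distance and takes the pivot as given; how the pivot is actually computed (along the current predicted parameter or with Diverse Density) is not reproduced here, and the name `select_representative` and the sample values are hypothetical.

```python
import math


def select_representative(bag, pivot):
    """Return the instance in the bag closest to the pivot point.

    Euclidean distance is an assumption; the abstract only says "distance".
    """
    return min(bag, key=lambda instance: math.dist(instance, pivot))


# Toy positive bag in a 2-D feature space and an assumed pivot point.
bag = [(0.0, 0.0), (0.9, 1.2), (3.0, 3.0)]
pivot = (1.0, 1.0)
representative = select_representative(bag, pivot)
print(representative)
```

In an MI-SVM style training loop, the selected representative of each positive bag would then stand in for that bag when the margin is recomputed.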

Journal of KIISE

ISSN : 2383-630X(Print)
ISSN : 2383-6296(Electronic)
KCI Accredited Journal

Editorial Office

Tel. +82-2-588-9240
Fax. +82-2-521-1352
E-mail. chwoo@kiise.or.kr

Digital Library [ Search Result ]

Analysis of Adversarial Learning-Based Deep Domain Adaptation for Cross-Version Defect Prediction

Dimensional Sentiment Analysis of Korean Text using Data Balancing

Learning Multiple Instance Support Vector Machine through Positive Data Distribution
