Digital Library [ Search Result ]
Deep Ensemble Network with Explicit Complementary Model for Accuracy-balanced Classification
http://doi.org/10.5626/JOK.2019.46.9.941
One of the major evaluation metrics for classification systems is average accuracy, while accuracy deviation is another important performance metric used to evaluate various deep neural networks. In this paper, we present a new ensemble-like fast deep neural network, Harmony, that can reduce the accuracy deviation among categories without degrading the overall average accuracy. Harmony consists of three sub-models: the Target model, the Complementary model, and the Conductor model. In Harmony, an object is classified by either the Target model or the Complementary model. The Target model is a conventional classification network for general categories, while the Complementary model is a classification network specifically for weak categories that are inaccurately classified by the Target model. The Conductor model selects which of the two models to use. The experimental results indicate that Harmony classifies categories accurately and also reduces the accuracy deviation among the categories.
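The abstract gives only the high-level architecture, so the following is a minimal sketch of the described routing idea, assuming PyTorch-style modules; the sub-model classes, the binary Conductor output, and the assumption that both classifiers share one output space are illustrative, not details from the paper.

```python
import torch
import torch.nn as nn

class Harmony(nn.Module):
    """Sketch of the described three-sub-model ensemble (details hypothetical)."""
    def __init__(self, target_model, complementary_model, conductor_model):
        super().__init__()
        self.target = target_model                # conventional classifier for general categories
        self.complementary = complementary_model  # classifier specialized for weak categories
        self.conductor = conductor_model          # binary selector between the two sub-models

    def forward(self, x):
        # The Conductor decides, per sample, which sub-model should classify it.
        use_complementary = self.conductor(x).argmax(dim=1) == 1
        target_logits = self.target(x)
        comp_logits = self.complementary(x)
        # Route each sample to the prediction of the selected sub-model.
        return torch.where(use_complementary.unsqueeze(1), comp_logits, target_logits)
```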
Jamo Unit Convolutional Neural Network Based Automatic Classification of Frequently Asked Questions with Spelling Errors
Youngjin Jang, Harksoo Kim, Dongho Kang, Sebin Kim, Hyunki Jang
http://doi.org/10.5626/JOK.2019.46.6.563
Web and mobile users obtain desired information through the frequently asked questions (FAQ) listed on a homepage. An FAQ system displays the answer candidate that is most similar to the user input, based on an information retrieval model. However, the information retrieval model depends on an index and is therefore vulnerable to spelling errors in the input sentence. This paper proposes applying a sentence classification model to the FAQ system to minimize the effect of spelling errors. Using an embedding layer with a jamo-level convolutional neural network, the impact of spelling errors in the user input was reduced. The performance of the classifier was further improved using class embedding and a feed-forward neural network. In classifying 457 and 769 FAQs, the Micro F1 scores were 81.32%p and 61.11%p, respectively. We used the sigmoid function to quantify the reliability of the model prediction.
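To illustrate the jamo-unit idea, the sketch below decomposes Korean syllables into jamo tokens using standard Unicode arithmetic, so that a single misspelled character perturbs only a few sub-character tokens; the jamo tables and function name are general knowledge, not taken from the paper, and the CNN itself is omitted.

```python
# Decompose composed Hangul syllables (U+AC00..U+D7A3) into jamo units.
CHOSUNG = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")
JUNGSUNG = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")
JONGSUNG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

def to_jamo(text: str) -> list:
    jamo = []
    for ch in text:
        code = ord(ch)
        if 0xAC00 <= code <= 0xD7A3:           # composed Hangul syllable block
            idx = code - 0xAC00
            jamo.append(CHOSUNG[idx // 588])           # initial consonant
            jamo.append(JUNGSUNG[(idx % 588) // 28])   # vowel
            if idx % 28:
                jamo.append(JONGSUNG[idx % 28])        # final consonant, if any
        else:
            jamo.append(ch)                    # keep non-Hangul characters as-is
    return jamo

print(to_jamo("질문"))  # ['ㅈ', 'ㅣ', 'ㄹ', 'ㅁ', 'ㅜ', 'ㄴ']
```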
Elastic Multiple Parametric Exponential Linear Units for Convolutional Neural Networks
http://doi.org/10.5626/JOK.2019.46.5.469
The activation function plays a major role in determining the depth and non-linearity of neural networks. Since the introduction of Rectified Linear Units (ReLU) for deep neural networks, many variants have been proposed. For example, Exponential Linear Units (ELU) lead to faster learning by pushing the mean of the activations closer to zero, and Elastic Rectified Linear Units (EReLU) change the slope randomly for better model generalization. In this paper, we propose Elastic Multiple Parametric Exponential Linear Units (EMPELU) as a generalized form of ELU and EReLU. During training, EMPELU randomly changes the slope of the positive part of the function argument within a moderate range, while the negative part can take the form of various activation functions through parameter learning. EMPELU improved the accuracy and generalization performance of convolutional neural networks on object classification tasks (CIFAR-10/100) more than well-known activation functions.
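The abstract does not give the exact parameterization, so the sketch below only combines the two ingredients it names: an EReLU-style random positive slope during training and an ELU-style learnable negative part. The parameter names alpha, beta, and epsilon are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class EMPELU(nn.Module):
    """Hedged sketch of an EMPELU-like activation (exact form follows the paper only in outline)."""
    def __init__(self, alpha: float = 1.0, beta: float = 1.0, epsilon: float = 0.1):
        super().__init__()
        # Learnable parameters controlling the negative (exponential) part.
        self.alpha = nn.Parameter(torch.tensor(alpha))
        self.beta = nn.Parameter(torch.tensor(beta))
        self.epsilon = epsilon  # range of the random positive slope during training

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            # Random slope in [1 - eps, 1 + eps], as in EReLU, for regularization.
            k = torch.empty_like(x).uniform_(1 - self.epsilon, 1 + self.epsilon)
        else:
            k = 1.0  # deterministic slope at test time
        pos = k * torch.clamp(x, min=0)
        neg = self.alpha * (torch.exp(self.beta * torch.clamp(x, max=0)) - 1)
        return pos + neg
```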
Categories and Patterns of Java Program Unit Test Code Bugs
http://doi.org/10.5626/JOK.2019.46.4.341
Since unit testing is widely used in many software projects, the threat of unit test bugs (i.e., bugs in the test case code) is becoming a more important issue for software quality assurance. Test code bugs are critical threats because they may invalidate the quality assurance process, which consequently hurts the quality of products and the performance of the project. This paper presents a set of test bug categories and a set of bug patterns extracted from real-world cases. Unlike existing work on test code bugs, this paper suggests a classification method to systematically categorize different features of test code bugs (i.e., structures, operations, and requirements). In addition, this paper defines eight new bug patterns in unit test code, based on bug reports from well-known open-source projects. Each pattern is formally specified as a source code pattern so that it can be used to construct a static bug pattern checker.
Systematic Analysis of Optimal Feature Extraction Methods for Developing a Near-Infrared Spectroscopy-Based Brain-Computer Interface System
Jaeyoung Shin, Han-Jeong Hwang
http://doi.org/10.5626/JOK.2018.45.10.1080
In this study, we systematically investigated optimal feature extraction methods for developing a near-infrared spectroscopy (NIRS)-based brain-computer interface (BCI) by considering various analysis time periods and feature combinations. NIRS signals were measured while twelve subjects performed mental arithmetic and resting tasks, each lasting 10 s and repeated 30 times. Seven different feature types were extracted from the NIRS signals, and classification accuracies were calculated using individual feature types extracted from single 0-10 s and 0-15 s analysis periods, as well as feature combinations extracted from a 0-15 s analysis period divided into three time windows (0-5, 5-10, 10-15 s). The highest classification accuracy was obtained when combinations of different feature types extracted from the 0-15 s period divided into the three windows were used, and the combination of mean and slope features was found to be the most suitable for developing a NIRS-based BCI system.
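As a brief illustration of the best-performing feature set, the sketch below computes mean and slope features per analysis window for a single channel; the sampling rate, window boundaries, and function name are placeholders for illustration, not values from the paper.

```python
import numpy as np

def window_features(signal: np.ndarray, fs: float, windows=((0, 5), (5, 10), (10, 15))):
    """Mean and linear-fit slope of one NIRS channel in each analysis window.
    `signal` is a 1-D array for one channel; `fs` is the sampling rate in Hz."""
    feats = []
    for start, end in windows:
        seg = signal[int(start * fs):int(end * fs)]
        t = np.arange(len(seg)) / fs
        slope, _ = np.polyfit(t, seg, 1)   # slope of a first-order fit to the segment
        feats.extend([seg.mean(), slope])  # concatenate mean and slope features
    return np.array(feats)
```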
A Twitter News-Classification Scheme Using Semantic Enrichment of Word Features
Seonmi Ji, Jihoon Moon, Hyeonwoo Kim, Eenjun Hwang
http://doi.org/10.5626/JOK.2018.45.10.1045
Recently, with the popularity of Twitter as a news platform, many news articles are generated, and various kinds of information and opinions about them spread very fast. However, since an enormous amount of Twitter news is posted simultaneously, users have difficulty selectively browsing for news related to their interests. So far, many studies have examined how to classify Twitter news using machine learning and deep learning. In general, conventional machine learning schemes suffer from data sparsity and semantic gap problems, while deep learning schemes require a large amount of data. To solve these problems, in this paper we propose a Twitter news-classification scheme using semantic enrichment of word features. Specifically, we first extract the features of Twitter news data using the Vector Space Model. Second, we enhance those features using DBpedia Spotlight. Finally, we construct a topic-classification model based on various machine learning techniques and demonstrate through experiments that our proposed model is more effective than other traditional methods.
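A minimal sketch of the two feature steps is shown below, assuming the public DBpedia Spotlight REST endpoint and a TF-IDF vector space model from scikit-learn; the enrichment strategy (appending recognized entity labels to the text before vectorization) is an illustrative assumption and not necessarily the paper's exact method.

```python
import requests
from sklearn.feature_extraction.text import TfidfVectorizer

SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"  # assumed public endpoint

def enrich(text: str, confidence: float = 0.5) -> str:
    """Append DBpedia entity labels found by Spotlight, so semantically related
    news articles share additional tokens in the vector space model."""
    resp = requests.get(SPOTLIGHT_URL,
                        params={"text": text, "confidence": confidence},
                        headers={"Accept": "application/json"})
    resources = resp.json().get("Resources", [])
    labels = [r["@URI"].rsplit("/", 1)[-1] for r in resources]
    return text + " " + " ".join(labels)

docs = ["Apple unveils new iPhone at its September event"]      # toy example
tfidf = TfidfVectorizer().fit_transform(enrich(d) for d in docs)  # enriched VSM features
```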
A Linguistic Study of Speech Act and Automatic Speech Act Classification for Korean Tutorial Dialog
Youngeun Koo, Jiyoun Kim, Munpyo Hong, Youngkil Kim
http://doi.org/10.5626/JOK.2018.45.8.807
A speech act is a speaker’s intention underlying an utterance in communication. To communicate successfully, we need to correctly identify the speech act of a speaker’s utterance. This paper proposes linguistic features of an utterance that affect speech act classification, derived by analyzing Korean tutorial dialogue, with the ultimate goal of enabling automatic speech act classification. Thirteen linguistically motivated features are suggested and verified with WEKA 3.8.1. The proposed features reached a speech act classification accuracy of 70.03%, an improvement of approximately 30%p over a baseline that uses only unigrams and bigrams as features.
Combinations of Text Preprocessing and Word Embedding Suitable for Neural Network Models for Document Classification
http://doi.org/10.5626/JOK.2018.45.7.690
Neural networks with word embeddings have recently been used for document classification. Researchers concentrate on designing new architectures or optimizing model parameters to increase performance. However, most recent studies have overlooked text preprocessing and word embedding: the description of the text preprocessing used is often insufficient, and a particular pretrained word embedding model is typically adopted without any plausible reason. Our paper shows that finding a suitable combination of text preprocessing and word embedding can be one of the important factors for enhancing performance. We conducted experiments on the AG’s News dataset to compare the possible combinations, zero/random padding, and the presence or absence of fine-tuning. We used pretrained word embedding models such as skip-gram, GloVe, and fastText. For diversity, we also used an average of multiple pretrained embeddings (Average), a randomly initialized embedding (Random), and a skip-gram model trained on the task data (AGNews-Skip). In addition, we used three advanced neural networks for the sake of generality. Experimental results, together with OOV (out-of-vocabulary) word statistics, suggest the necessity of these comparisons and point to a suitable combination of text preprocessing and word embedding.
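The sketch below illustrates one of the compared design choices: building an embedding matrix from a pretrained model with either zero or random vectors for the padding row and OOV words. The function name and the 0.25 initialization range are assumptions for illustration, not settings from the paper.

```python
import numpy as np

def build_embedding_matrix(vocab, pretrained, dim=300, oov_init="random", seed=0):
    """Map a task vocabulary onto a pretrained embedding (a dict-like or
    gensim KeyedVectors-style lookup); OOV rows are zero or small random vectors."""
    rng = np.random.default_rng(seed)
    matrix = np.zeros((len(vocab) + 1, dim))   # row 0 reserved for padding (zero padding)
    oov = 0
    for i, word in enumerate(vocab, start=1):
        if word in pretrained:
            matrix[i] = pretrained[word]
        else:
            oov += 1
            if oov_init == "random":
                matrix[i] = rng.uniform(-0.25, 0.25, dim)
            # oov_init == "zero" leaves the row as zeros
    print(f"OOV rate: {oov / len(vocab):.2%}")
    return matrix
```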
Identification of Heterogeneous Prognostic Genes and Prediction of Cancer Outcome using PageRank
http://doi.org/10.5626/JOK.2018.45.1.61
The identification of genes that contribute to the prediction of prognosis in patients with cancer is one of the challenges in providing appropriate therapies. To find prognostic genes, several classification models using gene expression data have been proposed. However, the prediction accuracy of cancer prognosis is limited due to the heterogeneity of cancer. In this paper, we integrate microarray data with biological network data using a modified PageRank algorithm to identify prognostic genes. We also predict the prognosis of patients with six cancer types (including breast carcinoma) using the K-Nearest Neighbor algorithm. Before applying the modified PageRank, we separate samples by K-Means clustering to address the heterogeneity of cancer. The proposed algorithm showed better performance in prognosis prediction than traditional algorithms. We were also able to identify cluster-specific biological processes using GO enrichment analysis.
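The abstract only names the ingredients (a PageRank-style ranking over a biological network, seeded with expression data), so the sketch below shows a generic personalized PageRank with networkx on a toy network; the expression-based scoring, the edge list, and the function name are assumptions rather than the paper's exact modification.

```python
import networkx as nx

def rank_prognostic_genes(network_edges, gene_scores, damping=0.85):
    """Score genes by personalized PageRank, where the restart weights come
    from expression-derived scores (e.g., association with patient outcome)."""
    g = nx.Graph(network_edges)                      # gene/protein interaction network
    total = sum(gene_scores.get(n, 0.0) for n in g) or 1.0
    personalization = {n: gene_scores.get(n, 0.0) / total for n in g}
    return nx.pagerank(g, alpha=damping, personalization=personalization)

edges = [("TP53", "BRCA1"), ("BRCA1", "BARD1"), ("TP53", "MDM2")]   # toy network
scores = {"TP53": 0.9, "BRCA1": 0.7, "BARD1": 0.2, "MDM2": 0.1}     # toy expression-based scores
print(rank_prognostic_genes(edges, scores))
```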
A Transfer Learning Method for Solving Imbalance Data of Abusive Sentence Classification
http://doi.org/10.5626/JOK.2017.44.12.1275
The supervised learning approach is suitable for classifying abusive sentences, but it requires pre-labeled training sentences. A character-level convolutional neural network (CNN) is robust to variations at the character level and is therefore appropriate for classifying abusive sentences; however, it has the drawback of demanding a large number of training sentences. In this paper, we propose a transfer learning method in which the convolution filters first learn the characteristics of offensive words from generated abusive/normal sentence pairs and are then reused in the actual classification task. The classifier's performance improved as the effects of data shortage and class imbalance decreased. We performed experiments and evaluations on three datasets and obtained a higher F1-score for the character-level CNN classifier when applying transfer learning on all of them.
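As a minimal sketch of the filter-reuse step, assuming a PyTorch character-level CNN (the architecture, vocabulary size, and other hyperparameters below are placeholders, not the paper's):

```python
import torch.nn as nn

class CharCNN(nn.Module):
    """Character-level CNN whose convolution filters can be pretrained on
    generated abusive/normal sentence pairs and then reused."""
    def __init__(self, vocab_size=3000, emb_dim=64, n_filters=128, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=3, padding=1)
        self.classifier = nn.Linear(n_filters, n_classes)

    def forward(self, x):                      # x: (batch, seq_len) of character ids
        h = self.emb(x).transpose(1, 2)        # (batch, emb_dim, seq_len)
        h = nn.functional.relu(self.conv(h)).max(dim=2).values   # global max pooling
        return self.classifier(h)

# Transfer step (sketch): copy the pretrained filters, freeze them, and train only the rest
# on the small, imbalanced target dataset.
pretrained, model = CharCNN(), CharCNN()
# ... train `pretrained` on generated abusive/normal sentence pairs ...
model.conv.load_state_dict(pretrained.conv.state_dict())
for p in model.conv.parameters():
    p.requires_grad = False                    # reuse the filters; fine-tune only the classifier
```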