Automatic Construction of a Negative/positive Corpus and Emotional Classification using the Internet Emotional Sign
Kyoungae Jang, Sanghyun Park, Woo-Je Kim
Internet users purchase goods online and express positive or negative emotions about those goods in product reviews. Analysis of these reviews provides critical data both to potential consumers and to enterprise decision making. Opinion mining techniques, which derive opinions by analyzing meaningful data from large numbers of Internet reviews, have therefore grown in importance. Existing studies were mostly based on comments written in English, and analysis of Korean has not been actively pursued. Unlike English, Korean is characterized by complex adjectives and suffixes. Existing studies also did not consider the characteristics of Internet language. This study proposes an emotional classification method that increases classification accuracy by analyzing the characteristics of Internet language that connote feelings. Using Internet emoticons, positive and negative comments about products can be classified automatically. The validity of the proposed algorithm was checked through the high precision, recall, and coverage obtained in the evaluation.
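The emoticon-based labeling idea in this abstract can be illustrated with a minimal sketch. The sign lists and the names `label_review`, `build_corpus`, and `coverage` are hypothetical; the abstract does not list the actual Internet emotional signs or define the paper's coverage measure, so the rule below (label only when signs of exactly one polarity appear) is an assumption.

```python
# Hypothetical sign lists: the abstract does not give the paper's actual signs.
POSITIVE_SIGNS = ("^^", ":)", ":D")
NEGATIVE_SIGNS = ("ㅠㅠ", ":(", "-_-")


def label_review(text):
    """Return 'positive', 'negative', or None if signs are absent or conflicting."""
    has_pos = any(sign in text for sign in POSITIVE_SIGNS)
    has_neg = any(sign in text for sign in NEGATIVE_SIGNS)
    if has_pos == has_neg:
        # Either no sign at all or both polarities: leave the review unlabeled.
        return None
    return "positive" if has_pos else "negative"


def build_corpus(reviews):
    """Group the reviews that could be labeled into a positive/negative corpus."""
    corpus = {"positive": [], "negative": []}
    for review in reviews:
        label = label_review(review)
        if label is not None:
            corpus[label].append(review)
    return corpus


def coverage(reviews):
    """Fraction of reviews that received a label (an assumed definition)."""
    if not reviews:
        return 0.0
    labeled = sum(1 for review in reviews if label_review(review) is not None)
    return labeled / len(reviews)
```

A corpus built this way could then serve as training data for a conventional classifier, which is one plausible reading of "automatic construction" in the title.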
Cell Type Prediction for Single-cell RNA Sequencing based on Unsupervised Domain Adaptation and Semi-supervised Learning
http://doi.org/10.5626/JOK.2025.52.2.125
Single-cell RNA sequencing (scRNA-seq) techniques for measuring gene expression in individual cells have developed rapidly. Recently, deep learning has been employed to identify cell types in scRNA-seq analysis. Most methods train a model on a dataset containing cell-type labels and then apply this model to other datasets. However, integrating multiple datasets can result in unexpected batch effects caused by variations in laboratories, experimenters, and sequencing techniques. Since batch effects can obscure the biological signals of interest, an effective batch correction method is essential. In this paper, we present a cell-type prediction model for scRNA-seq that utilizes unsupervised domain adaptation and semi-supervised learning to minimize distributional differences between datasets. First, we pre-train the proposed model using a source dataset that contains cell-type information. Subsequently, we train the model on the target dataset, leveraging adversarial training to align the distribution of the target dataset with that of the source dataset. Finally, we re-train the model to enhance performance through semi-supervised learning, utilizing both the source and target datasets with consistency regularization. The proposed model outperformed the other deep learning-based batch correction models by effectively removing batch effects.
Approximating the Accuracy of Classification Models Using Self-differential Testing
Jubin Lee, Taeho Kim, Yu-Seung Ma
http://doi.org/10.5626/JOK.2022.49.12.1143
Differential testing is a traditional software testing technique that detects errors by observing whether similar applications generate different outputs for the same input. Differential testing is also used in artificial intelligence systems. Existing approaches incur the cost of finding a high-quality reference neural network that has the same function as the target neural network but a different architecture. We propose a self-differential testing technique that evaluates a classification model by building a reference model from the target neural network itself, removing the need to find a neural network of another architecture for differential testing. Experiments confirmed that self-differential testing produced similar effects at a lower cost than existing approaches that require separate reference models. In addition, we propose an accuracy approximation method for classification models using self-differential analysis, which is an application of self-differential testing. In experiments using datasets similar to MNIST and CIFAR10, the accuracy approximated through self-differential testing differed from the actual accuracy by only 0.0002 to 0.09.
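The core comparison in differential testing can be sketched as follows. The abstract does not say how the reference model is derived from the target network or how disagreement is converted into an accuracy estimate, so this sketch only assumes that predictions from both models are already available and uses the agreement rate as a rough proxy. The function names are hypothetical.

```python
def agreement_rate(target_preds, reference_preds):
    """Fraction of inputs on which the target and reference models agree."""
    if len(target_preds) != len(reference_preds):
        raise ValueError("prediction lists must have the same length")
    if not target_preds:
        return 0.0
    agree = sum(1 for t, r in zip(target_preds, reference_preds) if t == r)
    return agree / len(target_preds)


def disagreements(target_preds, reference_preds):
    """Indices of inputs where the two models differ (candidate errors)."""
    return [
        i
        for i, (t, r) in enumerate(zip(target_preds, reference_preds))
        if t != r
    ]
```

No ground-truth labels are needed for either function, which is what makes this style of testing attractive for unlabeled data.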
Automatic Classification of Pneumonia Based on Ensemble Deep Learning Model Using Intensity Normalization and Multiscale Lung-Focused Patches on Chest X-Ray Images
Yoon Jo Kim, Jinseo An, Helen Hong
http://doi.org/10.5626/JOK.2022.49.9.677
It is difficult to distinguish normal from pneumonia cases in pediatric chest X-ray (CXR) images due to irregular intensity values. In addition, deep learning models can misclassify CXR images by incorrectly focusing on regions outside the lungs. This study proposes automatic classification of pneumonia based on an ensemble deep learning model using three intensity normalizations and multiscale lung-focused patches on CXR images. First, to correct for irregular intensity values inside the lungs, three intensity normalization methods were each applied. Second, to focus on the inside of the lungs, regions of interest were extracted by segmenting the lung regions. Third, multiscale lung-focused patches were extracted to learn the characteristics of pneumonia. Finally, ensemble modeling with an attention module was performed to improve classification performance. In the experiment, the method using large patches of CLAHE images showed an accuracy of 92%, which was 5% higher than that of the original images. Furthermore, the proposed method using an ensemble of large and middle patches showed the best performance, with an accuracy of 93%.
Comparison of BERT-based Model Performance in CBCA Criteria Classification
Junho Shin, Jungsoo Shin, Eunkyung Jo, Yeohoon Yoon, Jaehee Jung
http://doi.org/10.5626/JOK.2022.49.9.727
In child sex crime cases, the victim's statement plays a critical role in determining guilt or innocence, so the Supreme Prosecutors' Office classifies the statement into a total of 19 criteria according to Criteria-Based Content Analysis (CBCA), a technique for analyzing victims' statements. However, the criteria classification may differ according to the subjective opinion of the statement analyst. Thus, in this paper, two major classification methods were applied and analyzed to present a criteria classification model using BERT and RoBERTa. The two methods consist of classifying all criteria at the same time, and dividing the criteria into four groups and then classifying the criteria within each group in a second step. The experiment classified statements into 16 CBCA criteria and performed a comparative analysis using several pre-trained models. The former method performed better than the latter in 13 of the 16 criteria, and the latter method was effective in the three criteria with relatively little training data. Additionally, the RoBERTa-based model performed better than the BERT-based model in 15 of the 16 criteria, and the BERT model pre-trained using only Korean conversational colloquial language was the only model to classify the remaining criterion. This paper shows that the proposed model, pre-trained on interactive colloquial data, is effective in classifying sentences from children's statements.
The Multivariate Sensor Data Classification using Time Series Imaging
http://doi.org/10.5626/JOK.2022.49.8.593
Various methods have been proposed to predict the future, from statistics-based time series analysis to deep learning-based prediction models such as LSTM. However, real industrial data are highly complex due to various unpredictable factors, so it is difficult for prediction models alone to extract valuable information from the data. Time series imaging is a method for converting time series into two-dimensional images, enabling the extraction of information that is difficult to interpret from raw data. In this paper, we transform multivariate sensor data into two-dimensional multichannel images and propose a time series classification method based on them. Furthermore, we compare the proposed method with previous time series prediction methods to verify its usefulness.
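One way to picture time series imaging is the Gramian Angular Summation Field (GASF), a commonly used encoding. The abstract does not name the imaging method used, so GASF is an assumption here, as are the function names. Each sensor variable becomes one channel: the series is rescaled to [-1, 1], mapped to angles φ = arccos(x), and pixel (i, j) is cos(φ_i + φ_j).

```python
import math


def rescale(series):
    """Min-max rescale a series to [-1, 1]; a constant series maps to zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in series]


def gasf(series):
    """Gramian Angular Summation Field of one univariate series.

    Uses cos(a + b) = cos(a)cos(b) - sin(a)sin(b) with cos(phi) = x and
    sin(phi) = sqrt(1 - x^2), so no explicit arccos call is needed.
    """
    x = rescale(series)
    # Clamp at 0 to guard against tiny negative values from rounding.
    s = [math.sqrt(max(0.0, 1.0 - v * v)) for v in x]
    n = len(x)
    return [[x[i] * x[j] - s[i] * s[j] for j in range(n)] for i in range(n)]


def to_multichannel_image(sensors):
    """Stack one GASF image per sensor variable: shape (channels, n, n)."""
    return [gasf(channel) for channel in sensors]
```

The resulting channel stack has the same layout as a multichannel image and could be passed to an ordinary image classifier such as a CNN.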
KcBert-based Movie Review Corpus Emotion Analysis Using Emotion Vocabulary Dictionary
Yeonji Jang, Jiseon Choi, Hansaem Kim
http://doi.org/10.5626/JOK.2022.49.8.608
Emotion analysis is the classification of human emotions expressed in text data into emotional types such as joy, sadness, anger, surprise, and fear. In this study, an emotion corpus was constructed by using an emotion vocabulary dictionary to classify the emotions expressed in a movie review corpus into nine categories: joy, sadness, fear, anger, disgust, surprise, interest, boredom, and pain. The performance of the model was then evaluated by training KcBert on the emotion corpus. To build the emotion analysis corpus, an emotion vocabulary dictionary based on a psychological model was used. The vocabulary in the dictionary was matched against the emotion vocabulary appearing in the movie review corpus, and each review was tagged with the emotion type of the matching vocabulary that appeared last in the review. When KcBert pre-trained with NSMC was trained on the emotion analysis corpus constructed in this way, it showed excellent performance on the nine-category classification.
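The dictionary-based tagging step can be sketched as below, reading the abstract as tagging each review with the emotion of the last dictionary word it contains. The tiny English lexicon, the whitespace tokenization, and the name `tag_review` are hypothetical stand-ins; the study's dictionary is Korean and based on a psychological model.

```python
# Hypothetical mini-lexicon mapping emotion words to emotion types.
EMOTION_LEXICON = {
    "happy": "joy",
    "fun": "joy",
    "sad": "sadness",
    "scary": "fear",
    "boring": "boredom",
}


def tag_review(review, lexicon=EMOTION_LEXICON):
    """Return the emotion of the last lexicon word in the review, or None."""
    tag = None
    for token in review.lower().split():
        word = token.strip(".,!?")
        if word in lexicon:
            # Later matches overwrite earlier ones, so the last match wins.
            tag = lexicon[word]
    return tag
```

Reviews for which `tag_review` returns `None` would simply be left out of the emotion corpus under this reading.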
Improvement of the Classification Model Performance in 119-Emergency Report Data
Eunjung Kwon, Hyuinho Park, Sungwon Byon, Kyuchul Lee
http://doi.org/10.5626/JOK.2022.49.1.89
This paper presents a study of a text classification model that provides optimal response information for each disaster situation, based on the report content recorded by the call receiver while taking a 119 emergency report. Text classification, in which a model receives a sentence and assigns it to a category, is a widely used technique in natural language processing. This study defined rules for using augmented training data to improve the performance of a supervised text classification model, and confirmed through experiments the performance of the classification model trained with the augmented data. The study suggests that the approach can be extended to improve text classification of report contents across emergency situations such as disease, traffic accident, and injury.
Design and Evaluation of Loss Functions based on Classification Models
Hyun-Kyu Jeon, Yun-Gyung Cheong
http://doi.org/10.5626/JOK.2021.48.10.1132
Paraphrase generation is a task in which the model generates an output sentence that conveys the same meaning as the given input text but with a different representation. Recently, paraphrase generation has commonly been approached with artificial neural networks trained by supervised learning on the difference between the model's predictions and the labels. However, this approach provides limited information because it only detects representational differences. For that reason, we propose a method that extracts semantic information with classification models and uses it in the training loss function. Our evaluations showed that the proposed method outperformed baseline models.
Improving Low Resource Chest X-ray Classification Accuracy by Combining Data Augmentation and Weakly Supervised Learning
http://doi.org/10.5626/JOK.2021.48.9.1027
Deep learning-based medical image analysis technology has developed to the extent that its accuracy can surpass that of a human radiologist. However, labeling sample data for training on medical images requires human experts and a great deal of time and expense. In addition, training data for medical images is often imbalanced. For example, in the ChestX-ray14 dataset, the number of samples for infiltration is about 87 times that for hernia. In this study, we propose a method that combines the data augmentation algorithm Mixup with weakly supervised learning to improve the performance of data-imbalanced chest X-ray classification. The proposed method applies the Mixup algorithm to a small number of labeled data and a large number of unlabeled data to alleviate data imbalance, and performs curriculum learning that effectively utilizes the unlabeled data while cycling through a teacher model and a student model. Experiments in a setting with few labeled data and many unlabeled data, as can be expected in the medical field, showed that combining data augmentation and weakly supervised learning improved classification performance and that the cyclic curriculum learning was effective.
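For reference, the basic Mixup operation mentioned in this abstract forms a convex combination of two samples and of their labels, with the mixing weight λ drawn from a Beta(α, α) distribution. The sketch below shows only that basic operation on plain lists; the default `alpha` value and the function names are assumptions, and how the study applies Mixup to unlabeled data (for example via teacher-generated pseudo-labels) is not specified in the abstract.

```python
import random


def sample_lambda(alpha=0.4, rng=random):
    """Draw the mixing weight from Beta(alpha, alpha); alpha is an assumed value."""
    return rng.betavariate(alpha, alpha)


def mixup(x1, y1, x2, y2, lam):
    """Mix two samples and their (one-hot or soft) labels with weight lam."""
    mixed_x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    mixed_y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return mixed_x, mixed_y
```

Because the labels are mixed with the same weight as the inputs, the augmented sample carries a soft label, which is why Mixup pairs naturally with losses that accept probability targets.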
Journal of KIISE
- ISSN : 2383-630X(Print)
- ISSN : 2383-6296(Electronic)
- KCI Accredited Journal
Editorial Office
- Tel. +82-2-588-9240
- Fax. +82-2-521-1352
- E-mail. chwoo@kiise.or.kr