Digital Library [Search Result]
Cell Type Prediction for Single-cell RNA Sequencing based on Unsupervised Domain Adaptation and Semi-supervised Learning
http://doi.org/10.5626/JOK.2025.52.2.125
Single-cell RNA sequencing (scRNA-seq) techniques for measuring gene expression in individual cells have developed rapidly. Recently, deep learning has been employed to identify cell types in scRNA-seq analysis. Most methods train a model on a dataset containing cell-type labels and then apply this model to other datasets. However, integrating multiple datasets can result in unexpected batch effects caused by variations in laboratories, experimenters, and sequencing techniques. Since batch effects can obscure the biological signals of interest, an effective batch correction method is essential. In this paper, we present a cell-type prediction model for scRNA-seq that utilizes unsupervised domain adaptation and semi-supervised learning to minimize distributional differences between datasets. First, we pre-train the proposed model using a source dataset that contains cell-type information. Subsequently, we train the model on the target dataset, leveraging adversarial training to align the distribution of the target dataset with that of the source dataset. Finally, we re-train the model to enhance performance through semi-supervised learning, utilizing both the source and target datasets with consistency regularization. The proposed model outperformed other deep learning-based batch correction models by effectively removing batch effects.
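The consistency regularization used in the final training stage can be illustrated with a minimal sketch. The abstract does not state the exact form of the consistency term, so the mean squared difference between class probabilities of two perturbed views of the same cell is an assumption here (one common choice), and the function names `softmax` and `consistency_loss` are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores to class probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def consistency_loss(logits_a: Sequence[float], logits_b: Sequence[float]) -> float:
    """Mean squared difference between the class probabilities predicted
    for two perturbed views of the same unlabeled cell.

    ASSUMPTION: the paper's actual consistency term is not given in the
    abstract; MSE between softmax outputs is used only for illustration.
    """
    if len(logits_a) != len(logits_b):
        raise ValueError("both views must have the same number of classes")
    pa, pb = softmax(logits_a), softmax(logits_b)
    return sum((x - y) ** 2 for x, y in zip(pa, pb)) / len(pa)


if __name__ == "__main__":
    # Two views that agree give zero loss; disagreeing views are penalized.
    print(consistency_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
    print(consistency_loss([5.0, 0.0], [0.0, 5.0]))
```

Because the loss needs no cell-type label, it can be evaluated on target cells as well as source cells, which is what makes it suitable for the semi-supervised stage.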
Pseudo-label Correction using Large Vision-Language Models for Enhanced Domain-adaptive Semantic Segmentation
http://doi.org/10.5626/JOK.2024.51.5.464
Creating semantic segmentation labels for real-world images is very expensive. To address this problem, unsupervised domain adaptation trains the model on labeled data, either generated in a virtual environment where labels are easy to collect or already collected, together with unlabeled real-world images. One of the common problems in unsupervised domain adaptation is that thing classes with similar appearance are easily confused. In this paper, we propose a method for correcting the pseudo-labels of target data using large vision-language models. Making the pseudo-labels generated for target images more accurate can reduce confusion among thing classes. The proposed method improves the performance of DAFormer by +1.1 mIoU in game-to-reality adaptation and by +1.1 mIoU in day-to-night adaptation. For thing classes, the proposed method improves the performance of MIC by +0.6 mIoU in game-to-reality adaptation and by +0.7 mIoU in day-to-night adaptation.
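The gains above are reported in mIoU (mean intersection-over-union). As a rough sketch of how this metric is computed, the function below averages per-class IoU over flattened label lists. The name `mean_iou` and the choice to skip classes absent from both prediction and ground truth are assumptions for illustration; benchmark toolkits typically accumulate counts over the whole evaluation set before averaging.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean intersection-over-union over classes present in pred or target.

    pred and target are flattened per-pixel class indices of equal length.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        # Pixels where both agree on class c, and pixels where either says c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes that never occur (illustrative choice)
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Class 0: IoU = 1/2, class 1: IoU = 2/3, so the mean is 7/12.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```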
Robust Korean Table Machine Reading Comprehension across Various Domains
Sanghyun Cho, Hye-Lynn Kim, Hyuk-chul Kwon
http://doi.org/10.5626/JOK.2023.50.12.1102
Unlike regular text data, tabular data has structural features that allow it to represent compressed information. This has led to its use in a variety of domains, and machine reading comprehension of tables has become an increasingly important aspect of Machine Reading Comprehension (MRC). However, the structure of tables and the knowledge required differ across domains, and when a language model is trained on a single domain, its evaluation performance in other domains is likely to drop, resulting in poor generalization. To overcome this, it is important to build datasets for various domains and to apply various techniques rather than simply relying on pre-trained models. In this study, we design a language model that learns cross-domain invariant linguistic features to improve domain generalization performance. We apply adversarial training to improve performance on the evaluation datasets of each domain and modify the structure of the model by adding an embedding layer and a transformer layer specialized for tabular data. When applying adversarial learning, we found that performance improves for the model structure that does not add table-specific embeddings. On the other hand, adding a table-specific transformer layer and having the added layer receive additional table-specific embeddings as input shows the best performance on data from all domains.
Gender Classification Model Based on Colloquial Text in Korean for Author Profiling of Messenger Data
Jihye Kang, Minho Kim, Hyuk-Chul Kwon
http://doi.org/10.5626/JOK.2023.50.12.1063
With the explosive growth of social network services (SNS), text data has been generated extensively through messenger services. In addition, various applications such as sentiment analysis, abusive text detection, and chatbots have been developed and deployed thanks to recent advances in natural language processing. However, there has not been an attempt to classify author characteristics such as the gender and age of speakers in Korean colloquial texts. In this study, we propose a gender classification model for author profiling using Korean colloquial texts. For gender classification of speakers in Kakao Talk data, domain adaptation is carried out by further training KcBERT (Korean Comments BERT), which is trained on Korean comments, on ‘Nate Pan’ data. Experiments with a model that incorporates external lexical information showed improved performance, with an accuracy of approximately 95%. In this study, the self-collected ‘Nate Pan’ data and the ‘daily conversation’ data provided by the National Institute of the Korean Language were used for domain adaptation, and the ‘Korean SNS’ data of AI HUB was used for model training and evaluation.
Analysis of Adversarial Learning-Based Deep Domain Adaptation for Cross-Version Defect Prediction
Jiwon Choi, Jaewook Lee, Duksan Ryu, Suntae Kim
http://doi.org/10.5626/JOK.2023.50.6.460
Software defect prediction is a helpful technique for effective allocation of testing resources. Cross-version defect prediction reflects an environment in which software is developed across successive versions, with software modules added or deleted through each version update. Repetition of this process can cause differences in data distribution between versions, which can negatively affect defect prediction performance. Deep domain adaptation (DeepDA) techniques are methods used in the field of computer vision to reduce the distribution difference between source and target data. This paper aims to reduce the difference in data distribution between versions using various DeepDA techniques and to identify the techniques with the best defect prediction performance. We compared the performance of deep domain adaptation techniques (i.e., Domain-Adversarial Neural Network (DANN), Adversarial Discriminative Domain Adaptation (ADDA), and Wasserstein Distance Guided Representation Learning (WDGRL)) and identified performance differences according to the pairing of source data. We also examined performance differences according to the ratio of target data used in the learning process and according to the hyperparameter settings of the DANN model. Experimental results showed that DANN was more suitable for cross-version defect prediction environments. The DANN model performed best when using the data of all previous versions, excluding the target version, as the source. In particular, it showed the best performance when the number of hidden layers of the DANN model was set to 3. In addition, when applying the DeepDA techniques, the more target data used in the learning process, the better the performance. This study suggests that various DeepDA techniques can be used to predict software cross-version defects in the future.
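The core mechanism of DANN, the gradient reversal layer, can be sketched in a few lines. This is a framework-free illustration, not the configuration used in the study: the helper names `grl_forward`, `grl_backward`, and `dann_lambda` are hypothetical, and the lambda schedule shown is the one from the original DANN paper, which the abstract does not confirm was used here.

```python
import math
from typing import List, Sequence


def grl_forward(features: Sequence[float]) -> List[float]:
    """Forward pass of a gradient reversal layer: the identity function."""
    return list(features)


def grl_backward(grad_from_domain_clf: Sequence[float], lam: float) -> List[float]:
    """Backward pass: flip the sign of the domain classifier's gradient and
    scale it by lam, so the feature extractor is pushed to make source and
    target versions indistinguishable."""
    return [-lam * g for g in grad_from_domain_clf]


def dann_lambda(progress: float, gamma: float = 10.0) -> float:
    """Adaptation weight that ramps from 0 toward 1 as training progresses.

    progress is the fraction of training completed, in [0, 1].
    ASSUMPTION: schedule and gamma=10 follow the original DANN paper.
    """
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0


if __name__ == "__main__":
    lam = dann_lambda(0.5)
    print(lam)
    print(grl_backward([1.0, -2.0], lam))
```

In a full model, the label predictor's gradient passes to the feature extractor unchanged, while the domain classifier's gradient passes through `grl_backward`, which is what makes the learned features both defect-predictive and version-invariant.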
A Cross Domain Adaptation Method based on Adversarial Cycle Consistence Learning for Rotary Machine Fault Diagnosis
http://doi.org/10.5626/JOK.2022.49.7.530
Research on data-based fault diagnosis models is being actively conducted in various industries. However, industrial equipment runs under varied operating conditions, and it is difficult to secure sufficient training data. To solve this problem, a cross-domain adaptation technique can be utilized. In this study, we propose an adversarial consistency-maintaining transformation learning method that uses rotary machine vibration data and can maintain fault classification consistency even for data from new, unseen environments. The data generated through consistency learning creates a continuous invariant latent space between the distribution of the new operating condition data and the distribution of the known data, and the model learns to maintain fault classification performance through an adversarial learning network that shares fault classification feature information. Therefore, the proposed method can provide more stable and general classification performance by expanding the latent space to minimize the discrepancy between domain data. In experiments, the proposed model showed about 88% accuracy on a real-machine dataset, a performance improvement of about 5-10% compared to existing cross-domain adaptive learning methods. Based on these results, the method is expected to be an effective solution for equipment fault diagnosis at actual industrial sites.
Korean Semantic Role Labeling Using Domain Adaptation Technique
Soojong Lim, Yongjin Bae, Hyunki Kim, Dongyul Ra
Developing a high-performance Semantic Role Labeling (SRL) system for a domain requires a large amount of manually annotated training data in the same domain. However, SRL training data of sufficient size is available only for a few domains. The performance of Korean SRL degrades by almost 15% or more when it is directly applied to another domain with relatively little training data. This paper proposes two techniques to minimize performance degradation in domain transfer. First, we propose a domain adaptation algorithm for Korean SRL based on the prior model, which is one of the domain adaptation paradigms. Second, we propose using simplified features related to morphological and syntactic tags when using small-sized target domain data, to suppress the problem of data sparseness. Other domain adaptation techniques were experimentally compared to our techniques in this paper, where news and Wikipedia were used as the source and target domains, respectively. The highest performance was achieved when our two techniques were applied together. Our system achieved an F1 score of 64.3%, which is 2.4~3.1% higher than the methods from other research.
Journal of KIISE
- ISSN : 2383-630X(Print)
- ISSN : 2383-6296(Electronic)
- KCI Accredited Journal
Editorial Office
- Tel. +82-2-588-9240
- Fax. +82-2-521-1352
- E-mail. chwoo@kiise.or.kr