A Pre-processing Method for Learning Data Using eXplainable Artificial Intelligence
Changhong Lee, Jaemin Lee, Donghyun Kim, Jongdeok Kim
http://doi.org/10.5626/JOK.2023.50.2.133
Artificial intelligence model generation proceeds through the stages of learning data processing, model learning, and model evaluation. Data pre-processing techniques for creating quality learning data account for many of the methods for improving model accuracy. Existing pre-processing techniques tend to rely heavily on the experience of the model generator. When pre-processing is based on experience, it is difficult to explain the basis for selecting a particular pre-processing technique. Generators are forced to rely on experience because learning models have become so large and complicated that they are difficult for humans to interpret. Research is therefore being conducted to explain how such models operate by introducing eXplainable AI. In this paper, we propose a learning data pre-processing system using eXplainable AI. The system first trains a model on data that has not been pre-processed, then analyzes the learned model using eXplainable AI, and repeats data pre-processing based on that information. Finally, we improve the model performance, explain the reliability of the pre-processing, and show the practicality of the system.
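The train, explain, pre-process loop described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' system: the abstract does not name the explanation technique or the pre-processing step, so permutation importance stands in for the XAI analysis, dropping zero-importance features stands in for pre-processing, and a nearest-centroid classifier stands in for the model. All function names and the demo data are hypothetical.

```python
import random

def train_centroids(X, y):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, v in enumerate(row):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict(model, row):
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(row, model[c])))

def accuracy(model, X, y):
    return sum(predict(model, r) == t for r, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, rng):
    """Accuracy drop when each feature column is shuffled (assumed XAI stand-in)."""
    base = accuracy(model, X, y)
    scores = []
    for j in range(len(X[0])):
        col = [r[j] for r in X]
        rng.shuffle(col)
        X_perm = [r[:j] + [v] + r[j + 1:] for r, v in zip(X, col)]
        scores.append(base - accuracy(model, X_perm, y))
    return scores

def explain_and_preprocess(X, y, threshold=0.0, max_rounds=5, seed=0):
    """Repeat: train on current features, explain, drop features at or below threshold."""
    rng = random.Random(seed)
    kept = list(range(len(X[0])))
    for _ in range(max_rounds):
        X_kept = [[r[j] for j in kept] for r in X]
        model = train_centroids(X_kept, y)
        importances = permutation_importance(model, X_kept, y, rng)
        new_kept = [j for j, s in zip(kept, importances) if s > threshold]
        if not new_kept or new_kept == kept:
            break  # nothing left to remove, or the feature set is stable
        kept = new_kept
    return kept

# Hypothetical data: feature 0 separates the classes, feature 1 is constant.
demo_X = [[float(i % 2) * 10 + (i % 3) * 0.1, 3.0] for i in range(20)]
demo_y = [i % 2 for i in range(20)]
kept_features = explain_and_preprocess(demo_X, demo_y)
```

Because each removal is justified by an importance score, the loop leaves a record of why a pre-processing step was taken, which is the kind of explainability the abstract argues experience-based pre-processing lacks.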
CoEM: Contrastive Embedding Mapper for Audio-visual Latents
Gihun Lee, Kyungchae Lee, Minchan Jeong, Myungjin Lee, Se-young Yun, Chan-hyun Yun
http://doi.org/10.5626/JOK.2023.50.1.80
Human perception can link audio and visual information to each other, making it possible to recall visual information from audio information and vice versa. This ability is naturally acquired by experiencing situations where the two kinds of information are combined. However, it is hard to obtain video datasets that are richly combined with both types of information and, at the same time, labeled for the semantics of each scene. This paper proposes a Contrastive Embedding Mapper (CoEM), which maps an embedding from one type of information to the other, corresponding to its categorical modality. Paired data is not required; CoEM learns to contrast the mapped embeddings by their categories. We validated the efficacy of CoEM on embeddings for audio and visual datasets that were trained to classify 20 shared categories. In the experiment, the embeddings mapped by CoEM proved capable of retrieving and generating data in the mapped domain.
Robust Korean Table Machine Reading Comprehension across Various Domains
Sanghyun Cho, Hye-Lynn Kim, Hyuk-chul Kwon
http://doi.org/10.5626/JOK.2023.50.12.1102
Unlike regular text data, tabular data has structural features that allow it to represent compressed information. This has led to its use in a variety of domains, and machine reading comprehension of tables has become an increasingly important aspect of Machine Reading Comprehension (MRC). However, table structure and the knowledge required differ from domain to domain, so when a language model is trained on a single domain, its evaluation performance in other domains is likely to drop, resulting in poor generalization. To overcome this, it is important to build datasets covering various domains and to apply various techniques rather than simply relying on pre-trained models. In this study, we design a language model that learns cross-domain invariant linguistic features to improve domain generalization performance. We applied adversarial training to improve performance on the evaluation datasets in each domain and modified the structure of the model by adding an embedding layer and a transformer layer specialized for tabular data. When applying adversarial learning, we found that the model whose structure does not add table-specific embeddings improves performance. In contrast, adding a table-specific transformer layer and feeding it additional table-specific embeddings as input shows the best performance on data from all domains.
Gender Classification Model Based on Colloquial Text in Korean for Author Profiling of Messenger Data
Jihye Kang, Minho Kim, Hyuk-Chul Kwon
http://doi.org/10.5626/JOK.2023.50.12.1063
With the explosive growth of social network services (SNS), text data has been generated extensively through messenger services. In addition, various applications such as sentiment analysis, abusive text detection, and chatbots have been developed and provided owing to recent advances in Natural Language Processing. However, there has been no attempt to classify author characteristics such as the gender and age of speakers in Korean colloquial texts. In this study, we propose a gender classification model for author profiling using Korean colloquial texts. To classify the gender of speakers in Kakao Talk data, domain adaptation is carried out by additionally training KcBERT (Korean Comments BERT), which was trained on Korean comments, on ‘Nate Pan’ data. Experiments with a model that incorporates external lexical information showed improved performance, achieving an accuracy of approximately 95%. In this study, the self-collected ‘Nate Pan’ data and the ‘daily conversation’ data provided by the National Institute of the Korean Language were used for domain adaptation, and the ‘Korean SNS’ data of AI HUB was used for model learning and evaluation.
Improvement Study on Active Learning-based Cross-Project Defect Prediction System
http://doi.org/10.5626/JOK.2023.50.11.931
This study proposes a practical improvement method for an active learning-based system for cross-project defect prediction. A previous study applied active learning techniques to practically improve the performance of cross-project defect prediction, but it used a traditional machine learning model that took hand-made features as input for both active learning target selection and defect prediction. As a result, feature extraction was expensive and performance was limited. In addition, the problem of performance deviation according to the selection of the input project remained. This study proposes the following methods to overcome these limitations. First, we used a deep learning model that can take source code as input, lowering the model building cost and improving prediction performance. Second, we applied a Bayesian convolutional neural network to select active learning targets using a deep learning model. Third, instead of considering a single source project, we applied a method that automatically extracts a training data set from multiple projects. Applying the proposed system to 7 open source projects improved the average prediction performance by 13.58% compared to the most recent previous research.
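The second step, selecting active learning targets with a Bayesian network, can be illustrated with a small uncertainty-based selection sketch. The abstract does not state which acquisition function is used, so this example assumes predictive entropy over several stochastic forward passes (for example, Monte Carlo dropout). The module names and probabilities are hypothetical.

```python
import math

def predictive_entropy(prob_samples):
    """Entropy (in nats) of the mean defect probability across stochastic passes."""
    p = sum(prob_samples) / len(prob_samples)
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a fully confident prediction carries no uncertainty
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def select_for_labeling(samples_by_module, k):
    """Return the k modules whose predictions are most uncertain."""
    ranked = sorted(
        samples_by_module,
        key=lambda name: predictive_entropy(samples_by_module[name]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical defect probabilities from three stochastic passes per module.
hypothetical_passes = {
    "Parser.java": [0.48, 0.55, 0.51],
    "Cache.java": [0.02, 0.05, 0.03],
    "Auth.java": [0.90, 0.70, 0.95],
    "Util.java": [0.99, 0.98, 0.99],
}
to_label = select_for_labeling(hypothetical_passes, k=2)
```

Modules near a mean probability of 0.5 rank first, so labeling effort goes where the model is least sure rather than where it is already confident.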
Quantitative Analysis of Sequence-based Container Security Enhancement using a System Call Sequence Extraction Framework
Somin Song, Youyang Kim, Byungchul Tak
http://doi.org/10.5626/JOK.2023.50.11.913
Container escape is one of the most critical threats in containerized applications that share a host kernel. Attackers exploit kernel vulnerabilities through a series of manipulated system calls to achieve privilege escalation, which can lead to container escape. Seccomp is a security mechanism widely used in containers. It strengthens the level of isolation by filtering out unnecessary system call invocations. However, because Seccomp blocks individual system calls, it has a fundamental limitation: it can be vulnerable to attacks that use only system calls allowed by the policy. Therefore, this study presents a hybrid analysis framework that combines static and dynamic analyses to extract system call sequences from exploit codes. Using this framework, we compared the security strength of the existing individual system call-based filtering mechanism and the proposed system call sequence-based filtering mechanism in terms of the number of blockable exploit codes, using system call profiles for the same exploit codes. As a result, the proposed system call sequence-based filtering mechanism increased the defense coverage from 63% to 98% compared to the existing individual system call-based filtering mechanism.
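The limitation described in this abstract can be shown with a small user-space sketch that contrasts the two filtering styles. It is an illustration only, not the authors' framework and not how Seccomp itself is configured: the allowlist, the traces, and the denied sequence are hypothetical, and matching a sequence as an ordered (not necessarily contiguous) subsequence is an assumption.

```python
def blocked_by_syscall_filter(trace, allowed):
    """Individual filtering: block if any call falls outside the allowlist."""
    return any(call not in allowed for call in trace)

def contains_sequence(trace, pattern):
    """True if `pattern` occurs in `trace` in order, gaps permitted."""
    remaining = iter(trace)
    # `call in remaining` advances the iterator, so order is enforced.
    return all(call in remaining for call in pattern)

def blocked_by_sequence_filter(trace, denied_sequences):
    """Sequence filtering: block if any denied call sequence appears in the trace."""
    return any(contains_sequence(trace, pattern) for pattern in denied_sequences)

# Hypothetical policy and traces, chosen only to illustrate the gap.
allowed_calls = {"open", "read", "write", "close", "socket", "mmap"}
denied_sequences = [("socket", "mmap", "write")]

exploit_trace = ["socket", "read", "mmap", "write", "close"]
benign_trace = ["open", "read", "write", "close"]

exploit_passes_allowlist = not blocked_by_syscall_filter(exploit_trace, allowed_calls)
exploit_caught_by_sequence = blocked_by_sequence_filter(exploit_trace, denied_sequences)
benign_blocked = blocked_by_sequence_filter(benign_trace, denied_sequences)
```

Every call in the hypothetical exploit trace is individually allowed, so the allowlist lets it through, while the sequence check flags the ordering and leaves the benign trace untouched.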
A Study on Compliance of Data and Control Coupling of Weapon System Software Airworthiness Certification
http://doi.org/10.5626/JOK.2023.50.11.995
In 2009, to secure the flight safety of military aircraft and enhance the competitiveness of its aircraft exports by applying internationally recognized airworthiness certification standards, South Korea established the "Act on Certification of Flight Safety for Military Aircraft" along with its enforcement decree and regulations. According to these regulations, domestically developed military aircraft are required to be certified following the airworthiness certification laws, procedures, and standards. The standard airworthiness certification criteria for military aircraft, which serve as the basis for airworthiness assessment, were developed by the Defense Acquisition Program Administration (DAPA) and have been revised up to the 7th edition. Among the recent changes, the most impactful area is Chapter 15, which pertains to computer resources in software. As the proportion of software development within weapon systems continues to increase, the related standards are becoming more detailed and refined to keep pace. This study aimed to clarify and propose verification methods for the newly incorporated software coupling criteria in the revised airworthiness certification standards.
Post-training Methods for Improving Korean Document Summarization Model
So-Eon Kim, Seong-Eun Hong, Gyu-Min Park, Choong Seon Hong, Seong-Bae Park
http://doi.org/10.5626/JOK.2023.50.10.882
The document summarization task generates a short summary from a long document. Recently, methods using a pre-trained transformer-based model have shown high performance. However, because fine-tuning has been shown not to train the model optimally, owing to the learning gap between pre-training and fine-tuning, post-training, an additional training stage between pre-training and fine-tuning, has been proposed. This paper proposes two post-training methods for Korean document summarization. One is Korean Spacing, which is for learning Korean structure, and the other is First Sentence Masking, which is for learning about document summarization. Experiments showed that the proposed post-training methods are effective: performance improved when the proposed post-training was used compared to when it was not.
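To make the two post-training objectives concrete, the sketch below builds one training example for each. The abstract names the objectives but not their exact formulation, so both are assumptions: Korean Spacing is taken to mean restoring spaces in text from which they were removed, and First Sentence Masking is taken to mean predicting a masked first sentence from the rest of the document. The `<mask>` token and function names are hypothetical.

```python
import re

MASK = "<mask>"  # hypothetical mask token; a real tokenizer defines its own

def spacing_example(text):
    """Assumed Korean Spacing objective: input has spaces removed, target restores them."""
    return {"input": text.replace(" ", ""), "target": text}

def first_sentence_masking_example(document):
    """Assumed First Sentence Masking objective: hide the first sentence, then predict it."""
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    first, rest = sentences[0], sentences[1:]
    return {"input": " ".join([MASK] + rest), "target": first}

# "The weather is nice today. We went to the park."
demo_document = "오늘 날씨가 좋다. 우리는 공원에 갔다."
spacing = spacing_example(demo_document)
masking = first_sentence_masking_example(demo_document)
```

Both objectives need only raw documents, which is what makes them usable as an intermediate stage before fine-tuning on labeled summaries.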
Improved Open-Domain Conversation Generative Model via Denoising Training of Guide Responses
Bitna Keum, Hongjin Kim, Jinxia Huang, Ohwoog Kwon, Harksoo Kim
http://doi.org/10.5626/JOK.2023.50.10.851
Recent open-domain conversation research actively seeks to combine the strengths of retrieval models and generative models while overcoming their respective weaknesses. However, the generative model tends either to disregard the retrieved response or to copy the retrieved response as it is when generating a response. In this paper, we propose a method of mitigating these problems. To alleviate the former problem, we filter the retrieved responses and use the gold response together with them. To address the latter problem, we perform noising on the gold response and the retrieved responses. The generative model then improves its ability to generate responses through denoising training. The effectiveness of the proposed method is verified through human and automatic evaluation.
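The noising step can be sketched as a function that corrupts the guide responses before they are given to the generator, with the clean gold response as the training target. The abstract does not specify the noise operations or rates, so random token deletion and masking at fixed probabilities are assumptions, and the filtering of retrieved responses is omitted. All names and the sample dialogue are hypothetical.

```python
import random

def add_noise(tokens, rng, drop_prob=0.2, mask_prob=0.2, mask_token="<mask>"):
    """Randomly delete or mask tokens (assumed noise operations)."""
    noised = []
    for tok in tokens:
        r = rng.random()
        if r < drop_prob:
            continue  # delete this token
        noised.append(mask_token if r < drop_prob + mask_prob else tok)
    return noised

def build_denoising_example(context, gold, retrieved, seed=0):
    """Pair noised guide responses (gold plus retrieved) with the clean gold target."""
    rng = random.Random(seed)
    guides = [" ".join(add_noise(g.split(), rng)) for g in [gold] + retrieved]
    return {"input": {"context": context, "guides": guides}, "target": gold}

# Hypothetical dialogue turn and guide responses.
example = build_denoising_example(
    context="What do you do on weekends?",
    gold="i like hiking on weekends",
    retrieved=["hiking is fun", "i prefer swimming"],
    seed=1,
)
```

Because the guides are corrupted, copying them verbatim no longer reproduces the target, which is the intended pressure against the copying behavior the abstract describes.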
Domain Generalized Fashion Object Detection using Style Augmentation and Attention
http://doi.org/10.5626/JOK.2023.50.10.845
With the combination of fashion and computer vision, fashion object detection using deep learning has gained much interest. However, due to the nature of supervised learning, model performance drops when images with different characteristics are used. We define a dataset with distinct characteristics as a ‘domain’ and the characteristics of that domain as its ‘style’, and propose a new augmentation method that mixes the styles of existing domains to create a new style. We also use an attention method to extract important features from the images. Using a stylized fashion detection dataset, style deepfashion2, we show that the proposed method enhances performance within all domains.
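One common way to mix styles is to interpolate feature statistics between domains, and the sketch below illustrates that idea. The abstract does not say how style is represented or mixed, so treating style as the mean and standard deviation of a feature channel is an assumption, as are the function names, the mixing weight `lam`, and the sample values.

```python
import statistics

def style_stats(values):
    """Assumed style representation: mean and standard deviation of one channel."""
    return statistics.fmean(values), statistics.pstdev(values)

def mix_style(content, other, lam):
    """Re-style `content` with statistics interpolated between the two domains.

    lam = 1.0 keeps the original style; lam = 0.0 adopts the other domain's style.
    """
    mu_c, sd_c = style_stats(content)
    mu_o, sd_o = style_stats(other)
    mu_mix = lam * mu_c + (1 - lam) * mu_o
    sd_mix = lam * sd_c + (1 - lam) * sd_o
    eps = 1e-8  # guards against division by zero for constant channels
    return [(v - mu_c) / (sd_c + eps) * sd_mix + mu_mix for v in content]

# Hypothetical activations of one feature channel from two domains.
domain_a = [1.0, 2.0, 3.0, 4.0]
domain_b = [10.0, 20.0, 30.0, 40.0]
new_style = mix_style(domain_a, domain_b, lam=0.5)
```

Intermediate values of `lam` yield styles that belong to neither source domain, which is what lets an augmentation of this kind expose the detector to styles beyond those in the training set.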
Journal of KIISE
- ISSN : 2383-630X(Print)
- ISSN : 2383-6296(Electronic)
- KCI Accredited Journal
Editorial Office
- Tel. +82-2-588-9240
- Fax. +82-2-521-1352
- E-mail. chwoo@kiise.or.kr