Vol. 50, No. 6, Jun. 2023
Signature Generation to Detect HWP Malware Based on Threat Factors and Attack Patterns
Minji Choe, Dongjae Jung, Homook Cho, YooJae Won
http://doi.org/10.5626/JOK.2023.50.6.451
A recent increase in telecommuting due to the coronavirus disease 2019 (COVID-19) pandemic has caused ever-increasing incidents of document-type malicious code attacks, which insert malicious code into the electronic documents mainly used at work. A malicious document that spreads through various routes such as messengers, e-mails, and websites can easily bypass existing behavior-based security solutions and internal e-mail monitoring systems because the malicious code is encoded or obfuscated to conceal it within the document. In this paper, we identify and classify five core threat factors by analyzing the structure of HWP documents. We then generate signatures capable of detecting malicious HWP documents by analyzing the attack code patterns of these threat factors, and propose a signature generation method that effectively detects the latest malicious HWP documents. In the future, we plan to extend this research by applying statistical learning techniques to generate signatures automatically.
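The signature idea can be illustrated with a minimal byte-pattern matcher. The signature names and patterns below are invented for illustration; they are not the paper's actual HWP signatures:

```python
import re

# Hypothetical byte-level signatures for suspicious content inside an
# HWP document stream (illustrative only, not the paper's signatures).
SIGNATURES = {
    "eps_shellcode": re.compile(rb"exec\s+4d5a"),       # PostScript exec of an MZ header
    "obfuscated_js": re.compile(rb"eval\(unescape\("),  # classic script obfuscation
}

def scan_stream(stream: bytes) -> list:
    """Return the names of all signatures that match the raw stream."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(stream)]
```

A stream containing `eval(unescape(` would be flagged as `obfuscated_js`, while a benign stream yields an empty list.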
Analysis of Adversarial Learning-Based Deep Domain Adaptation for Cross-Version Defect Prediction
Jiwon Choi, Jaewook Lee, Duksan Ryu, Suntae Kim
http://doi.org/10.5626/JOK.2023.50.6.460
Software defect prediction is a helpful technique for effective allocation of testing resources. Cross-version defect prediction reflects an environment in which software is developed across successive versions, with modules added or deleted through the version update process. Repeating this process can cause differences in data distribution between versions, which can negatively affect defect prediction performance. Deep domain adaptation (DeepDA) techniques are methods used in computer vision to reduce the distribution difference between source and target data. This paper aims to reduce the difference in data distribution between versions using various DeepDA techniques and to identify the technique with the best defect prediction performance. We compared the performance of three deep domain adaptation techniques (Domain-Adversarial Neural Network (DANN), Adversarial Discriminative Domain Adaptation (ADDA), and Wasserstein Distance Guided Representation Learning (WDGRL)) and examined performance differences according to the choice of source data, the ratio of target data used during training, and the hyperparameter settings of the DANN model. Experimental results showed that DANN is more suitable for cross-version defect prediction environments. The DANN model performed best when all previous versions of data except the target version were used as the source, and in particular when its number of hidden layers was set to 3. In addition, when applying a DeepDA technique, the more target data used in the learning process, the better the performance. This study suggests that various DeepDA techniques can be used to predict software cross-version defects in the future.
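The core of DANN is a gradient reversal layer (GRL): identity in the forward pass, gradient scaled by -λ in the backward pass, so the feature extractor learns representations that fool the domain discriminator. A minimal numerical sketch (not the paper's implementation):

```python
import numpy as np

def grl_forward(x):
    # Forward pass: the GRL is the identity function.
    return x

def grl_backward(grad, lam=1.0):
    # Backward pass: reverse (and scale) the gradient flowing back to
    # the feature extractor, so it maximizes the domain classifier's loss.
    return -lam * grad

# Toy check: a gradient that would make features domain-separable
# is flipped in sign before reaching the feature extractor.
g = np.array([0.5, -2.0])
g_reversed = grl_backward(g, lam=0.1)
```

In a full DANN, this layer sits between the shared feature extractor and the domain classifier, while the defect-label classifier receives ordinary gradients.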
The Dataset and a Pretrained Language Model for Sentence Classification in Korean Science and Technology Abstracts
Hongbi Ahn, Soyoung Park, Yuchul Jung
http://doi.org/10.5626/JOK.2023.50.6.468
Classifying each sentence according to its role or function is a critical task, particularly in science and technology papers, where abstracts contain various types of research-related content. Proper content curation and appropriate meaning tags are necessary but challenging because of the complexity and diversity of the work. For instance, in biomedical abstract data in other languages (such as PubMed), the sentences of an abstract typically follow a consistent semantic sequence, such as background-purpose-method-result-conclusion. In Korean paper abstracts, however, the sentences are ordered differently depending on the author. To address this, we have constructed a dataset (PubKorSci-1k) that tags each sentence according to its role in abstracts from the science and technology domains written in Korean. Additionally, we propose a learning technique for sentence classification based on this dataset.
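A record in such a sentence-role dataset might look like the following sketch. The field names and role labels are illustrative, drawn from the background-purpose-method-result-conclusion sequence mentioned above, and are not the actual PubKorSci-1k schema:

```python
from dataclasses import dataclass

# Hypothetical role inventory following the abstract's semantic sequence.
ROLES = ("background", "purpose", "method", "result", "conclusion")

@dataclass
class AbstractSentence:
    doc_id: str     # identifier of the source abstract
    position: int   # sentence index within the abstract
    text: str       # the sentence itself (Korean in PubKorSci-1k)
    role: str       # one of ROLES

    def __post_init__(self):
        if self.role not in ROLES:
            raise ValueError(f"unknown role: {self.role}")

sample = AbstractSentence("D001", 0, "본 연구에서는 문장 역할 분류 기법을 제안한다.", "purpose")
```

A classifier trained on such records predicts `role` from `text`, optionally using `position` as a feature since role order varies by author.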
Video Retrieval System Using One-to-One Relation Between Clip-Sentence Sequence
http://doi.org/10.5626/JOK.2023.50.6.476
Video retrieval is a research field that finds videos related to text queries among candidate videos. Previous studies on video retrieval have used learning methods that enforce the embeddings of a text and its paired video to be similar to each other, without considering the structures of the video and text. In this paper, we propose a novel video retrieval model and a training technique that focus on pairs of clip sequences and sentence sequences with a one-to-one relationship. Experimental results show that the performance of the proposed model improves over baseline models by 0.3%p in R@1 for sentence-clip retrieval and 5.4%p in R@1 for paragraph-video retrieval on the YouCook2 dataset.
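The R@1 metric used above can be sketched for the one-to-one setting: each sentence's nearest clip under cosine similarity should be the paired clip at the same index. This toy computation assumes precomputed embedding matrices and is not the paper's model:

```python
import numpy as np

def recall_at_1(sent_emb, clip_emb):
    """Fraction of sentences whose most similar clip (by cosine
    similarity) is the ground-truth clip at the same index,
    reflecting a one-to-one sentence-clip pairing."""
    s = sent_emb / np.linalg.norm(sent_emb, axis=1, keepdims=True)
    c = clip_emb / np.linalg.norm(clip_emb, axis=1, keepdims=True)
    sim = s @ c.T                      # (num_sentences, num_clips)
    return float(np.mean(sim.argmax(axis=1) == np.arange(len(s))))
```

With perfectly aligned embeddings the score is 1.0; each misranked pair lowers it by 1/N.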
An Ontology-based Description Approach to Temporal Transitions of Events
Je-Min Kim, Young-Tack Park, Sang-Min Kim, Yukyung Shin
http://doi.org/10.5626/JOK.2023.50.6.484
It is very useful to describe, in a standardized form, the information that changes between events and entities over time. Various studies have used ontology to describe the time information of events for this purpose. The concept of lineage makes it possible to effectively describe the state transitions and connectivity of events over time. In this paper, we propose a method that uses ontology-based event lineage and entity lineage to describe the time-dependent transformation of events and entities in a standardized form. First, the time format is classified into instant, interval, duration, and periodic and expressed as an ontology instance, with each event having one time format. Then, the event and entity information expressed in the ontology is described as a lineage. In an experiment on processing temporal relation queries using Allen's temporal relations, conducted to verify the relevance and usefulness of this study, response time improved by 15.02%.
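Allen's temporal relations classify how two intervals relate in time. A minimal sketch covering a few of the thirteen relations (the full inventory also includes starts, finishes, and their inverses):

```python
def allen_relation(a, b):
    """Classify a subset of Allen's temporal relations between two
    intervals a = (start, end) and b = (start, end)."""
    a1, a2 = a
    b1, b2 = b
    if a2 < b1:
        return "before"            # a ends strictly before b starts
    if a2 == b1:
        return "meets"             # a ends exactly where b starts
    if a1 == b1 and a2 == b2:
        return "equal"
    if a1 > b1 and a2 < b2:
        return "during"            # a is strictly inside b
    if a1 < b2 and b1 < a2:
        return "overlaps"
    return "other"
```

Query processing over a lineage can then filter event pairs by the relation their time instances satisfy.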
Korean Coreference Resolution through BERT Embedding at the Morpheme Level
Kyeongbin Jo, Yohan Choi, Changki Lee, Jihee Ryu, Joonho Lim
http://doi.org/10.5626/JOK.2023.50.6.495
Coreference resolution is a natural language processing task that identifies mentions in a given document and groups the mentions that refer to the same entity. Korean coreference resolution has mainly been studied with end-to-end methods, which must consider all spans as potential mentions, increasing memory usage and time complexity. In this paper, a word-level coreference resolution model that maps sub-tokens back to word units was applied to Korean, with its token representations computed through CorefBERT to reflect the characteristics of Korean; entity-name and dependency-parsing features were then added. Experimental results on the ETRI Q&A domain evaluation set showed an F1 of 70.68%, a 1.67% performance improvement over the existing end-to-end coreference resolution model, while memory usage improved by a factor of 2.4 and speed by a factor of 1.82.
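Mapping sub-tokens back to word units, as in the word-level model above, is typically done by pooling the sub-token embeddings of each word. A minimal sketch using mean pooling (the actual model's pooling choice may differ):

```python
import numpy as np

def subtokens_to_words(token_emb, word_ids):
    """Average sub-token embeddings that belong to the same word,
    producing one vector per word. `word_ids[i]` is the index of the
    word that sub-token i came from."""
    n_words = max(word_ids) + 1
    summed = np.zeros((n_words, token_emb.shape[1]))
    counts = np.zeros(n_words)
    for emb, w in zip(token_emb, word_ids):
        summed[w] += emb
        counts[w] += 1
    return summed / counts[:, None]
```

Working at the word level shrinks the candidate-span space, which is where the memory and speed gains come from.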
PGB: Permutation and Grouping for BERT Pruning
http://doi.org/10.5626/JOK.2023.50.6.503
Recently, pre-trained Transformer-based models have been actively used for various artificial intelligence tasks, such as natural language processing and image recognition. However, these models have billions of parameters, require significant computation for inference, and face many limitations in resource-limited environments. To address this problem, we propose PGB (Permutation Grouped BERT pruning), a new group-based structured pruning method for Transformer models. PGB effectively finds a way to permute the attention heads into an optimal order according to resource constraints, and prunes unnecessary heads based on their importance to minimize the model's information loss. Through various comparison experiments, PGB shows better performance in terms of inference speed and accuracy loss than other existing structured pruning methods for the pre-trained BERT model.
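The permute-group-prune idea can be sketched on head importance scores alone. This simplification (sort heads by importance, split into contiguous groups, keep the top heads of each group) is illustrative and not the paper's exact algorithm:

```python
import numpy as np

def group_and_prune(importance, n_groups, keep_per_group):
    """Sketch of group-based head pruning: permute head indices by
    importance, partition them into groups, and keep only the top
    `keep_per_group` heads of each group. Returns surviving indices."""
    order = np.argsort(importance)[::-1]        # permutation by importance
    groups = np.array_split(order, n_groups)    # grouping under a budget
    kept = [int(h) for g in groups for h in g[:keep_per_group]]
    return sorted(kept)
```

In a real Transformer, the surviving head indices determine which slices of the query/key/value projection matrices are retained.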
Learning with Noisy Labels using Sample Selection based on Language-Image Pre-trained Model
Bonggeon Cha, Minjin Choi, Jongwuk Lee
http://doi.org/10.5626/JOK.2023.50.6.511
Deep neural networks suffer significantly degraded generalization performance when trained with noisy labels. To address this problem, previous studies observed that a model learns clean samples first in the early learning stage; based on this, sample selection methods that treat small-loss samples as clean and train on them selectively have been used to improve performance. However, when noisy labels are similar to their ground truth (e.g., seal vs. otter), sample selection is not effective because the model learns noisy data even in the early learning stage. In this paper, we propose Sample selection with a Language-Image Pre-trained model (SLIP), which effectively distinguishes and learns clean samples without relying on the early learning stage by leveraging zero-shot predictions from a pre-trained language-image model. Our proposed method shows up to 18.45%p improved performance over previously proposed methods on CIFAR-10, CIFAR-100, and WebVision.
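The selection step can be sketched as an agreement filter: keep a sample when its (possibly noisy) label matches the zero-shot prediction of the pre-trained language-image model. This is a simplified view of how such zero-shot predictions could drive selection, not SLIP's exact criterion:

```python
def select_clean(labels, zero_shot_preds):
    """Return indices of samples whose given label agrees with the
    zero-shot prediction of a pre-trained language-image model;
    disagreeing samples are treated as likely noisy."""
    return [i for i, (y, z) in enumerate(zip(labels, zero_shot_preds)) if y == z]

# Toy example: the second sample's label disagrees with the zero-shot
# prediction, so it is excluded from the clean set.
clean_idx = select_clean(["seal", "otter", "cat"], ["seal", "seal", "cat"])
```

Because the filter needs no warm-up training, it sidesteps the failure case where visually similar noisy labels are memorized early.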
An Automatic Framework for Nested Normalization and Table Migration of Large-Scale Hierarchical Data
Dasol Kim, Myeong-Seon Gil, Heesun Won, Yang-Sae Moon
http://doi.org/10.5626/JOK.2023.50.6.521
On open data portals, a great deal of data is distributed in hierarchical JSON and XML formats, and its scale is very large. Because of its structural characteristics, such hierarchical data contains multiple nestings, which leads to nested-table normalization and scale-limitation problems that limit the utilization of large-scale open data. In this paper, we adopt Airbyte, an open-source ELT platform, for table migration of hierarchical files and propose a new framework that automates table migration. This is the first study to report Airbyte's nested-JSON handling issue and to contribute to solving it. Through an extensive evaluation of the proposed framework on actual US data portals, we show that it operates normally even on structures containing multiple nestings and that it can process large-scale migrations of 1.6K or more through its automated processing logic. These results show that the proposed framework is highly practical, supporting the nested normalization of hierarchical data and providing a reliable large-scale migration function.
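Nested-table normalization can be sketched as follows: scalar fields of a nested object are flattened into dotted column names, while each nested list becomes a candidate child table. This is a generic illustration of the normalization problem, not the proposed framework's actual migration logic:

```python
def flatten(record, prefix=""):
    """Flatten one nested JSON object into a flat column -> value dict.
    Nested lists are split off as child tables keyed by their path."""
    flat, children = {}, {}
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            sub_flat, sub_children = flatten(value, path + ".")
            flat.update(sub_flat)
            children.update(sub_children)
        elif isinstance(value, list):
            children[path] = value        # normalized into a child table
        else:
            flat[path] = value
    return flat, children

flat, children = flatten({"id": 1, "addr": {"city": "Daejeon"}, "tags": [1, 2]})
```

Each child table would carry a foreign key back to the parent row in a full migration.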
Open-source-based 5G Access Network Security Vulnerability Automated Verification Framework
Jewon Jung, Jaemin Shin, Sugi Lee, Yusung Kim
http://doi.org/10.5626/JOK.2023.50.6.531
Recently, various open-source implementations based on the 5G standards have emerged and are widely used in research to find 5G control-plane security vulnerabilities. However, leveraging these open sources requires extensive knowledge of complex source code, wireless communication devices, and the massive 5G security standards. Therefore, in this paper, we propose a framework for the automatic verification of security vulnerabilities in the 5G control plane. The framework builds a 5G network using commercial Software Defined Radio (SDR) equipment and open-source software and implements a Man-in-the-Middle (MitM) attacker to deploy a control-plane attack test bed. It also implements control-plane message decoding and modification modules to execute message spoofing attacks and automatically classifies security vulnerabilities in 5G networks. In addition, a GUI-based web user interface is implemented so that users can create MitM attack scenarios and check the verification results themselves.
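The spoofing step of such a MitM test bed can be sketched abstractly: intercept a decoded control-plane message, apply the scenario's field modifications, and relay the result. The message and field names below are invented placeholders, not actual 5G NAS fields or the framework's implementation:

```python
def apply_scenario(message: dict, modifications: dict) -> dict:
    """Sketch of a MitM spoofing step: copy the intercepted (already
    decoded) message, overwrite the fields named in the attack
    scenario, and return the message to be relayed onward."""
    spoofed = dict(message)       # leave the intercepted original intact
    spoofed.update(modifications)
    return spoofed

# Hypothetical scenario: downgrade a protected header to plaintext.
intercepted = {"type": "RegistrationRequest", "sec_header": "integrity_protected"}
relayed = apply_scenario(intercepted, {"sec_header": "plain"})
```

Automatic classification then amounts to checking whether the network accepts the spoofed message when it should have rejected it.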
Journal of KIISE
- ISSN : 2383-630X(Print)
- ISSN : 2383-6296(Electronic)
- KCI Accredited Journal
Editorial Office
- Tel. +82-2-588-9240
- Fax. +82-2-521-1352
- E-mail. chwoo@kiise.or.kr