Quantitative Analysis of Sequence-based Container Security Enhancement using a System Call Sequence Extraction Framework

Somin Song, Youyang Kim, Byungchul Tak

http://doi.org/10.5626/JOK.2023.50.11.913

Container escape is one of the most critical threats to containerized applications that share a host kernel. Attackers exploit kernel vulnerabilities through a series of manipulated system calls to achieve privilege escalation, which can lead to container escape. Seccomp is a security mechanism widely used in containers. It strengthens isolation by filtering out unnecessary system call invocations. However, because Seccomp blocks individual system calls, it has a fundamental limitation: it remains vulnerable to attacks that use only system calls allowed by the policy. Therefore, this study presents a hybrid analysis framework that combines static and dynamic analyses to extract system call sequences from exploit code. Using this framework, we compared the security strength of the existing individual system call-based filtering mechanism with that of a proposed system call sequence-based filtering mechanism, measured as the number of exploits each can block given system call profiles for the same exploit code. The proposed system call sequence-based filtering mechanism increased the defense coverage from 63% to 98% compared to the existing individual system call-based filtering mechanism.
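
The contrast between the two filtering styles can be illustrated with a minimal sketch. It is not the authors' framework: the allowlist, the sequence pattern, and the example traces are hypothetical, and matching a pattern as an ordered (not necessarily contiguous) subsequence is an assumption made here for illustration.

```python
from typing import Callable, Iterable, Sequence, Set


def blocked_by_allowlist(trace: Sequence[str], allowed: Set[str]) -> bool:
    """Per-call filtering: block if any single call is outside the allowlist."""
    return any(call not in allowed for call in trace)


def contains_sequence(trace: Iterable[str], pattern: Sequence[str]) -> bool:
    """True if the calls in `pattern` occur in `trace` in order (gaps allowed)."""
    it = iter(trace)
    # `call in it` advances the iterator, so each match resumes after the last.
    return all(call in it for call in pattern)


def blocked_by_sequences(trace: Sequence[str],
                         patterns: Iterable[Sequence[str]]) -> bool:
    """Sequence filtering: block if any known-bad pattern appears in the trace."""
    return any(contains_sequence(trace, p) for p in patterns)


def coverage(traces: Sequence[Sequence[str]],
             is_blocked: Callable[[Sequence[str]], bool]) -> float:
    """Fraction of exploit traces that the given filter blocks."""
    if not traces:
        return 0.0
    return sum(1 for t in traces if is_blocked(t)) / len(traces)


# Hypothetical policy and traces, for illustration only.
ALLOWED = {"unshare", "mmap", "socket", "setsockopt", "sendmsg", "read", "write"}
PATTERNS = [("unshare", "setsockopt", "sendmsg")]
EXPLOITS = [
    ["unshare", "socket", "setsockopt", "mmap", "sendmsg"],  # allowed calls only
    ["mmap", "ptrace", "write"],                             # uses a disallowed call
]

per_call = coverage(EXPLOITS, lambda t: blocked_by_allowlist(t, ALLOWED))
combined = coverage(
    EXPLOITS,
    lambda t: blocked_by_allowlist(t, ALLOWED) or blocked_by_sequences(t, PATTERNS),
)
print(per_call, combined)
```

In this toy setup the first trace passes the per-call filter because every call is individually allowed, and only the sequence check catches it. Whether the proposed mechanism is layered on top of per-call filtering, as assumed in `combined`, is not stated in the abstract.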

Integrated Host-SSD Mapping Table Cache Management Techniques for Improving Performance of a Mobile Storage Device

Yoona Kim, Inhyuk Choi, Sungjin Lee, Jihong Kim

http://doi.org/10.5626/JOK.2023.50.11.924

As the capacity of storage devices gradually increases, so does the on-device memory required to manage the address mapping table of a NAND flash-based storage device. The on-device memory capacity of a mobile storage device, Universal Flash Storage (UFS), does not increase because of hardware and cost constraints, making it challenging to manage the larger address translation table. To resolve this problem, Host Performance Booster (HPB), which borrows host-side DRAM to load portions of the address translation table, was introduced. In this paper, we demonstrate that the HPB-enabled system does not work in an integrated manner with the device-side SRAM, thereby wasting the given memory resource. We propose integrated mapping table management techniques that consider the distinctive features of each cache layer. By adopting these techniques, we aim to minimize wasted cache resources, reduce storage latency, and prevent unnecessary degradation of the storage lifetime. In our evaluation, the cache hit ratio improved by 5%, the wasted memory resource was reduced by 95%, and the number of device-side garbage collections was reduced by 43% compared to the baseline scheme.

Improvement Study on Active Learning-based Cross-Project Defect Prediction System

Taeyeun Yang, Hakjoo Oh

http://doi.org/10.5626/JOK.2023.50.11.931

This study proposes a practical improvement method for an active learning-based system for cross-project defect prediction. A previous study applied active learning techniques to practically improve the performance of cross-project defect prediction, but it relied on a traditional machine learning model that took hand-crafted features as input for both active learning target selection and defect prediction. As a result, feature extraction was expensive and performance was limited. In addition, the problem of performance varying with the choice of input project remained. This study proposes the following methods to overcome these limitations. First, we used a deep learning model that takes source code as input to lower the model building cost and improve prediction performance. Second, we applied a Bayesian convolutional neural network to select active learning targets with a deep learning model. Third, instead of considering a single source project, we applied a method that automatically extracts a training data set from multiple projects. Applying the proposed system to 7 open source projects improved the average prediction performance by 13.58% compared to the most recent prior work.

Knowledge-based Supporting Facts Generation Model for Question and Answer

Sujin Seong, Jeongwon Cha

http://doi.org/10.5626/JOK.2023.50.11.940

In this study, we generate supporting facts from a knowledge base to add information to the question answering process and to present it in a form that is easy for humans to read. Data from two knowledge bases, DBpedia and Wikidata, related to the supporting documents in HotpotQA were collected through crawling, and the supporting facts generators were trained using the collected triples. The answer generator was trained with the generated supporting facts and questions as inputs. For both DBpedia and Wikidata, supporting facts generated from the knowledge base improved answer generation performance by providing useful additional information about the questions, and the generated sentences were understandable to humans.

ECG Arrhythmia Classification Model with VAE-based Data Augmentation and CNN

Jinhee Kwak, Jaehee Jung

http://doi.org/10.5626/JOK.2023.50.11.947

Due to its convenient accessibility and crucial importance in arrhythmia diagnosis, ECG data is often used in predicting heart disease. The MIT-BIH Arrhythmia dataset is widely utilized in research on arrhythmia, which is one of the contributing factors to heart disease. However, the dataset exhibits imbalanced arrhythmia classes due to variations in incidence rate. These imbalanced arrhythmia classes affect the performance of arrhythmia classification. To solve the imbalance problem, this paper presents four distinct classification methods that utilize augmented data. These different augmentation techniques were compared and assessed alongside the VAE method in terms of classification performance. Furthermore, the CNN and the CNN-LSTM models were compared and analyzed as classification models. In conclusion, by applying VAE augmentation to balance the training data and classifying the arrhythmia using the CNN, we achieved an accuracy of 98.9%. These results confirm the superior effectiveness of the proposed model compared to other existing arrhythmia classification models, particularly in terms of sensitivity.

Applying Multitopic Analysis of Bug Reports and CNN algorithm to Bug Severity Prediction

Eontae Kim, Geunseok Yang, Inhong Jung

http://doi.org/10.5626/JOK.2023.50.11.954

Bugs are common in software development. Depending on their severity, bugs can be classified as major errors or minor errors. The severity of a bug can be selected by the bug reporter. However, the bug reporter may apply subjective judgment, which can lead to errors in the severity judgment. To resolve this problem, in this study, we predict bug severity by applying topic-based Severe and Non-Severe extraction with convolutional neural network (CNN) learning. First, using the properties of the bug report, the prediction process is divided into Global topic, Product topic, Component topic and Priority topic, and the bug reports are extracted from each topic based on Severe and Non-Severe. The Severe and Non-Severe features are extracted from the Global topics, and severity features are extracted from the Product, Component and Priority topics in the same way. The extracted features are combined and fed into the CNN as the input layer, and the model is trained. To evaluate the efficiency of our model, a comparison between the proposed model and the baselines was conducted on the Eclipse, Mozilla, Apache and KDE open-source projects. Our model showed improved performance. The results were 97% for Eclipse, 96% for Mozilla, 95% for Apache and 99% for KDE, an average performance improvement of about 24.59% compared to the baseline, with a statistically significant difference.

Extreme Environment Rotated Object Detection Network

Giljun Lee, Junyaup Kim, Gwanghan Lee, Simon S. Woo

http://doi.org/10.5626/JOK.2023.50.11.966

With the advancement of object detection models, it is possible to efficiently run inference on synthetic aperture radar (SAR) and electro-optical (EO) satellite images. However, conventional object detection models using horizontal bounding boxes (HBB) struggle to detect small and densely grouped objects in satellite images. To address this issue, this paper proposes E^2RDet. This algorithm modifies the structure of the YOLOv7 object detection model, enabling it to accurately detect objects represented by oriented bounding boxes (OBB) in SAR images. It improves the object detection model architecture and loss function to facilitate learning of an object's dynamic (orientation) posture. Using various training datasets, E^2RDet demonstrates performance improvements across three benchmark SAR datasets. This indicates that existing HBB object detection models can train and perform object detection on objects represented by OBBs.

New Transformer Model to Generate Molecules for Drug Discovery

Yu-Bin Hong, Kyungjun Lee, DongNyenog Heo, Heeyoul Choi

http://doi.org/10.5626/JOK.2023.50.11.976

Among various generative models, recurrent neural network (RNN)-based models have achieved state-of-the-art performance in the drug generation task. To overcome the long-term dependency problem that RNNs suffer from, Transformer-based models were proposed for the task. However, the Transformer models performed worse than the RNN models in the drug generation task, and we believe this was because the Transformer models were over-parameterized and therefore over-fitted. To avoid this problem, in this paper, we propose a new Transformer model that replaces the large decoder with simple feed-forward layers. Experiments confirmed that our proposed model outperformed the previous state-of-the-art baseline in major evaluation metrics while maintaining a similar level of performance in other minor metrics. Furthermore, when we applied our model to generate candidate molecules against the SARS-CoV-2 (COVID-19) virus, the generated molecules were more effective than commercially available drugs such as Paxlovid, Molnupiravir, and Remdesivir.

Online Opinion Fraud Detection Using Graph Neural Network

Woochang Hyun, Insoo Lee, Bongwon Suh

http://doi.org/10.5626/JOK.2023.50.11.985

This study proposes a graph neural network model to detect opinion frauds that undermine the reliability of information and hinder users' decision-making on online platforms. The proposed method uses methods on a graph of relationships between online reviews to produce relational representations, which are then combined with the characteristics of the center nodes to predict fraud. Experimental results on a real-world dataset demonstrate that this approach is more accurate and faster than existing state-of-the-art methods, while also providing interpretability for key relations. With the help of this study, practitioners will be able to utilize the analytical results in decision-making and overcome the general drawback of neural network-based models' lack of explainability.

A Study on Compliance of Data and Control Coupling of Weapon System Software Airworthiness Certification

Sunyoung Shin

http://doi.org/10.5626/JOK.2023.50.11.995

In 2009, to secure the flight safety of military aircraft and enhance the competitiveness of its aircraft exports by applying internationally recognized airworthiness certification standards, South Korea established the "Act on Certification of Flight Safety for Military Aircraft" along with its enforcement decree and regulations. According to these regulations, domestically developed military aircraft are required to be certified following the airworthiness certification laws, procedures, and standards. The standard airworthiness certification criteria for military aircraft, which serve as the basis for airworthiness assessment, were developed by the Defense Acquisition Program Administration (DAPA) and have been revised up to the 7th edition. Among the recent changes, the most impactful area is Chapter 15, which pertains to computer resources in software. As the proportion of software development within weapon systems continues to increase, the related standards are becoming more detailed and refined to keep pace with this change. This study aimed to clarify and propose verification methods for the newly incorporated software coupling criteria in the revised airworthiness certification standards.

Addressing Write-Warm Pages in OLTP Workloads

Kyong-Shik Lee, Mijin An, Sang-Won Lee

http://doi.org/10.5626/JOK.2023.50.11.1002

One of the most important purposes of buffer management policies is to cache frequently accessed data in the buffer pool to minimize disk I/O. However, even if frequently referenced pages are effectively stored, a small number of pages can still result in excessive disk I/O. This is because of write-warm pages, which are repeatedly fetched into and evicted from the buffer pool. In this paper, we introduce the “(Write-)Warm Page Thrashing” problem and confirm the existence of write-warm pages. Specifically, we found that 10% of flushed pages accounted for 41% of writes. This could degrade performance, particularly for flash memory devices with slow write speeds. Therefore, a new buffer management policy is required to detect and prevent such thrashing.
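
A statistic of the form "10% of flushed pages accounted for 41% of writes" can be computed from a flush log roughly as follows. This is a minimal sketch, not the authors' measurement code: the log format (one page ID per flush) and the function name `top_write_share` are assumptions made here for illustration.

```python
from collections import Counter
from typing import Hashable, Iterable


def top_write_share(flush_log: Iterable[Hashable], fraction: float = 0.10) -> float:
    """Share of all flushes caused by the most-flushed `fraction` of pages.

    `flush_log` is assumed to contain one page ID per flush (write) event.
    """
    counts = Counter(flush_log)
    if not counts:
        return 0.0
    ranked = sorted(counts.values(), reverse=True)
    # Always consider at least one page so tiny logs still give a value.
    k = max(1, int(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical log: page 0 is flushed 11 times, pages 1..9 once each.
log = [0] * 11 + list(range(1, 10))
share = top_write_share(log)
print(f"top 10% of pages account for {share:.0%} of writes")
```

A high share for a small fraction of pages would point to the kind of write concentration the abstract describes, although identifying write-warm pages specifically would also require looking at how often those pages are re-fetched after eviction.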

Privacy-Preserving Data Publishing: Research on Trends in De-identification Techniques for Structured and Unstructured Data

Yongki Hong, Gihyuk Ko, Heedong Yang, Seung Hwan Ryu

http://doi.org/10.5626/JOK.2023.50.11.1008

The advent of AI has increased the demand for data for AI development, leading to a proliferation of data sharing and distribution. However, data utilization carries the risk of personal information disclosure, so data must undergo de-identification before it is distributed. Privacy-Preserving Data Publishing (PPDP) is a series of procedures aimed at adhering to specified privacy guidelines while maximizing the utility of data. It has been continuously researched and developed. Since the early 2000s, techniques for de-identifying structured data (e.g., tables or relational data) have been studied. As a significant portion of the collected data is now unstructured and its proportion is increasing, research on de-identification techniques for unstructured data is also actively being conducted. In this paper, we introduce the existing de-identification techniques for structured data and discuss recent trends in de-identification techniques for unstructured data.

Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr