Neural Networks using Opcode Frequency to Identify Combinations of Obfuscation Techniques

Youjeong Noh, Jeongwoo Kim, Eun-Sun Cho

http://doi.org/10.5626/JOK.2024.51.4.293

The outcome of deobfuscation, which aims to reveal the structure of malware, depends heavily on the analyst's capabilities because it requires multiple heuristics. Researchers have proposed various automated analysis methods to detect which obfuscation techniques have been applied to a program. However, existing work has reasoned about obfuscation through classification methods that consider neither the sequential code transformations caused by obfuscation nor the fact that multiple categories of obfuscation can be applied to a program independently. This paper therefore proposes a multi-label classification model for obfuscation type detection and a model for inferring the last obfuscation type when multiple obfuscations have been applied. We implemented a deep learning-based obfuscation type detection model using the opcode frequency of instructions for O-LLVM (Obfuscator-LLVM)[8] obfuscation, and the proposed model achieved high obfuscation detection performance.
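As a rough illustration of the input and output representation described in this abstract, the sketch below turns an opcode sequence into a relative-frequency vector and encodes the applied obfuscations as independent bits (a multi-label target). The opcode vocabulary, the label names, and the helper functions are assumptions made for this example, not the authors' implementation; the neural network itself is omitted.

```python
from collections import Counter

# Hypothetical fixed opcode vocabulary; a real model would derive it from its corpus.
VOCAB = ["mov", "add", "sub", "xor", "cmp", "jmp", "call", "ret"]

# Assumed label set: one independent bit per obfuscation pass.
LABELS = ["substitution", "bogus_control_flow", "flattening"]


def opcode_frequency(opcodes):
    """Return the relative frequency of each vocabulary opcode in a sequence."""
    counts = Counter(opcodes)
    total = sum(counts[op] for op in VOCAB)
    if total == 0:
        return [0.0] * len(VOCAB)
    return [counts[op] / total for op in VOCAB]


def encode_labels(applied):
    """Multi-label target: several bits may be set at once."""
    return [1 if name in applied else 0 for name in LABELS]


features = opcode_frequency(["mov", "xor", "xor", "jmp", "cmp", "jmp", "mov", "ret"])
target = encode_labels({"substitution", "flattening"})
```

Unlike a single-class label, the target vector can mark several obfuscations at once, which is the property the multi-label formulation relies on.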

Number-based High Fidelity Logical Qubit on the Surface Code FTQC

Seungju An, Byung-Soo Choi

http://doi.org/10.5626/JOK.2024.51.4.301

To achieve quantum advantage, the scale of quantum computation must be increased. However, quantum computational power cannot easily be increased because of the high error rates of qubits and gates. To overcome this problem, the surface code based on the fault-tolerant quantum computation (FTQC) model has been investigated extensively, since in theory it works with relatively high error rates. In practice, however, the surface code still needs many improvements, for example to its requirement of a large number of physical qubits. Therefore, in this work, we propose a logical qubit design method that exploits multiple lower-level qubits, unlike the conventional method of designing a larger qubit. This method uses the concept of the block-code scheme. The analysis shows that the proposed method achieves a lower error rate than larger logical qubits built from the same number of physical qubits. In conclusion, we believe this approach can improve the resource efficiency of surface code FTQC.

KMSS: Korean Media Script Dataset for Dialogue Summarization

Bong-Su Kim, Ji-Yoon Kim, Seung-ho Choi, Hyun-Kyu Jeon, Hye-Jin Jun, Hye-In Jung, Jung-Hoon Jang

http://doi.org/10.5626/JOK.2024.51.4.311

Dialogue summarization involves extracting or generating key content from multi-turn documents consisting of utterances by multiple speakers. Dialogue summarization models are useful for analyzing content and service records for recommendations in conversation systems. However, the Korean dialogue summarization datasets needed to build such models do not exist. This paper proposes a dataset for generation-based dialogue summarization. Source data were collected from the large-scale content of domestic broadcasters, and annotators labeled them manually. The dataset comprises approximately 100,000 entries across 6 categories, with summary sentences annotated as single sentences, three sentences, or two-and-a-half sentences. Additionally, this paper introduces a dialogue summary labeling guide to internalize and control data characteristics. It also presents a method for selecting a decoding model structure to verify model suitability. Through experiments, we highlight some characteristics of the constructed data and present benchmark performance for future research.

SCA: Improving Document Grounded Response Generation based on Supervised Cross-Attention

Hyeongjun Choi, Seung-Hoon Na, Beomseok Hong, Youngsub Han, Byoung-Ki Jeon

http://doi.org/10.5626/JOK.2024.51.4.326

Document-grounded response generation is the task of generating conversational responses by “grounding” them in factual evidence from a task-specific domain, such as consumer consultation or insurance planning, where the evidence is obtained from documents retrieved in response to a user’s question under the current dialogue context. In this study, we propose supervised cross-attention (SCA) to enhance the ability of the response generation model to find and incorporate “response-salient snippets” (i.e., spans or contents), which are the parts of the retrieved document that should be included and maintained in the generated answer. SCA uses an additional supervised loss that focuses cross-attention weights on the response-salient snippets; this attention supervision likely enables the decoder to generate a response in a “saliency-grounding” manner by strongly attending to the important parts of the retrieved document. Experimental results on MultiDoc2Dial show that using SCA together with additional performance improvement methods increases F1 by 1.13 over the existing SOTA, and that SCA itself contributes an increase of 0.25 in F1.
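The abstract does not give the exact form of the supervised loss, so the following is only one plausible sketch of attention supervision: it penalizes the negative log of the cross-attention mass that falls on tokens marked as response-salient, and adds that term to the generation loss. The function names, the binary `salient_mask`, and the `weight` hyperparameter are assumptions for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def attention_supervision_loss(scores, salient_mask):
    """Negative log of the attention mass placed on salient document tokens."""
    attn = softmax(scores)
    mass = sum(a for a, is_salient in zip(attn, salient_mask) if is_salient)
    return -math.log(mass + 1e-12)  # small epsilon guards against log(0)


def total_loss(generation_loss, scores, salient_mask, weight=0.5):
    """Generation loss plus a weighted attention-supervision term (weight is assumed)."""
    return generation_loss + weight * attention_supervision_loss(scores, salient_mask)


uniform = attention_supervision_loss([0.0, 0.0, 0.0, 0.0], [1, 1, 0, 0])
focused = attention_supervision_loss([4.0, 4.0, 0.0, 0.0], [1, 1, 0, 0])
```

In this toy case the loss is smaller when the scores concentrate on the salient positions, which is the behavior an attention-supervision term is meant to reward.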

A Deep Learning Model for Fire Anomaly Detection in Underground Utility Tunnel based on ConvLSTM Variational AutoEncoder

Joseph Ahn, Hyo-gun Yoon

http://doi.org/10.5626/JOK.2024.51.4.333

Because a failure of fire detection not only escalates disaster management costs but also inflicts significant damage and disruption on citizens' lives and industries, accurate detection of fire anomalies is of paramount importance. There have been several studies on monitoring and managing catastrophic events using AI, IoT, and digital twin technologies. However, the telecommunications environment and the level of sensor maintenance make it difficult for IoT sensors to collect data without loss or noise. This paper proposes a hybrid deep learning model called ConvLSTM-VAE that detects anomalies by considering spatial and temporal information simultaneously and that produces robust results even in the presence of noise or data loss. A virtual environment modeled after the underground utility tunnel located in Ochang, Chungcheongbuk-do was constructed to collect fire data using the Fire Dynamics Simulator (FDS) software. In the experiment, we compared the proposed model with other time-series anomaly detection models and evaluated its predictive performance. The results show that the precision, recall, accuracy, and F1-score of ConvLSTM-VAE are 0.881579, 0.99505, 0.930693, and 0.934884, respectively, far superior to those of the other models.
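The reported F1-score can be checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below performs that arithmetic using only the values quoted in the abstract; the function name is chosen for this example.

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


precision, recall = 0.881579, 0.99505  # values quoted in the abstract
f1 = f1_score(precision, recall)
print(round(f1, 6))
```

The computed value agrees with the reported F1-score of 0.934884 to six decimal places.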

An Image Harmonization Method with Improved Visual Uniformity of Composite Images in Various Lighting Colors

Doyeon Kim, Jonghwa Shim, Hyeonwoo Kim, Changsu Kim, Eenjun Hwang

http://doi.org/10.5626/JOK.2024.51.4.345

Image composition is a technique that creates a composite image by arranging foreground objects extracted from other images onto a background image. To improve the visual uniformity of composite images, many deep learning-based image harmonization techniques that adjust the lighting and color of foreground objects to match the background image have recently been proposed. However, existing techniques achieve limited visual uniformity because they adjust colors only for the lighting color distribution of the dataset used for training. To address this problem, we propose a novel image harmonization scheme that performs robustly across various lighting colors. First, iHColor, a new dataset composed of various lighting color distributions, is built through data preprocessing. Then, a pre-trained GAN-based harmonization model is fine-tuned using the iHColor dataset. Through experiments, we demonstrate that the proposed scheme can generate harmonized images with better visual uniformity than existing models for various lighting colors.

SASRec vs. BERT4Rec: Performance Analysis of Transformer-based Sequential Recommendation Models

Hye-young Kim, Mincheol Yoon, Jongwuk Lee

http://doi.org/10.5626/JOK.2024.51.4.352

Sequential recommender systems extract interests from user logs and use them to recommend items the user might like next. SASRec and BERT4Rec are widely used as representative sequential recommendation models. Many existing studies have used these two models as baselines, but their reported performance is inconsistent because of differences in experimental environments. This research compares and analyzes the performance of SASRec and BERT4Rec on six representative sequential recommendation datasets. The experimental results show that the number of user-item interactions has the largest impact on BERT4Rec training, which in turn leads to the performance difference between the two models. Furthermore, this research finds that the two learning methods, which are widely used in sequential recommendation settings, can also have different effects depending on popularity bias and sequence length. This shows that considering dataset characteristics is essential for improving recommendation performance.
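For readers unfamiliar with the two models, the sketch below contrasts how training examples are typically formed: SASRec is trained autoregressively to predict the next item at each position, while BERT4Rec is trained with a cloze objective that masks random items and predicts them from both sides. The `MASK` id, the function names, and the masking probability are assumptions for this example; it is not the evaluation code of the paper.

```python
import random

MASK = 0  # hypothetical mask token id; real item ids are assumed to start at 1


def next_item_examples(seq):
    """SASRec-style: the input is the sequence shifted by one, the target is the next item."""
    return seq[:-1], seq[1:]


def cloze_examples(seq, mask_prob, rng):
    """BERT4Rec-style: mask random positions and predict only the masked items."""
    inputs, targets = [], []
    for item in seq:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            targets.append(item)   # loss is computed at this position
        else:
            inputs.append(item)
            targets.append(None)   # no loss at unmasked positions
    return inputs, targets


history = [11, 42, 7, 93]
ar_inputs, ar_targets = next_item_examples(history)
cloze_inputs, cloze_targets = cloze_examples(history, 0.3, random.Random(0))
```

Because the cloze objective yields a training signal only at masked positions, the amount of interaction data can matter more for it, which is consistent with the finding summarized above.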

An Automated Error Detection Method for Speech Transcription Corpora Based on Speech Recognition and Language Models

Jeongpil Lee, Jeehyun Lee, Yerin Choi, Jaehoo Jang, Myoung-Wan Koo

http://doi.org/10.5626/JOK.2024.51.4.362

This research proposes a "machine-in-the-loop" approach for automatic error detection in Korean speech corpora by integrating the knowledge of CTC-based speech recognition models and language models. We experimentally validated its error detection performance through a three-step procedure that leveraged Character Error Rate (CER) from the speech recognition model and Perplexity (PPL) from the language model to identify potential transcription error candidates and verify their text labels. This research focused on the Korean speech corpus, KsponSpeech, resulting in a reduction of the character error rate on the test set from 9.44% to 8.9%. Notably, this performance enhancement was achieved even when inspecting only approximately 11% of the test data, highlighting the higher efficiency of our proposed method than a comprehensive manual inspection process. Our study affirms the potential of this efficient "machine-in-the-loop" approach for a cost-effective error detection mechanism in speech data while ensuring accuracy.
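Character Error Rate is conventionally the character-level edit distance between a reference transcript and a recognizer hypothesis, divided by the reference length. The sketch below computes it with a standard dynamic-programming edit distance and flags an utterance as an error candidate when the CER exceeds a threshold; the threshold value and the function names are assumptions for illustration, not the procedure or settings used in the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character Error Rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def is_error_candidate(label, asr_output, threshold=0.2):
    """Flag a transcript label for review when it disagrees strongly with the ASR output."""
    return cer(label, asr_output) > threshold
```

A label that the recognizer reproduces almost exactly is left alone, while a label with a high CER is routed to verification, so only a fraction of the corpus needs inspection.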

Improving Portfolio Optimization Performance based on Reinforcement Learning through Episode Randomization and Action Noise

Saehyeong Woo, Doguk Kim

http://doi.org/10.5626/JOK.2024.51.4.370

Portfolio optimization is essential to reduce investment management risk and maximize returns. With the rapid development of artificial intelligence technology in recent years, research on applying it is being conducted in various fields, and the application of reinforcement learning in the financial sector in particular is being actively investigated. However, most studies do not address the problem of agent overfitting caused by iterative training on historical financial data. In this study, we propose a technique to mitigate overfitting through episode randomization and action noise in reinforcement learning-based portfolio optimization. The proposed technique randomizes the duration of the training data in each episode so that the agent experiences different market conditions, which provides a data augmentation effect, and it applies action noise to promote exploration so that the agent can respond to specific situations. Experimental results show that the proposed technique improves the performance of the existing reinforcement learning agent, and comparative experiments confirm that both techniques contribute to performance improvement under various conditions.
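The two ideas in this abstract can be sketched independently of any particular agent. In the minimal example below, `sample_episode` draws a random contiguous window of the training period for each episode, and `noisy_weights` perturbs a portfolio weight vector with Gaussian noise before renormalizing it. The function names, the Gaussian noise model, and the long-only constraint (weights clipped at zero and summing to one) are assumptions for illustration rather than the authors' design.

```python
import random


def sample_episode(num_steps, min_len, rng):
    """Pick a random contiguous window [start, end) of the training period."""
    length = rng.randint(min_len, num_steps)
    start = rng.randint(0, num_steps - length)
    return start, start + length


def noisy_weights(weights, sigma, rng):
    """Add Gaussian noise to portfolio weights, clip at zero, renormalize to sum to 1."""
    perturbed = [max(w + rng.gauss(0.0, sigma), 0.0) for w in weights]
    total = sum(perturbed)
    if total == 0.0:
        # Degenerate case: fall back to an equal-weight portfolio.
        return [1.0 / len(weights)] * len(weights)
    return [p / total for p in perturbed]


rng = random.Random(42)
start, end = sample_episode(num_steps=1000, min_len=100, rng=rng)
action = noisy_weights([0.5, 0.3, 0.2], sigma=0.05, rng=rng)
```

Each episode then trains on prices in `[start, end)` only, so repeated passes over the same history do not always present the same trajectory to the agent.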

A Differential-Privacy Technique for Publishing Density-based Clustering Results

Namil Kim, Incheol Baek, Hyubjin Lee, Minsoo Kim, Yon Dohn Chung

http://doi.org/10.5626/JOK.2024.51.4.380

Clustering techniques group data with similar characteristics. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is widely used in various fields because it can detect outliers and is not affected by the data distribution. However, the conventional DBSCAN method is vulnerable in that privacy-sensitive personal information in the original data can easily be exposed through the clustering results. Disclosing and distributing such results without appropriate privacy protection therefore poses risks. This paper proposes a method to generate DBSCAN results that satisfy differential privacy. Additionally, a post-processing technique is introduced to reduce the noise added while applying differential privacy and to prepare the data for future analysis. Through experiments, we observed that the proposed method enhances the utility of the data while satisfying differential privacy.
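The abstract does not describe the mechanism in detail, so the sketch below shows only the generic Laplace mechanism on per-cell point counts, a common building block for differentially private density estimates, followed by a simple post-processing step. The grid-cell representation, the function names, and the clamp-and-round post-processing are assumptions for this example and should not be read as the proposed method.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_cell_counts(counts, epsilon, rng):
    """Laplace mechanism: adding or removing one record changes one count by 1."""
    scale = 1.0 / epsilon  # L1 sensitivity 1 divided by the privacy budget
    return [c + laplace_noise(scale, rng) for c in counts]


def postprocess(noisy_counts):
    """Clamp negatives and round; post-processing does not weaken the guarantee."""
    return [max(0, round(v)) for v in noisy_counts]


rng = random.Random(7)
true_counts = [12, 0, 3, 25]  # hypothetical points per grid cell
released = postprocess(private_cell_counts(true_counts, epsilon=1.0, rng=rng))
```

A smaller `epsilon` gives stronger privacy but noisier counts, which is why a post-processing step that restores utility is a natural companion to the noise-adding step.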


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr