Search : [ author: 이현 ] (34)

Generating Counterfactual Examples through Generating Adversarial Examples

Hyungyu Lee, Dahuin Jung

http://doi.org/10.5626/JOK.2022.49.12.1132

The advance of artificial intelligence (AI) has brought numerous conveniences. However, the complex structure of AI models makes it challenging to understand their inner workings. Counterfactual explanation is a method that explains an AI model using counterfactual examples, in which minimal but perceptible perturbations are applied to change the classification result. Adversarial examples are data modified to cause AI models to misclassify them. Unlike counterfactual examples, the perturbations applied to adversarial examples are difficult for humans to perceive. For a simple model, generating adversarial examples is similar to generating counterfactual examples. In deep learning, however, the two differ, because the cognitive gap between humans and deep learning models is often large. Nevertheless, we confirmed that adversarial examples generated by certain deep learning models were similar to counterfactual examples. In this paper, we analyzed the structures and conditions of deep learning models whose adversarial examples resemble counterfactual examples. We also proposed a new metric, partial concentrated change (PCC), and compared adversarial examples generated from different models using existing metrics and the proposed PCC.
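The abstract does not specify which attack is used; as an illustrative sketch only, a one-step gradient-sign perturbation (in the style of FGSM) on a toy linear classifier shows how a small input change flips the predicted class. All names and numbers below are hypothetical, not from the paper.

```python
def predict(x, w, b):
    # Linear classifier: class 1 if w.x + b > 0, else class 0.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z > 0 else 0

def sign(v):
    return 1.0 if v > 0 else (-1.0 if v < 0 else 0.0)

def fgsm(x, w, y_true, eps):
    # For a linear model, the input gradient of the loss is proportional
    # to w, so the one-step attack moves eps along sign(w), away from
    # the true class.
    direction = -1.0 if y_true == 1 else 1.0
    return [xi + direction * eps * sign(wi) for xi, wi in zip(x, w)]

w, b = [1.0, 1.0], -1.0
x = [0.8, 0.8]                         # correctly classified as class 1
x_adv = fgsm(x, w, y_true=1, eps=0.4)  # small shift flips the label
```

In a deep network the same recipe needs backpropagation to obtain the input gradient; for the linear case above the gradient direction is fixed, which is why adversarial and counterfactual generation coincide for simple models.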

An Automatic Parameter Optimizing Scheme for RocksDB

Jiwon Kim, Hyeonmyeong Lee, Sungmin Jung, Heeseung Jo

http://doi.org/10.5626/JOK.2021.48.11.1167

For users with a limited understanding of an application, optimizing a complex application is very difficult. Previous studies that tune one or two parameters can improve an application's performance, but single-parameter optimization makes it hard to account for the relationships among parameters. In this paper, we proposed two techniques, LDH-Force and PF-LDH, that optimize several parameters at the same time. The LDH-Force technique efficiently reduces the number of searches by adding an LDH process while finding the optimal combination of several parameters. The PF-LDH technique further reduces the search cost by adding a filtering process, exploiting the observation that parameters differ in how strongly they affect performance. Evaluation results confirmed that the proposed scheme improved performance by up to 42.55 times and found the optimal parameter combination at the lowest search cost, without user intervention, under various workloads.
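The exact LDH-Force and PF-LDH algorithms are not given in the abstract; the following is a hypothetical sketch of the filter-then-search idea behind PF-LDH: rank parameters by single-parameter impact, discard the weak ones, and search combinations of the rest. All function and parameter names are illustrative.

```python
from itertools import product

def filtered_search(params, candidates, score, keep=2):
    """Filter-then-search sketch (inspired by, not identical to, PF-LDH).

    1) Rank each parameter by how much varying it alone moves the score.
    2) Keep only the `keep` most influential ones; fix the rest at defaults.
    3) Exhaustively search combinations of the surviving parameters.
    """
    default = {p: candidates[p][0] for p in params}
    impact = {}
    for p in params:
        scores = [score({**default, p: v}) for v in candidates[p]]
        impact[p] = max(scores) - min(scores)
    top = sorted(params, key=lambda p: impact[p], reverse=True)[:keep]
    best, best_cfg = float("-inf"), None
    for combo in product(*(candidates[p] for p in top)):
        cfg = {**default, **dict(zip(top, combo))}
        s = score(cfg)
        if s > best:
            best, best_cfg = s, cfg
    return best_cfg

# Toy benchmark: parameter "c" barely matters, so it gets filtered out.
cands = {"a": [0, 1], "b": [0, 1], "c": [0, 1]}
cfg = filtered_search(["a", "b", "c"], cands,
                      lambda c: 2 * c["a"] + c["b"] + 0.01 * c["c"])
```

Filtering shrinks the combination space from |a|x|b|x|c| to |a|x|b|, which is the kind of search-cost reduction the paper reports, though the real system measures RocksDB throughput rather than a toy score.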

EFA-DTI: Prediction of Drug-Target Interactions Using Edge Feature Attention

Erkhembayar Jadamba, Sooheon Kim, Hyeonsu Lee, Hwajong Kim

http://doi.org/10.5626/JOK.2021.48.7.825

Drug discovery is a demanding field of research requiring the coordination of disciplines ranging from medicinal chemistry and systems biology to structural biology and, increasingly, artificial intelligence. In particular, drug-target interaction (DTI) prediction is central to screening and optimizing candidate substances to treat disease from a nearly infinite set of compounds. Recently, as computer performance has improved dramatically, studies using artificial neural networks have been actively conducted to reduce the cost and increase the efficiency of DTI prediction. This paper proposes a model that predicts the interaction value between a given molecule and protein, using a molecule representation learned via Edge Feature Attention-applied Graph Net Embedding with Fixed Fingerprints and a protein representation built from pre-trained protein embeddings. The paper describes the architecture, experimental methods, and findings. The model demonstrated higher performance than DeepDTA and GraphDTA, which had previously shown the best performance in DTI studies.
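The model's core idea of letting attention scores depend on edge (bond) features as well as node (atom) features can be sketched in miniature. This is a scalar toy, not the paper's architecture: a real model would use learned weight matrices and vector features, and every name here is hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def edge_feature_attention(h, neighbors):
    """Aggregate messages for one node, scoring each neighbor by a
    function of both the neighbor feature and the edge feature.

    h: center-node feature (a float); neighbors: list of (h_j, e_ij)
    pairs. Scalars keep the sketch self-contained; the real model uses
    learned projections of vector-valued atom and bond features.
    """
    scores = softmax([h * hj + eij for hj, eij in neighbors])
    return sum(a * hj for a, (hj, _) in zip(scores, neighbors))

att = edge_feature_attention(1.0, [(1.0, 0.0), (0.0, 0.0)])
```

The point of the edge term `eij` is that two chemically different bonds to similar atoms can receive different attention weights, which plain node-only attention cannot express.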

Performance Improvement of Neural Network-based Detection of ROP Attacks using Abstraction of Instruction Features

Hyungyu Lee, Changwoo Pyo

http://doi.org/10.5626/JOK.2021.48.5.493

Return-oriented programming (ROP) is an attack technique that executes code snippets already in memory in an attacker-chosen order by chaining return instructions. This paper proposes a method of detecting ROP attacks using neural networks. The method reduces the input size by abstracting the instruction features relevant to ROP attacks rather than feeding entire instruction encodings, and it activates the neural network only for the 12 instructions following a return instruction. Our experiments on a web server, a browser, and the necessary libraries show speedups of 9.6 over DeepCheck and 1,403.1 over HeNet, with an F1 score of 100.
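The paper's exact feature set is not given in the abstract; as a hedged sketch, the preprocessing step of "abstract each instruction to a small feature class, and only extract the 12-instruction window after each return" might look like the following. The class mapping below is invented for illustration.

```python
WINDOW = 12

# Hypothetical abstraction: map each mnemonic to a small integer class
# instead of its full multi-byte encoding (the paper's real feature set
# is not specified in the abstract).
CLASSES = {"ret": 0, "call": 1, "jmp": 2, "mov": 3, "pop": 4}
OTHER = 5

def windows_after_returns(trace):
    """Yield the abstracted 12-instruction windows that follow each
    return; these windows are the only spans the detector scores,
    which is what makes the method cheap at run time."""
    out = []
    for i, ins in enumerate(trace):
        if ins == "ret":
            window = trace[i + 1 : i + 1 + WINDOW]
            out.append([CLASSES.get(x, OTHER) for x in window])
    return out

ws = windows_after_returns(["mov", "ret", "pop", "jmp", "ret", "mov"])
```

Scoring only post-return windows, instead of every instruction, is what yields the large speedups over always-on detectors like DeepCheck and HeNet.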

An Evaluation Method for Generalization Errors of CNN using Training Data

Hyeon Ho Lee, Heung Seok Chae

http://doi.org/10.5626/JOK.2021.48.3.284

Even for high-performance CNNs, the generalization error, that is, the error expected on test datasets drawn from the real world, is often high. This error must be reduced so that the model maintains its learned performance in the real world. This paper defines a response set as the set of neurons frequently activated for each class, learned from a training dataset with high data diversity. It also considers the difference in generalization error caused by the data diversity of the test dataset, which is defined as the relative generalization error. Using the relationship between a CNN's class response sets and the relative generalization error, we propose a method for evaluating CNN generalization error using only the training dataset. A case study confirms that the response set ratio is related to the relative generalization error and demonstrates the effectiveness of the proposed evaluation method.
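Computing a response set as defined above amounts to counting, per neuron, how often it fires on a class's training samples. This sketch assumes binarized activations and an invented frequency threshold; both are illustrative choices, not the paper's exact procedure.

```python
def response_set(activations, threshold=0.5):
    """Return the response set for one class.

    activations: list of per-sample neuron activation vectors (0/1),
    all for samples of the same class. A neuron belongs to the class's
    response set if it is active in at least `threshold` of the samples.
    """
    n = len(activations)
    counts = [sum(col) for col in zip(*activations)]  # per-neuron counts
    return {i for i, c in enumerate(counts) if c / n >= threshold}

# Neuron 0 fires on all three samples; neurons 1 and 2 fire on one each.
rs = response_set([[1, 0, 1], [1, 1, 0], [1, 0, 0]])
```

Comparing response-set overlap ratios across classes is then a training-data-only signal, which is what lets the method estimate generalization behavior without touching a test set.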

Adjusting OS Scheduler Parameters to Improve Server Application Performance

Taehyun Han, Hyeonmyeong Lee, Heeseung Jo

http://doi.org/10.5626/JOK.2020.47.7.643

Modern Linux runs everywhere from large servers to small IoT devices, and most machines run their services on the default scheduler Linux provides. Although the scheduler can be optimized for a specific purpose, average users cannot be expected to optimize it for every modern Linux application. In this paper, we propose SCHEDTUNE, which automatically optimizes the scheduler configuration to maximize Linux server performance. SCHEDTUNE improves performance without modifying the application or the kernel source running on the server, making it easy for administrators to configure schedulers tailored to their servers. Experimental results showed that applying SCHEDTUNE improved performance by up to 19%, and in most cases some performance improvement was achieved.
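SCHEDTUNE's actual search strategy is not described in the abstract; as a hypothetical sketch, an outer tuning loop could hill-climb over scheduler knobs, keeping a change only when a benchmark run improves. The knob names, ranges, and benchmark here are stand-ins; a real tuner would write values such as kernel scheduler tunables and rerun the server workload.

```python
import random

def tune(knobs, benchmark, steps=20):
    """Hill-climbing sketch of an automatic tuner's outer loop.

    knobs: {name: (lo, hi, step)} search ranges.
    benchmark(cfg): runs the workload under cfg and returns a score
    (higher is better); here it is just a callable for illustration.
    """
    random.seed(0)  # deterministic for the example
    cfg = {k: lo for k, (lo, hi, step) in knobs.items()}
    best = benchmark(cfg)
    for _ in range(steps):
        k = random.choice(list(knobs))
        lo, hi, step = knobs[k]
        trial = dict(cfg)
        trial[k] = min(hi, max(lo, trial[k] + random.choice((-step, step))))
        score = benchmark(trial)
        if score > best:               # keep only improving changes
            cfg, best = trial, score
    return cfg, best

# Toy benchmark whose optimum is latency == 5.
knobs = {"latency": (0, 10, 1)}
cfg, best = tune(knobs, lambda c: -(c["latency"] - 5) ** 2)
```

The appeal of this shape is that it needs neither application nor kernel changes: it only writes configuration values and observes throughput, matching the abstract's no-modification claim.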

KorQuAD 2.0: Korean QA Dataset for Web Document Machine Comprehension

Youngmin Kim, Seungyoung Lim, Hyunjeong Lee, Soyoon Park, Myungji Kim

http://doi.org/10.5626/JOK.2020.47.6.577

KorQuAD 2.0 is a Korean question answering dataset consisting of a total of 100,000+ pairs. It differs from KorQuAD 1.0, the standard Korean Q&A dataset, in three major ways. First, each given document is a whole Wikipedia page, not just one or two paragraphs. Second, because documents also contain tables and lists, systems must understand documents structured with HTML tags. Finally, the answer can be long text: not only words or phrases, but also paragraphs, tables, and lists. As a baseline model we used BERT Multilingual, released by Google as open source. It achieves a 46.0% F1 score, very low compared to the human F1 score of 85.7%, indicating that this dataset poses a challenging task. We additionally improved performance through no-answer data augmentation. By releasing this dataset, we intend to extend MRC, previously limited to plain text, to real-world tasks of various lengths and formats.

Branchpoint Prediction Using Self-Attention Based Deep Neural Networks

Hyeonseok Lee, Sungchan Kim

http://doi.org/10.5626/JOK.2020.47.4.343

Splicing is the ribonucleic acid (RNA) process that creates messenger RNA (mRNA), which is translated into proteins. Branchpoints are sequence elements of RNA essential to splicing. This paper proposes a novel method for branchpoint prediction. Identifying branchpoints involves several challenges: branchpoint sites are known to depend on several sequence patterns, called motifs, and the branchpoint distribution is highly biased, imposing a class-imbalance problem. Existing approaches are limited in that they either rely on handcrafted sequence features or ignore the class imbalance. To address these difficulties, the proposed method incorporates 1) attention mechanisms to learn long-term positional dependencies in sequences, and 2) regularization with a triplet loss to alleviate the class imbalance. Our method is comparable to the state-of-the-art performance while providing rich interpretability of its decisions.
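The triplet loss used for regularization has a standard form: pull an anchor's embedding toward a same-class example and push it at least a margin away from a different-class example, which helps a minority class form its own cluster. This is the generic formulation, not necessarily the paper's exact variant; squared Euclidean distance and the margin value are illustrative choices.

```python
def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet loss on embedding vectors (plain Python lists).

    Zero when the negative is already at least `margin` farther from
    the anchor (in squared Euclidean distance) than the positive is.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return max(0.0, sq_dist(anchor, positive)
                    - sq_dist(anchor, negative) + margin)
```

During training this term is added to the classification loss; gradients flow only for triplets that violate the margin, so well-separated classes stop contributing.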

Passage Re-ranking Method Based on Sentence Similarity Through Multitask Learning

Youngjin Jang, Hyeon-gu Lee, Jihyun Wang, Chunghee Lee, Harksoo Kim

http://doi.org/10.5626/JOK.2020.47.4.416

A machine reading comprehension (MRC) system is a question answering system in which a computer understands a given passage and answers questions about it. Recently, with the development of deep neural networks, research on MRC systems has been actively conducted, including open-domain MRC systems that identify the correct answer from the results of an information retrieval (IR) model rather than from a given passage. However, if the IR model fails to retrieve a passage containing the correct answer, the MRC system cannot answer the question. That is, the performance of an open-domain MRC system depends on the performance of its IR model, so a high-performance open-domain MRC system requires a high-performance IR model. Previous IR models have been improved through query expansion and re-ranking. In this paper, we propose a re-ranking method using deep neural networks. The proposed model re-ranks the retrieval results (passages) using multi-task learning-based sentence similarity, and in experiments on 58,980 pairs of MRC data it improves performance by approximately 8% over the existing IR model.
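The re-ranking stage itself is a simple wrapper around the similarity model: score each retrieved passage against the question and re-sort. The equal-weight combination of IR score and similarity score below is an assumption for illustration; the paper's actual scoring function is not given in the abstract.

```python
def rerank(passages, question, sim):
    """Re-order IR results by a sentence-similarity model.

    passages: [(passage_text, ir_score)] from the first-stage retriever.
    sim(question, passage): the similarity model; in the paper this is
    a multi-task-trained neural network, here any callable works.
    """
    scored = [(p, 0.5 * ir + 0.5 * sim(question, p)) for p, ir in passages]
    return [p for p, _ in sorted(scored, key=lambda t: t[1], reverse=True)]

# Toy similarity: prefers passages containing the letter "a".
order = rerank([("passage b", 0.9), ("passage a", 0.8)], "q",
               lambda q, p: 1.0 if "a" in p.split()[-1] else 0.0)
```

The two-stage design keeps the expensive neural scorer off the full corpus: the IR model narrows candidates cheaply, and the similarity model only rescores that short list.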

An Efficient Distributed In-memory High-dimensional Indexing Scheme for Content-based Image Retrieval in Spark Environments

Dojin Choi, Songhee Park, Yeondong Kim, Jiwon Wee, Hyeonbyeong Lee, Jongtae Lim, Kyoungsoo Bok, Jaesoo Yoo

http://doi.org/10.5626/JOK.2020.47.1.95

Content-based image retrieval, which searches for an object in images, has been utilized for criminal-activity monitoring and object tracking in video. In this paper, we propose a distributed in-memory high-dimensional indexing scheme for content-based image retrieval. It provides similarity search over the massive feature vectors extracted from images or objects. To process large amounts of data, we used the big data platform Spark, and we employed a master/slave model for efficient allocation of distributed query processing: the master distributes data and queries, and the slaves index and process them. To solve the k-NN query processing performance problems of existing distributed high-dimensional indexing schemes, we propose optimization methods for k-NN query processing that consider density and search costs. Various performance evaluations demonstrate the superiority of the proposed scheme.
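The master/slave k-NN pattern described above can be sketched in a few lines: each slave answers with its local top-k over its partition, and the master merges the partial results, which is correct because the global top-k is always contained in the union of local top-k sets. This sketch runs the "slaves" as a loop over in-memory partitions; the real system distributes them over Spark executors, and the density- and cost-based optimizations are not shown.

```python
import heapq

def knn_query(partitions, query, k):
    """Merge-of-local-top-k sketch of distributed k-NN.

    partitions: list of vector lists, one per (simulated) slave.
    Distance is squared Euclidean on plain tuples.
    """
    def dist(v):
        return sum((a - b) ** 2 for a, b in zip(v, query))
    partials = []
    for part in partitions:                     # each slave's local scan
        partials.extend(heapq.nsmallest(k, part, key=dist))
    return heapq.nsmallest(k, partials, key=dist)  # master-side merge

res = knn_query([[(0, 0), (5, 5)], [(1, 1), (9, 9)]], (0, 0), 2)
```

Only k candidates per slave cross the network, so communication cost scales with k times the number of slaves rather than with the data size.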


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr