New Adaptive Matching Order and Performance Comparison for Subgraph Matching Problem

Seunghwan Min, Wonseok Shin, Chaewon Kim, Kunsoo Park

http://doi.org/10.5626/JOK.2022.49.1.1

In recent years, graph analysis has been used in various applications. One of the fundamental problems in graph analysis is the subgraph matching problem. Given a data graph and a query graph, the subgraph matching problem is to find all embeddings of the query graph in the data graph. Many backtracking-based algorithms have been studied to solve this problem. In this paper, we analyze the problems in the adaptive matching order proposed by DAF, a state-of-the-art algorithm for this problem, and introduce an improved adaptive matching order. Furthermore, we conduct experiments with real data graphs to demonstrate that the proposed matching order is more effective than previous matching orders when the pruning technique is not used, and also when the pruning technique is used unless the elapsed time is very short.
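The backtracking approach mentioned in this abstract can be illustrated with a minimal sketch. It is not DAF and not the matching order proposed in the paper: it assumes unlabeled, undirected graphs stored as adjacency sets, and its "adaptive" rule (extend the unmatched query vertex that currently has the fewest candidates) is a simple stand-in chosen for illustration. The name `find_embeddings` is hypothetical.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets (assumption)


def find_embeddings(query: Graph, data: Graph) -> List[Dict[int, int]]:
    """Return all injective, edge-preserving mappings from query to data vertices."""
    results: List[Dict[int, int]] = []

    def candidates(u: int, mapping: Dict[int, int]) -> List[int]:
        used = set(mapping.values())
        found = []
        for v in data:
            # Skip data vertices already used or with too small a degree.
            if v in used or len(data[v]) < len(query[u]):
                continue
            # Every already-matched neighbor of u must map to a neighbor of v.
            if all(mapping[w] in data[v] for w in query[u] if w in mapping):
                found.append(v)
        return found

    def backtrack(mapping: Dict[int, int]) -> None:
        if len(mapping) == len(query):
            results.append(dict(mapping))
            return
        # Illustrative adaptive order: recompute candidates at every step and
        # extend the unmatched query vertex with the fewest candidates.
        options = {u: candidates(u, mapping) for u in query if u not in mapping}
        u = min(options, key=lambda x: len(options[x]))
        for v in options[u]:
            mapping[u] = v
            backtrack(mapping)
            del mapping[u]

    backtrack({})
    return results


if __name__ == "__main__":
    triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
    data_graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(len(find_embeddings(triangle, data_graph)))
```

Because the order is recomputed from the current partial embedding, different search branches may extend different query vertices, which is the general idea behind an adaptive (as opposed to static) matching order.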

Design of Durable Node Replication for Persistent Memory Data Structures on NUMA Architectures

Junghan Kim, Young Ik Eom

http://doi.org/10.5626/JOK.2022.49.1.8

Recently, advances in persistent memory (PM) and NUMA technologies have made it possible to provide high performance and large storage space to applications such as big data and machine learning. Such PM environments on multi-node systems require changes to the data structures used in each layer of the software stack. In research on PM data structures, however, it is difficult to ensure both a high level of concurrency and non-volatility, which are important characteristics of NUMA and PM, respectively. In this paper, we propose NRPM, which extends node replication, a representative NUMA algorithm. NRPM outperforms a hash algorithm by up to 5x by improving concurrency in the multi-node PM server using shared-log and flat combining methods. We confirmed the validity of NRPM through various performance analyses considering the characteristics of NUMA-PM.

Survey on Recent Cryogenic Computing Research

Gyuhyeon Lee, Ilkwon Byun, Dongmoon Min, Jangwoo Kim

http://doi.org/10.5626/JOK.2022.49.1.15

Cryogenic computing is gaining attention as the development of conventional computing has recently slowed. Owing to the favorable effects of low temperatures, cryogenic computing can achieve higher performance and power efficiency than room-temperature computing. However, cryogenic computing incurs an expensive cooling cost, which must be overcome to build a superior cryogenic computer. Therefore, the cryogenic computer should be designed to maximize the benefits of the cryogenic temperature. In this paper, we survey recent research on cryogenic computing. First, we introduce research on semiconductors at cryogenic temperatures. Then, we summarize research that models various computing elements (e.g., DRAM, CPU, Flash) at cryogenic temperatures and proposes cryogenic-optimal designs for them. Finally, we point out the limitations of previous cryogenic computing research and propose future directions.

Korean Text Summarization using MASS with Copying and Coverage Mechanism and Length Embedding

Youngjun Jung, Changki Lee, Wooyoung Go, Hanjun Yoon

http://doi.org/10.5626/JOK.2022.49.1.25

Text summarization is a technology that generates a summary containing the important and essential information from a given document, and end-to-end abstractive summarization based on sequence-to-sequence models is the main line of study. Recently, transfer learning methods that fine-tune a model pre-trained on large-scale monolingual data have been actively studied in the field of natural language processing. In this paper, we applied the copying mechanism to the MASS model, conducted pre-training for Korean language generation, and then applied the model to Korean text summarization. In addition, a coverage mechanism and length embedding were applied to improve the summarization model. The experiments showed that the Korean text summarization model that applies the copying and coverage mechanisms to the MASS model outperformed the existing models, and that the length of the summary could be adjusted through length embedding.

Deletion-based Korean Sentence Compression using Graph Neural Networks

Gyoung-Ho Lee, Yo-Han Park, Kong Joo Lee

http://doi.org/10.5626/JOK.2022.49.1.32

Automatic sentence compression aims at generating a concise sentence from a lengthy source sentence. The most common approach to sentence compression is deletion-based compression. In this paper, we implement deletion-based sentence compression systems based on a binary classifier and long short-term memory (LSTM) networks with attention layers. The binary classifier, which is a baseline model, classifies the words in a sentence into words that need to be deleted and words that will remain in the compressed sentence. We also introduce a graph neural network (GNN) in order to employ dependency tree structures when compressing a sentence. A dependency tree is encoded by a graph convolutional network (GCN), one of the most common GNNs, and every node in the encoded tree is input into the sentence compression module. As a conventional GCN deals only with undirected graphs, we propose a directed graph convolutional network (D-GCN) to differentiate between the parent and child nodes of a dependency tree in sentence compression. Experimental results show that the sentence compression accuracy of the baseline model improves when a GNN is employed. Regarding the performance comparison of graph networks, the D-GCN achieves higher F1 scores than the GCN when applied to sentence compression. The experiments confirm that better sentence compression performance can be achieved when the dependency syntax tree structure is explicitly reflected.
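To make the distinction between an undirected and a direction-aware graph convolution concrete, the following is a minimal sketch of one layer over a dependency tree. It is not the D-GCN formulation from the paper, which the abstract does not specify; it assumes one possible design in which messages passed from parent to child and from child to parent use separate weight matrices, with no degree normalization. All names (`directed_gcn_layer`, `w_self`, `w_from_parent`, `w_from_child`) are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w (rows = output dims) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def directed_gcn_layer(
    features: List[Vector],
    edges: List[Tuple[int, int]],  # (parent, child) pairs of the dependency tree
    w_self: Matrix,
    w_from_parent: Matrix,
    w_from_child: Matrix,
) -> List[Vector]:
    """One direction-aware graph convolution step followed by ReLU."""
    # Self-loop term for every node.
    out = [matvec(w_self, h) for h in features]
    for parent, child in edges:
        # The child aggregates its parent's features with one weight matrix...
        out[child] = add(out[child], matvec(w_from_parent, features[parent]))
        # ...and the parent aggregates its child's features with another.
        out[parent] = add(out[parent], matvec(w_from_child, features[child]))
    return [[max(0.0, x) for x in h] for h in out]


if __name__ == "__main__":
    feats = [[1.0], [2.0]]  # node 0 is the parent of node 1
    print(directed_gcn_layer(feats, [(0, 1)], [[1.0]], [[10.0]], [[100.0]]))
```

Setting `w_from_parent` equal to `w_from_child` collapses this sketch to an undirected convolution, in which a node can no longer tell its head from its dependents.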

Method for the Automatic Generation of Training Sets for Word Embedding Reflecting Sentiment Information

Dahee Lee, Won-Min Lee, Byung-Won On

http://doi.org/10.5626/JOK.2022.49.1.42

Word embedding is a method of expressing a word as a vector. However, since existing word embedding methods are trained to predict words that appear together, words are expressed as similar vectors even if they carry different sentiments. When a sentiment analysis model is built on such embeddings, sentences with similar patterns may be classified into the same polarity, which is one of the factors that degrade the performance of the sentiment analysis model. To address this problem, we propose a method for automatically generating a training set for word embedding that reflects sentiment information, using morpheme analysis, dependency parsing, and a sentiment dictionary. Using the sentiment-specific word embedding vectors generated by the proposed model, we show that the proposed sentiment-specific word embedding model outperforms existing word embedding models including CBOW, Skip-Gram, FastText, ELMo, and BERT.

Identification of Generative Adversarial Network Models Suitable for Software Defect Prediction

Jiwon Choi, Jaewook Lee, Duksan Ryu, Suntae Kim

http://doi.org/10.5626/JOK.2022.49.1.52

Software Defect Prediction (SDP) helps allocate limited quality assurance resources effectively by identifying modules that are likely to cause defects. Software defect data suffer from a class imbalance problem in which there are more non-defective instances than defective instances. In most machine learning methods, the defect prediction performance is degraded when a disproportionate number of instances belong to a particular class. Therefore, this research aimed to solve the class imbalance problem and improve defect prediction performance by using a Generative Adversarial Network (GAN) model. To this end, we compared different kinds of GAN models for their suitability for SDP and checked the applicability of GAN models that had not been applied in related work. In our study, the Vanilla GAN (GAN), Conditional GAN (cGAN), and Wasserstein GAN (WGAN) models, which were initially proposed for image generation, were adapted for software defect prediction. These modified models were then compared with Tabular GAN (TGAN) and Modeling Tabular data using Conditional GAN (CTGAN). Our experimental results showed that the CTGAN model is suitable for SDP data. We also conducted a sensitivity analysis examining which hyper-parameter values of CTGAN increase the recall rate and lower the probability of false alarm (PF). Our experimental results indicated that the hyper-parameters should be adjusted according to the dataset. We expect that our proposed approach can help allocate limited resources effectively by improving the performance of SDP.
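The two measures named in this abstract, recall and the probability of false alarm (PF), can be computed from a confusion matrix as in the sketch below. It uses the standard definitions recall = TP / (TP + FN) and PF = FP / (FP + TN), and assumes binary labels where 1 marks a defective module; the function name `recall_and_pf` and the sample labels are made up for illustration and do not come from the paper.

```python
from typing import Sequence, Tuple


def recall_and_pf(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (recall, probability of false alarm) for binary labels (1 = defective)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    # Guard against empty classes, which imbalanced samples can produce.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, pf


if __name__ == "__main__":
    # Imbalanced toy sample: 2 defective modules, 4 non-defective ones.
    actual = [1, 1, 0, 0, 0, 0]
    predicted = [1, 0, 1, 0, 0, 0]
    print(recall_and_pf(actual, predicted))
```

Under class imbalance, a model can reach high accuracy while missing most defective modules, which is why recall and PF are reported separately.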

Alleviation of Generic Responses by Adjusting N-gram Usage in Neural Chit-chat Dialogue Systems

JaeYoung Oh, WonKee Lee, Jeesoo Bang, Jaehun Shin, Jong-Hyeok Lee

http://doi.org/10.5626/JOK.2022.49.1.60

Chit-chat dialogue systems, which handle unstructured conversations between humans and computers, aim to generate meaningful and diverse responses. However, models trained with maximum likelihood estimation have been reported to generate too many generic responses, which reduces interest in these systems. Recently, a training method based on unlikelihood training was proposed to generate diverse responses by penalizing the overuse of each vocabulary token. However, this method considers only how often a token is used when penalizing it, and does not consider the context in which the token is used. Therefore, we extend this work and propose a method that penalizes the overuse of each n-gram. This method has the advantage of using information about the surrounding context within the n-gram when penalizing each token.
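One way to picture "overuse of an n-gram" is to compare how often each n-gram appears in model responses with how often it appears in human responses. The sketch below does only that measurement; it is not the training loss from the paper, and the comparison rule (relative frequency in model output minus relative frequency in human output) is an assumption made for illustration. The names `ngram_frequencies` and `overused_ngrams` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

NGram = Tuple[str, ...]


def ngram_frequencies(responses: List[List[str]], n: int) -> Dict[NGram, float]:
    """Relative frequency of each n-gram over a list of tokenized responses."""
    counts: Counter = Counter()
    for tokens in responses:
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def overused_ngrams(
    model_responses: List[List[str]],
    human_responses: List[List[str]],
    n: int = 2,
) -> Dict[NGram, float]:
    """Map each n-gram the model uses more often than humans to its excess frequency."""
    model_freq = ngram_frequencies(model_responses, n)
    human_freq = ngram_frequencies(human_responses, n)
    return {
        gram: freq - human_freq.get(gram, 0.0)
        for gram, freq in model_freq.items()
        if freq > human_freq.get(gram, 0.0)
    }


if __name__ == "__main__":
    model_out = [["i", "do", "not", "know"], ["i", "do", "not", "know"]]
    human_out = [["i", "do", "not", "know"], ["i", "like", "tea"]]
    for gram, excess in overused_ngrams(model_out, human_out).items():
        print(gram, round(excess, 3))
```

With n = 1 this reduces to a per-token comparison; with n > 1 a token is flagged only in the contexts where it is overused.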

Study on the Establishment of Risk Assessment Criteria for the Dynamic Reliability Test of Performance Improvement Weapon System using Recycling Software

Hunyong Shin, Young-Soo Choi, Min-Ho Park, Hyung-suk Kim, Dae-San Oh, Jong-Kyu Kim, Jong-Geun Kim, Sungshin Kim

http://doi.org/10.5626/JOK.2022.49.1.67

Recently, as the proportion of software in weapon systems has increased and software reliability has come to have an important influence on weapon systems, the importance of weapon system software reliability testing is being emphasized. Existing studies present reliability test standards with a focus on new R&D software. However, studies on reliability test subjects and criteria for R&D of performance improvement weapon system software remain inadequate. In particular, standards and measures need to be established because the choice of test subjects and standards strongly affects not only the project schedule but also the cost of testing. To solve this problem, this study proposes risk evaluation criteria for selecting the dynamic reliability test targets of recycled software in the performance improvement weapon system. The risk evaluation criteria divide the software developed for the performance improvement weapon system into new functions and modified functions, and different criteria are established for complexity and correlation according to the function characteristics. Using these risk evaluation criteria, risk evaluation was performed on the Korea Variable Message Format (KVMF) tactical computer performance improvement software, and dynamic reliability test subjects were selected. The efficiency of the dynamic reliability test was confirmed by comparing the risk assessment results and the dynamic test results obtained with the proposed method against those of other newly developed projects.

Improving False Positive Rate of Extended Learned Bloom Filters Using Grid Search

Soohyun Yang, Hyungjoo Kim

http://doi.org/10.5626/JOK.2022.49.1.78

A Bloom filter is a data structure that represents a set and answers whether a given item is included in it. However, in exchange for using less space, it may return false positives. The learned Bloom filter is a variation of the Bloom filter that uses a machine learning model as a pre-processing step to improve the false positive rate. The learned Bloom filter stores some data in the machine learning model, and the leftover data is stored in an auxiliary filter. The auxiliary filter can be implemented using a Bloom filter only, but in this paper we use a Bloom filter together with a learned hash function, and call the result an extended learned Bloom filter. The learned hash function uses the output value of the machine learning model as a hash function. In this paper, we propose a method that improves the false positive rate of the extended learned Bloom filter through grid search. This method searches for the extended learned Bloom filter with the lowest false positive rate by increasing the hyperparameter that represents the ratio of the learned hash function. We experimentally show that the extended learned Bloom filter selected through grid search can achieve a 20% improvement in false positive rate compared to the learned Bloom filter in an experiment that stores more than 100,000 data items. In addition, we show that false negatives may occur in the learned hash function when 32-bit floating-point numbers are used in the neural network model; this can be solved by switching to 64-bit floating-point numbers. Finally, we show that in an experiment that queries 10,000 data items, the structure of the neural network model can be adjusted to save 20KB of space while producing an extended learned Bloom filter with the same false positive rate. However, this 20KB saving comes at the cost of a 2% increase in query time.
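For readers unfamiliar with the base data structure, the following is a minimal sketch of a plain Bloom filter, the component that the learned and extended learned variants build on. It does not implement the learned model, the learned hash function, or the grid search from the paper. Deriving the hash positions from salted SHA-256 digests is an assumption made for simplicity, and the class name and parameters are hypothetical.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Plain Bloom filter: no false negatives, but false positives are possible."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Simulate num_hashes independent hash functions by salting the
        # input with the hash index (an illustrative choice).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True only if every bit for this item is set; other items may have
        # set the same bits, which is how false positives arise.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


if __name__ == "__main__":
    bf = BloomFilter(num_bits=1024, num_hashes=3)
    bf.add("apple")
    bf.add("banana")
    print("apple" in bf, "banana" in bf)
```

An inserted item always tests as present. A learned hash function that is evaluated with different numeric precision at insert time and at query time can break this guarantee, which is consistent with the false negative issue the abstract reports for 32-bit floating-point numbers.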

Improvement of the Classification Model Performance in 119-Emergency Report Data

Eunjung Kwon, Hyuinho Park, Sungwon Byon, Kyuchul Lee

http://doi.org/10.5626/JOK.2022.49.1.89

This paper presents a study of a text classification model that provides optimal response information for each disaster situation, based on the report content recorded by the receiver while taking a 119 emergency report. Text classification, in which a model receives a sentence and assigns it to a category, is a widely used technique in the field of natural language processing. This study defined rules for using augmented training data to improve the performance of a text classification model trained through supervised learning, and confirmed through experiments the performance of the classification model trained with the augmented data. The study suggests that the approach can be extended to improve the performance of text classification models that take as input the report contents for each type of emergency, such as disease, traffic accident, and injury.


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr