Malware Classification Possibility based on Sequence Information

Tae-Uk Yun, Chan-Soo Park, Tae-Gyu Hwang, Sung Kwon Kim

http://doi.org/10.5626/JOK.2017.44.11.1125

LSTM (Long Short-Term Memory) is a kind of RNN (Recurrent Neural Network) in which the next state is updated by remembering previous states. The calling-sequence information of a piece of malware can be defined as the system call function invoked at each time step. In this paper, we use the sequences of system calls made by malware code as input for malware classification, exploiting the ability of LSTM to remember previous states. We run an experiment to show that our method can classify malware, and we measure accuracy while varying the length of the system call sequences.
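As an illustration of how system call sequences of a chosen length could be prepared as input for a sequence model, the following is a minimal sketch. It is not the authors' pipeline: the call names, the padding index, and the helper names `build_vocab` and `encode` are assumptions made for this example, and the LSTM itself is left out.

```python
from typing import Dict, List

PAD = 0  # index reserved for padding (an assumption of this sketch)


def build_vocab(traces: List[List[str]]) -> Dict[str, int]:
    """Assign each distinct system call name an integer id, starting at 1."""
    vocab: Dict[str, int] = {}
    for trace in traces:
        for call in trace:
            if call not in vocab:
                vocab[call] = len(vocab) + 1
    return vocab


def encode(trace: List[str], vocab: Dict[str, int], length: int) -> List[int]:
    """Keep the first `length` calls and right-pad shorter traces with PAD."""
    ids = [vocab[call] for call in trace[:length]]
    return ids + [PAD] * (length - len(ids))


# Hypothetical traces; real ones would come from monitoring malware samples.
traces = [
    ["NtOpenFile", "NtReadFile", "NtClose"],
    ["NtOpenFile", "NtWriteFile", "NtWriteFile", "NtClose"],
]
vocab = build_vocab(traces)
encoded = [encode(trace, vocab, length=3) for trace in traces]
```

Changing the `length` argument is one simple way to vary the sequence length fed to a classifier; calls that are absent from the vocabulary are not handled in this sketch.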

A Cross Layer Optimization Technique for Improving Performance of MLC NAND Flash-Based Storages

Jisung Park, Sungjin Lee, Jihong Kim

http://doi.org/10.5626/JOK.2017.44.11.1130

The multi-leveling technique, which stores multiple bits in a single memory cell, has significantly improved the density of NAND flash memory along with shrinking processes. However, because of the side effects of the multi-leveling technique, the average write performance of MLC NAND flash memory is more than two times lower than that of SLC NAND flash memory. In this paper, we introduce existing cross-layer optimization techniques proposed to improve the performance of MLC NAND flash-based storage, and propose a new integration technique that overcomes the limitations of existing techniques by exploiting their complementarity. By fully exploiting the performance asymmetry of MLC NAND flash devices at the flash translation layer, the proposed technique can handle many write requests with the performance of SLC NAND flash devices, thus significantly improving the performance of NAND flash-based storage. Experimental results show that the proposed technique improves performance by 39% on average over the individual techniques.

News Topic Extraction based on Word Similarity

Dongxu Jin, Soowon Lee

http://doi.org/10.5626/JOK.2017.44.11.1138

Topic extraction is a technology that automatically extracts a set of topics from a set of documents, and it has been a major research topic in the area of natural language processing. Representative topic extraction methods include Latent Dirichlet Allocation (LDA) and word clustering-based methods. However, these methods suffer from problems such as repeated topics and mixed topics. The problem of repeated topics is one in which a specific topic is extracted as several topics, while the problem of mixed topics is one in which several topics are mixed in a single extracted topic. To solve these problems, this study proposes a method that extracts topics using an LDA that is robust against the problem of repeated topics, and then corrects the extracted topics by separating and merging them using the similarity between words. In the experiment, the proposed method showed better performance than the conventional LDA method.

Group Emotion Prediction System based on Modular Bayesian Networks

SeulGi Choi, Sung-Bae Cho

http://doi.org/10.5626/JOK.2017.44.11.1149

Recently, with the development of communication technology, it has become possible to collect various sensor data that indicate the environmental stimuli within a space. In this paper, we propose a group emotion prediction system using a modular Bayesian network that was designed considering the psychological impact of environmental stimuli. A Bayesian network can compensate for the uncertain and incomplete characteristics of the sensor data through probabilistic consideration of the evidence for reasoning. In addition, modularizing the Bayesian network enables a flexible response to, and efficient reasoning about, fluctuations in environmental stimuli within the space. To verify the performance of the system, we predict public emotion based on the brightness, volume, temperature, humidity, color temperature, sound, smell, and group emotion data collected in a kindergarten. Experimental results show that the accuracy of the proposed method is 85% greater than that of other classification methods. Using quantitative and qualitative analyses, we explore the possibilities and limitations of probabilistic methodology for predicting group emotion.

Study on Automatic Bug Triage using Deep Learning

Sun-Ro Lee, Hye-Min Kim, Chan-Gun Lee, Ki-Seong Lee

http://doi.org/10.5626/JOK.2017.44.11.1156

Existing studies on automatic bug triage have mostly designed the prediction system around a machine learning algorithm. Therefore, applying a high-performance machine learning model is central to the performance of an automatic bug triage system. In the related research, machine learning models with high performance, such as SVM and Naïve Bayes, are mainly used. In this paper, we apply Deep Learning, which has recently shown good performance in the field of machine learning, to automatic bug triage and evaluate its performance. Experimental results show that the Deep Learning based Bug Triage system achieves 48% accuracy in active developer experiments, an improvement of up to 69% over conventional machine learning techniques.

Java API Pattern Extraction and Recommendation using Collocation Analysis

Chanwoo Kwon, Sangwon Hwang, Youngkwang Nam

http://doi.org/10.5626/JOK.2017.44.11.1165

Many developers utilize specific APIs to develop software, and to find out how a particular API is used, a developer can refer to the website that provides the API or search the web for it. However, the site that provides the API does not necessarily provide guidance on how to use it, and in many other cases the guidance is only partial. In this paper, we propose a novel system, JACE (Java AST collocation-pattern extractor), as a method to reuse commonly used code as a supplement. JACE extracts API call nodes and collocation patterns, and analyzes the relations between the collocations to extract significant API patterns from the source code. The following experiment was performed to verify the accuracy of a defined pattern: 794 open source projects were analyzed to extract about 15M API call nodes. Then, the Eclipse plug-in test program was utilized to retrieve the pattern using the top 10 classes of API call nodes. Finally, the code search results from reference pages of the API classes and the Searchcode [1] were compared with the test program results.
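To make the idea of collocation analysis over API call sequences concrete, the following is a minimal sketch that scores ordered pairs of API calls by pointwise mutual information (PMI). It is not JACE: the use of PMI, the window size, the function name `collocation_scores`, and the call sequences are all assumptions made for this example.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def collocation_scores(
    call_seqs: List[List[str]], window: int = 1
) -> Dict[Tuple[str, str], float]:
    """Score ordered call pairs that occur within `window` positions by PMI."""
    unigrams: Counter = Counter()
    pairs: Counter = Counter()
    for seq in call_seqs:
        unigrams.update(seq)
        for i, first in enumerate(seq):
            # Pair each call with the next `window` calls in the same sequence.
            for second in seq[i + 1 : i + 1 + window]:
                pairs[(first, second)] += 1
    total_calls = sum(unigrams.values())
    total_pairs = sum(pairs.values())
    scores: Dict[Tuple[str, str], float] = {}
    for (a, b), n in pairs.items():
        p_ab = n / total_pairs
        p_a = unigrams[a] / total_calls
        p_b = unigrams[b] / total_calls
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


# Hypothetical API call sequences, one per method body.
seqs = [
    ["FileReader.new", "BufferedReader.new", "BufferedReader.readLine"],
    ["FileReader.new", "BufferedReader.new", "BufferedReader.readLine"],
    ["List.add", "FileReader.new", "List.add"],
]
scores = collocation_scores(seqs, window=1)
```

In this toy data the pair (`FileReader.new`, `BufferedReader.new`) scores higher than (`FileReader.new`, `List.add`), because the first pair co-occurs more often than the individual call frequencies alone would suggest.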

Modeling and Composition Method of Collective Behavior of Interactive Systems for Knowledge Engineering

Junsup Song, Maryam Rahmani, Moonkun Lee

http://doi.org/10.5626/JOK.2017.44.11.1178

It is very important to understand system behaviors as collective patterns in each knowledge domain. However, there are structural limitations to representing collective behaviors because of the size of system components and the complexity of their interactions, which cause the state explosion problem. Further composition with other systems is mostly impractical because of the exponential growth of their size and complexity. This paper presents a practical method to model collective behaviors, based on a new concept of domain engineering: behavior ontology. Firstly, the ontology defines each collective behavior of a system from active ontology. Secondly, the behaviors are formed in a quantifiably abstract lattice, called common regular expression. Thirdly, a lattice can be composed with other lattices based on quantifiably common elements. The method can be one of the most innovative approaches to representing system behaviors as collective patterns, as well as to minimizing system states to reduce system complexity. For implementation, a prototype tool, called PRISM, has been developed on the ADOxx Meta-Modelling Platform.

Planar Curve Smoothing with Individual Weighted Averaging

Sungpil Lyu

http://doi.org/10.5626/JOK.2017.44.11.1194

A traditional average smoothing method is designed for smoothing out noise; however, it unintentionally smooths away corner points and shrinks the curve. In this paper, we propose a novel curve smoothing method via polygonal approximation of the input curve. The proposed method determines the smoothing weight for each point of the input curve based on the angle and approximation error between the approximated polygon and the input curve. The weight constrains the displacement of the point after smoothing so that it does not significantly exceed the average noise error of the region. In the experiment, we observed that the resulting smoothed curve is close to the original curve, since each point moves toward the average position of the noise after smoothing. As an application to digital cartography, for the same amount of smoothing, the proposed method yields less area reduction than the existing smoothing methods, even on small curve segments.
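The role of a per-point smoothing weight can be illustrated with a minimal sketch. It is not the proposed method: the abstract derives each weight from the angle and approximation error against an approximating polygon, whereas here the weights are simply supplied by the caller, and the function name `smooth` and the sample curve are assumptions made for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def smooth(points: List[Point], weights: List[float]) -> List[Point]:
    """Move each interior point toward the midpoint of its two neighbours.

    weights[i] is expected in [0, 1]: 0 keeps point i fixed, 1 moves it all
    the way to the neighbour midpoint. The two endpoints are never moved.
    """
    if len(points) != len(weights):
        raise ValueError("points and weights must have the same length")
    out = list(points)
    for i in range(1, len(points) - 1):
        (px, py), (x, y), (nx, ny) = points[i - 1], points[i], points[i + 1]
        mx, my = (px + nx) / 2, (py + ny) / 2
        w = weights[i]
        out[i] = (x + w * (mx - x), y + w * (my - y))
    return out


# Hypothetical curve: index 1 is noise on a straight run, index 2 is a corner.
curve = [(0.0, 0.0), (1.0, 0.4), (2.0, 0.0), (2.0, 2.0)]
weights = [0.0, 1.0, 0.0, 0.0]  # smooth the noisy point, keep the corner
smoothed = smooth(curve, weights)
```

With a uniform weight the corner at index 2 would be pulled inward as well, which is the corner rounding and shrinkage that individual weights are meant to limit.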

Load Balancing for Distributed Processing of Real-time Spatial Big Data Stream

Susik Yoon, Jae-Gil Lee

http://doi.org/10.5626/JOK.2017.44.11.1209

A variety of sensors is widely used these days, and it has become much easier to acquire spatial big data streams from various sources. Since spatial data streams have inherently skewed and dynamically changing distributions, the system must effectively distribute the load among workers. Previous studies to solve this load imbalance problem are not directly applicable to processing spatial data. In this research, we propose Adaptive Spatial Key Grouping (ASKG). The main idea of ASKG is, by utilizing the previous distribution of the data streams, to adaptively suggest a new grouping scheme that evenly distributes the future load among workers. We evaluate the validity of the proposed algorithm in various environments, by conducting an experiment with real datasets while varying the number of workers, input rate, and processing overhead. Compared to two other alternative algorithms, ASKG improves the system performance in terms of load imbalance, throughput, and latency.
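The general idea of using the previous distribution of a stream to plan a more even grouping can be sketched as follows. This is not ASKG: the greedy heaviest-first heuristic, the function name `plan_grouping`, and the cell names and counts are assumptions made for this example.

```python
import heapq
from typing import Dict, List, Tuple


def plan_grouping(prev_counts: Dict[str, int], n_workers: int) -> Dict[str, int]:
    """Assign each spatial cell to a worker using the previous window's counts.

    Cells are taken heaviest first and each goes to the worker that currently
    has the smallest planned load.
    """
    heap: List[Tuple[int, int]] = [(0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    assignment: Dict[str, int] = {}
    for cell, count in sorted(prev_counts.items(), key=lambda kv: -kv[1]):
        load, worker = heapq.heappop(heap)
        assignment[cell] = worker
        heapq.heappush(heap, (load + count, worker))
    return assignment


# Hypothetical per-cell tuple counts observed in the previous time window.
prev_counts = {"cell_a": 90, "cell_b": 40, "cell_c": 30, "cell_d": 20}
assignment = plan_grouping(prev_counts, n_workers=2)
```

Hashing these four cells evenly, two per worker, could leave one worker with 130 tuples and the other with 50; planning from the observed counts gives each worker 90 in this toy case.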

A Study of a Hierarchical Grade-based Contents Forwarding Scheme for CCN Real-time Streaming Service

Taehwan Kim, Taewook Kwon

http://doi.org/10.5626/JOK.2017.44.11.1219

Real-time streaming services over the Internet have increased with the explosive growth of the various mobile platforms, with a focus on smart phones, and the demand for them is growing. In addition, the bandwidth occupied by the streaming services over the Internet had already surpassed 50% in 2010. Because of the shortage of network bandwidth for multimedia services traffic, restrictions on quality and capacity will become more and more serious. CCN is a future Internet architecture that improves how existing host-based Internet architecture handles content-oriented structure, but it is designed for the transmission of general contents and is not suitable for transmitting real-time streaming contents. In this paper, we focus on the inefficient aspects of CCN and propose a hierarchical grade-based scheme for real-time service for a more efficient environment in real-time streaming services. Experiments have shown better performance in terms of bandwidth, network load, and reliability.

Social Network Spam Detection using Recursive Structure Features

Boyeon Jang, Sihyun Jeong, Chongkwon Kim

http://doi.org/10.5626/JOK.2017.44.11.1231

Given the network structure of an online social network, it is important to find a way to distinguish spam accounts using network features. In online social networks, the service provider attempts to detect social spamming to maintain service quality. However, spammer groups change their strategies to avoid being detected. Even though spammers attempt to act as legitimate users, certain distinguishable structural features are not easily changed. In this paper, we investigate a way to generate meaningful network structure features, and suggest a spammer detection method using recursive structural features. In an experiment on a real-world dataset, we found that the proposed algorithm could improve classification performance by about 8%.
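One way recursive structural features can be generated is sketched below. It is an illustration rather than the authors' algorithm: starting from node degree and aggregating neighbours' features by sum and mean are assumptions of this sketch, as are the function name `recursive_features` and the toy graph.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]  # adjacency lists of an undirected graph


def recursive_features(graph: Graph, rounds: int = 2) -> Dict[str, List[float]]:
    """Build per-node feature vectors by repeated neighbourhood aggregation.

    Each node starts with its degree. In every round, the sum and the mean of
    the neighbours' current features are appended, so the vector triples.
    """
    feats = {v: [float(len(nbrs))] for v, nbrs in graph.items()}
    for _ in range(rounds):
        new: Dict[str, List[float]] = {}
        for v, nbrs in graph.items():
            width = len(feats[v])
            sums = [sum(feats[u][k] for u in nbrs) for k in range(width)]
            means = [s / len(nbrs) if nbrs else 0.0 for s in sums]
            new[v] = feats[v] + sums + means
        feats = new
    return feats


# Hypothetical star-shaped graph: one hub account linked to three others.
graph = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
one_round = recursive_features(graph, rounds=1)
two_rounds = recursive_features(graph, rounds=2)
```

The resulting vectors could then be passed to any standard classifier; more rounds capture structure further from each node at the cost of longer vectors.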

Automatic Construction of Reduced Dimensional Cluster-based Keyword Association Networks using LSI

Han-mook Yoo, Han-joon Kim, Jae-young Chang

http://doi.org/10.5626/JOK.2017.44.11.1236

In this paper, we propose a novel way of producing keyword networks, named LSI-based ClusterTextRank, which extracts significant key words from a set of clusters with a mutual information metric, and constructs an association network using latent semantic indexing (LSI). The proposed method reduces the dimension of documents through LSI, decomposes documents into multiple clusters through k-means clustering, and expresses the words within each cluster as a maximal spanning tree graph. The significant key words are identified by evaluating their mutual information within clusters. Then, the method calculates the similarities between the extracted key words using the term-concept matrix, and the results are represented as a keyword association network. To evaluate the performance of the proposed method, we used travel-related blog data and showed that the proposed method outperforms the existing TextRank algorithm by about 14% in terms of accuracy.


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr