Software Similarity Detection Using Highly Credible Dynamic API Sequences

Seongsoo Park, Hwansoo Han

http://doi.org/

Software birthmarks, which are unique characteristics of a program, are used to detect software plagiarism or software similarity. Software birthmarks are generally divided into static and dynamic birthmarks, each with clear pros and cons depending on the extraction method. In this paper, we propose a method that extracts API sequence birthmarks through dynamic analysis and detects similarity between executable codes. Dynamic birthmarks based on API sequences extract API functions during program execution. The extracted API sequences often include all the API functions called from the start to the end of the program. In contrast, our dynamic birthmark scheme extracts only the API functions called directly from the executable code. It then uses a sequence alignment algorithm to calculate the similarity metric effectively. We evaluate the birthmark with several open source programs to verify its reliability and credibility. Our dynamic birthmark scheme based on the extracted API sequences can be used in similarity tests of executable codes.
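
As a rough illustration of alignment-based similarity between two API call sequences, the sketch below scores a global (Needleman-Wunsch) alignment and normalizes the result to [0, 1]. The abstract does not name the alignment algorithm or its scoring scheme, so the choice of global alignment, the match/mismatch/gap values, the normalization, and the API names in the usage note are all assumptions made for illustration.

```python
def alignment_similarity(a, b, match=1, mismatch=-1, gap=-1):
    """Normalized global alignment score of two API call sequences.

    Assumption: Needleman-Wunsch with hypothetical scoring parameters;
    the paper's actual algorithm and parameters are not given in the abstract.
    """
    if not a and not b:
        return 1.0
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    # Clip negative scores to 0 and divide by the best achievable score.
    best = match * max(len(a), len(b))
    return max(prev[-1], 0) / best
```

For example, comparing the illustrative sequences ["CreateFileW", "ReadFile", "CloseHandle"] and ["CreateFileW", "WriteFile", "CloseHandle"] aligns two matching calls around one mismatch, so the score falls between 0 and 1, while identical sequences score 1.0.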

δ-approximate Periods and γ-approximate Periods of Strings over Integer Alphabets

Youngho Kim, Jeong Seop Sim

http://doi.org/

(δ, γ)-matching for strings over integer alphabets can be applied to fields such as musical melodies and share prices on stock markets. In this paper, we define δ-approximate periods and γ-approximate periods of strings over integer alphabets. We also present two O(n²)-time algorithms that find minimum δ-approximate periods and minimum γ-approximate periods, respectively. We then report experimental results on the execution times of both algorithms.
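
The underlying matching conditions can be sketched as follows, using the usual definitions from the approximate string matching literature: two equal-length integer strings δ-match if every pair of aligned symbols differs by at most δ, and γ-match if the sum of those differences is at most γ. The paper's definitions of approximate periods and its O(n²)-time algorithms are not given in the abstract and are not reproduced here; the function names are hypothetical.

```python
def delta_match(x, y, delta):
    """True if x and y have equal length and |x[i] - y[i]| <= delta for all i."""
    return len(x) == len(y) and all(abs(a - b) <= delta for a, b in zip(x, y))


def gamma_match(x, y, gamma):
    """True if x and y have equal length and sum(|x[i] - y[i]|) <= gamma."""
    return len(x) == len(y) and sum(abs(a - b) for a, b in zip(x, y)) <= gamma


def delta_gamma_match(x, y, delta, gamma):
    """(delta, gamma)-match: both conditions must hold at once."""
    return delta_match(x, y, delta) and gamma_match(x, y, gamma)
```

With melodies encoded as pitch numbers, for instance, [60, 62, 64] and [61, 62, 63] δ-match for δ = 1 and γ-match for γ = 2, but not for γ = 1.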

Application of Single-State Parsing Automata to LR Grammars

Gyung-Ok Lee

http://doi.org/

In single-state parsing automata, the decision of an action depends only on the current state and not on the parsing history. Single-state parsing automata require less memory space and parsing time than LR automata. However, the grammar class to which single-state parsing automata apply is smaller than that of LR automata. This paper provides extended single-state parsing automata that are applicable to LR grammars. In prior work, a special state, referred to as the cyclic state, was not treated in the construction of single-state parsing automata, and hence the applicable grammar class was smaller than the LR grammars. This paper solves the problem of cyclic states by processing dynamic information that depends on the input string. The proposed method expands the applicable grammar class of single-state parsing automata to LR grammars.

A Korean Community-based Question Answering System Using Multiple Machine Learning Methods

Sunjae Kwon, Juae Kim, Sangwoo Kang, Jungyun Seo

http://doi.org/

A community-based question answering system provides answers to questions from the documents uploaded to web communities. To enhance question analysis, earlier methods developed specific rules suited to a target field or applied machine learning to parts of the process. However, these methods incur an excessive cost when expanding to new fields, or lead to systems that are overfitted to a specific field. This paper proposes a multiple machine learning method that automates the overall process by applying an appropriate machine learning technique at each step of a community-based question answering system. The system is divided into a question analysis part and an answer selection part. The question analysis part consists of the question focus extractor, which uses conditional random fields to identify the focused phrases in questions, and the question type classifier, which uses a support vector machine to classify the topics of questions. In the answer selection part, we train the weights used by the similarity estimation models through an artificial neural network. In addition, there are many cases in which the results of morphological analysis are not reliable for data uploaded to web communities. Therefore, we suggest a method that minimizes the impact of morphological analysis by using character features in the question analysis stage. The proposed system outperforms the former system, achieving a Mean Average Precision of 0.765 and an R-Precision of 0.872.

A Secure BLE Integration Authentication System for a BLE Device Control Server based on Physical Web and Eddystone

ChoonSung Nam, Hyunhee Jung, Dongryeol Shin

http://doi.org/

Physical Web and Eddystone can be served by a single integrated application on the device by using their servers’ URLs. However, they have the limitation that their servers must be customized to the characteristics of each service on a case-by-case basis. In other words, regardless of the service selected for BLE, a modified linkage application is needed for each device. Hence, we believe that a new integrated service platform, which can link and support its beacons from a central server and also support its application, is needed to achieve better service quality. This platform consists of push parts (broadcasting for the beacon service) and pull parts (connection) to establish communication. In particular, the pull parts should be operated and controlled under secure authorization management for safe and trustworthy communication. This means that BLE needs a new authorization communication protocol to protect its data as much as possible. In this paper, we propose a BLE integrated authorization protocol for a BLE device control server based on Physical Web and Eddystone.

Dynamic Impact Analysis Method using Use-case and UML Models on Object-oriented Analysis

Chan Lee, Cheong Youn

http://doi.org/

Software changes continuously during and after development. When a change is required, it is difficult to grasp the scope of its impact precisely by intuition. A systematic method is needed to accomplish the required change. The purpose of impact analysis on software change is to avoid missing any information by recognizing the ripple effect that the change might cause. This paper proposes a dynamic method that can easily identify the scope of a change request by using the association between use-case scenarios and UML modeling artifacts in an object-oriented development environment. With this approach, the scope of impact that the change might have on other components in use-case scenarios, such as class diagrams and sequence diagrams, can be identified by forward tracing. In addition, the influence of possible further changes caused by changes in other components can be analyzed iteratively through backward tracing. The results of this paper are not limited to impact analysis on artifacts and change types. They can also be used as basic guidelines during impact analysis for various change requests.

Distributed Assumption-Based Truth Maintenance System for Scalable Reasoning

Batselem Jagvaral, Young-Tack Park

http://doi.org/

An assumption-based truth maintenance system (ATMS) is a tool that maintains the reasoning process of an inference engine. It also supports non-monotonic reasoning based on dependency-directed backtracking. Bookkeeping all the reasoning processes allows it to quickly check and retract beliefs and to efficiently provide solutions for problems with a large search space. However, the amount of data has grown exponentially in recent years, making it impossible to use a single machine to solve large-scale problems. Maintaining the reasoning process for such problems can lead to a high computation cost due to the large memory overhead. To overcome this drawback, this paper presents an approach to incrementally maintaining the reasoning process of an inference engine on a cluster using Spark. It maintains data dependencies such as assumptions, labels, environments and justifications on a cluster of machines in parallel and efficiently updates changes in a large amount of inferred data. We deployed the proposed ATMS on a cluster of 5 machines, conducted OWL/RDFS reasoning over the Lehigh University Benchmark (LUBM) data, and evaluated our system in terms of its performance and functionalities such as assertion, explanation and retraction. In our experiments, the proposed system performed the operations in a reasonably short period of time on the inferred LUBM2000 dataset of over 80 GB.

Evaluation of Structural Changes of a Controlled Group Using Time-Sequential SNA

Woong Lee, Seong-Woong Yoon, Sang-Hoon Lee

http://doi.org/

A controlled group is closed compared to other organizations, which hinders data collection and accurate analysis. It is therefore hard to evaluate a controlled group’s power structure and predict future changes using conventional analytical methods, including sociological approaches. Analyzing a controlled group using social network analysis (SNA) allows its inner power structure to be evaluated from limited data by revealing the relationships between members and identifying members with central roles. In this study, to evaluate changes in power structure, we conducted time-sequential SNA based on eigenvector centrality, which reflects individual influence and reveals the overall power structure. The results showed improved accuracy compared to other centralities based on individual degree or closeness, and made it possible to infer structural changes such as the promotion or purge of a member.
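
To make the centrality measure concrete, the sketch below computes eigenvector centrality by power iteration and applies it to two network snapshots in time order. It is only an illustration: the adjacency matrices, the member indices, and the helper names are hypothetical, and the abstract does not describe how the study built or compared its snapshots.

```python
def eigenvector_centrality(adj, iterations=200, tol=1e-12):
    """Eigenvector centrality of a non-empty undirected graph.

    adj is a symmetric adjacency matrix (list of lists) with non-negative
    weights. Returns a vector normalized to unit Euclidean length.
    """
    n = len(adj)
    x = [1.0 / n] * n
    for _ in range(iterations):
        # Iterate with (A + I) instead of A: the eigenvectors are the same,
        # but the shift avoids oscillation on bipartite graphs such as stars.
        nxt = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(v * v for v in nxt) ** 0.5
        nxt = [v / norm for v in nxt]
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x


def most_central(adj):
    """Index of the member with the highest eigenvector centrality."""
    scores = eigenvector_centrality(adj)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical snapshots: member 0 is the hub at time t1, member 1 at time t2.
snapshot_t1 = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
snapshot_t2 = [[0, 1, 0, 0], [1, 0, 1, 1], [0, 1, 0, 0], [0, 1, 0, 0]]
central_over_time = [most_central(s) for s in (snapshot_t1, snapshot_t2)]
```

Tracking how each member's score or rank moves between snapshots is one simple way to read a time-sequential analysis; a sharp drop for a previously central member would be the kind of change the abstract associates with a purge.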

A Distributed Vertex Rearrangement Algorithm for Compressing and Mining Big Graphs

Namyong Park, Chiwan Park, U Kang

http://doi.org/

How can we effectively compress big graphs composed of billions of edges? By concentrating non-zeros in the adjacency matrix through vertex rearrangement, we can compress big graphs more efficiently. We can also boost the performance of several graph mining algorithms such as PageRank. SlashBurn is a state-of-the-art vertex rearrangement method. It processes real-world graphs effectively by exploiting the power-law characteristic of real-world networks. However, the original SlashBurn algorithm slows down noticeably on large-scale graphs, and because it is designed to run on a single machine, it cannot be used at all when graphs are too large to fit on a single machine. In this paper, we propose a distributed SlashBurn algorithm to overcome these limitations. Distributed SlashBurn processes big graphs much faster than the original SlashBurn algorithm does. In addition, it scales up well by performing the large-scale vertex rearrangement process in a distributed fashion. In our experiments using real-world big graphs, the proposed distributed SlashBurn algorithm ran more than 45 times faster than its single-machine counterpart and processed graphs 16 times bigger than those the original method could handle.

A Group Modeling Strategy Considering Deviation of the User’s Preference in Group Recommendation

HyungJin Kim, Young-Duk Seo, Doo-Kwon Baik

http://doi.org/

Group recommendation analyzes the characteristics and tendency of a group rather than an individual and provides relevant information for the members of the group. Existing group recommendation methods merely consider the average and frequency of a preference. However, if the users’ preferences have large deviations, it is difficult to provide satisfactory results for all users in the group, although the average and frequency values are high. To solve these problems, we propose a method that considers not only the average of a preference but also the deviation. The proposed method provides recommendations with high average values and low deviations for the preference, so it reflects the tendency of all group members better than existing group recommendation methods. Through a comparative experiment, we prove that the proposed method has better performance than existing methods, and verify that it has high performance in groups with a large number of members as well as in small groups.
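
One simple way to combine the average and the deviation of a preference is to subtract a weighted standard deviation from the mean rating. The abstract does not give the paper's actual formula, so the scoring rule, the `penalty` weight, and the item names below are assumptions chosen only to show the effect.

```python
from statistics import mean, pstdev


def group_score(ratings, penalty=1.0):
    """Mean rating of the group minus a penalty on its standard deviation.

    Hypothetical scoring rule; `penalty` controls how strongly disagreement
    among members lowers an item's score.
    """
    return mean(ratings) - penalty * pstdev(ratings)


def recommend(item_ratings, penalty=1.0):
    """Items sorted from the highest to the lowest group score."""
    return sorted(
        item_ratings,
        key=lambda item: group_score(item_ratings[item], penalty),
        reverse=True,
    )


# Illustrative ratings from a four-member group.
ratings = {
    "item_a": [5, 5, 1, 5],  # high average, but one member strongly dislikes it
    "item_b": [4, 4, 4, 3],  # slightly lower average, broad agreement
}
ranking = recommend(ratings)
```

Here `item_a` has the higher average, yet `item_b` ranks first once the deviation is penalized; setting `penalty=0` falls back to ranking by average alone.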

Risk Analysis on Various Contextual Situations and Progressive Authentication Method based on Contextual-Situation-based Risk Degree on Android Devices

Jihwan Kim, SeungHyun Kim, Soo-Hyung Kim, Younho Lee

http://doi.org/

To prevent a smartphone from being used by someone other than its owner, authentication verifies the owner in several ways. However, every time the owner uses the smartphone, authentication requires an unnecessary action, and some owners eventually decide not to use any authentication method. This can cause a fatal problem for the smartphone’s security. We propose a sustainable Android platform-based authentication model to solve this security issue and to facilitate secure authentication. In the proposed model, a smartphone identifies the current situation and then performs the authentication. To define the risk of each situation, we conducted a survey and analyzed the results by age, location, behavior, etc. Finally, a demonstration program was implemented to show the relationship between risk and security authentication methods.

A Secure and Practical Encrypted Data De-duplication with Proof of Ownership in Cloud Storage

Cheolhee Park, Dowon Hong, Changho Seo

http://doi.org/

In a cloud storage environment, deduplication enables efficient use of storage. In addition, to save network bandwidth, cloud storage service providers have introduced client-side deduplication. Cloud storage users want to upload encrypted data to ensure confidentiality. However, common encryption methods cannot be combined with deduplication because each user uses a different private key. Client-side deduplication can also be vulnerable to security threats because a file tag replaces the entire file. Recently, proof-of-ownership schemes have been suggested to remedy the vulnerabilities of client-side deduplication. Nevertheless, client-side deduplication over encrypted data still causes problems in efficiency and security. In this paper, we propose a secure and practical client-side encrypted data deduplication scheme that is resilient to brute-force attacks and performs proof of ownership over encrypted data.

An Automated Technique for Illegal Site Detection using the Sequence of HTML Tags

Kiryong Lee, Heejo Lee

http://doi.org/

Since the introduction of the BitTorrent protocol in 2001, everything can be downloaded through file sharing, including music, movies and software. As a result, copyright holders suffer from illegal sharing of copyrighted content. To solve this problem, countries have enacted laws against illegal sharing, and internet service providers block pirate sites. However, illegal sites such as The Pirate Bay easily reopen by changing their domain names. We therefore propose a technique to easily detect reopened pirate sites. This automated technique collects domain names using the Google search engine and measures similarity with the Longest Common Subsequence (LCS) algorithm by comparing the tag structure of the source web page with that of the reopened web page. For evaluation, we collected 2,383 domains from Google search. Experimental results showed that applying the LCS algorithm detected a total of 44 pirate sites among the collected domains. In addition, the technique detected 23 pirate sites among 805 domains when applied to foreign pirate sites. The experiment showed that reopened pirate sites can be detected easily using an automated detection system.
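
The tag-structure comparison can be sketched as follows: extract the sequence of opening tags from each page, compute the length of their longest common subsequence, and normalize it. The abstract does not specify how tags are extracted, how the LCS length is normalized, or what threshold marks a match, so the helper names, the use of Python's `html.parser`, and the division by the longer sequence length are assumptions.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects the names of opening tags in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html):
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


def lcs_length(a, b):
    """Length of the longest common subsequence, using O(len(b)) memory."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def tag_similarity(html_a, html_b):
    """LCS length of the two tag sequences divided by the longer length."""
    a, b = tag_sequence(html_a), tag_sequence(html_b)
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))
```

A page compared with itself scores 1.0, and two pages whose four-tag sequences share three tags in order score 0.75; a candidate domain would be flagged when the score exceeds a chosen threshold.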


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr