Search : [ keyword: similarity ] (41)

Detecting Software Similarity Using API Sequences on Static Major Paths

Seongsoo Park, Hwansoo Han

http://doi.org/

Software birthmarks are used to detect software plagiarism. For binaries, however, only a few birthmarks have been developed. In this paper, we propose a static approach to generate API sequences along major paths, which are identified from the control flow graphs of the binaries. Since our API sequences are extracted along the most plausible paths of the binary code, they can represent the actual API sequences produced by binary executions, but in a more concise form. Our similarity measures use the Smith-Waterman algorithm, one of the popular sequence alignment algorithms for DNA sequence analysis. We evaluate our static path-based API sequences with multiple versions of five applications. Our experiments indicate that the proposed method provides a fairly reliable similarity birthmark for binaries.
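
As a rough illustration of the alignment step, the sketch below scores two API-call sequences with Smith-Waterman local alignment. It is not the authors' implementation: the scoring values (`match`, `mismatch`, `gap`), the normalization in `similarity`, and the example API names are assumptions chosen only for the example.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    The scoring parameters are illustrative assumptions, not values
    taken from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            # Local alignment: a cell never drops below zero.
            score[i][j] = max(0, diag, up, left)
            best = max(best, score[i][j])
    return best


def similarity(a, b, match=2):
    """Normalize the score by the best score the shorter sequence could reach
    (an assumed normalization, used here only to get a value in [0, 1])."""
    if not a or not b:
        return 0.0
    return smith_waterman(a, b, match=match) / (match * min(len(a), len(b)))


# Hypothetical API sequences extracted from two binaries.
seq_a = ["CreateFile", "ReadFile", "WriteFile", "CloseHandle"]
seq_b = ["CreateFile", "ReadFile", "CloseHandle"]
print(smith_waterman(seq_a, seq_b), similarity(seq_a, seq_b))
```

Because the alignment is local, a shared run of API calls contributes to the score even when the surrounding calls differ.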

A Retrieval Augmented Generation (RAG) System Using Query Rewriting Based on Large Language Model (LLM)

Minsu Han, Seokyoung Hong, Myoung-Wan Koo

http://doi.org/10.5626/JOK.2025.52.6.474

This paper proposes a retrieval pipeline that can be effectively utilized in fields requiring expert knowledge without requiring fine-tuning. To achieve high accuracy, we introduce a query rewriting retrieval method that leverages large language models to generate examples similar to the given question, achieving higher similarity than existing retrieval models. The proposed method demonstrates excellent performance in both automated evaluations and expert qualitative assessments, while also providing explainability in retrieval results through generated examples. Additionally, we suggest prompts that can be utilized in various domains requiring specialized knowledge during the application of this method. Furthermore, we propose a pipeline method that incorporates a Top-1 retrieval model, which chooses the most relevant document from the three returned by the query rewriting retrieval model. This aims to prevent the hallucination issue caused by the input of unnecessary documents into the large language model.

A Similarity-Based Multi-Knowledge Transfer Algorithm for Enhancing Learning Efficiency of Reinforcement Learning-Based Autonomous Agent

Yeryeong Cho, Soohyun Park, Joongheon Kim

http://doi.org/10.5626/JOK.2025.52.4.310

This paper proposes a similarity-based multi-knowledge transfer algorithm (SMTRL) to enhance the learning efficiency of autonomous agents in reinforcement learning. SMTRL calculates the similarity between pre-trained models and the current model and dynamically adjusts the knowledge transfer ratio based on this similarity to maximize learning efficiency. In complex environments, autonomous agents face significant challenges when learning independently, as this process can be time-consuming and inefficient, making knowledge transfer essential. However, differences between pre-trained models and actual environments can result in negative transfer, leading to diminished learning performance. To tackle this issue, SMTRL dynamically adjusts the ratio of knowledge transfer from highly similar pre-trained models, thereby accelerating learning stability. Furthermore, experimental results demonstrate that the proposed algorithm outperforms traditional reinforcement learning and traditional knowledge transfer learning in terms of convergence speed. Therefore, this paper introduces a novel approach to efficient knowledge transfer for autonomous agents and discusses its applicability to complex mobility environments and directions for future research.
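
The idea of scaling knowledge transfer by model similarity can be sketched as follows. The abstract does not say how similarity or the ratio is computed, so everything here is an assumption: cosine similarity over flattened parameter vectors, clipping of negative similarities to avoid negative transfer, and a hypothetical `max_total` cap on the overall transfer ratio.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def transfer_ratios(current, pretrained, max_total=0.5):
    """Return one transfer ratio per pre-trained model.

    Dissimilar models (similarity <= 0) get a ratio of 0, and the total
    ratio shrinks when even the best match is weak. Both rules are
    illustrative assumptions.
    """
    sims = [max(0.0, cosine(current, p)) for p in pretrained]
    total = sum(sims)
    if total == 0:
        return [0.0] * len(pretrained)
    budget = max_total * max(sims)
    return [budget * s / total for s in sims]


# Toy parameter vectors: one identical, one orthogonal, one opposite model.
print(transfer_ratios([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]))
```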

An Effective Graph Edit Distance Model Using Node Mapping Information

Jun-Gyu Lee, Jongik Kim

http://doi.org/10.5626/JOK.2025.52.1.88

Graph Edit Distance (GED) is the most representative method for quantifying similarity between graphs. However, calculating an exact GED is an NP-hard problem, which incurs a prohibitively large computational cost. To compute GED efficiently, recent studies have focused on deriving an approximate GED between graphs using deep learning models. However, existing models tend to exhibit large approximation errors and suffer from insufficient interpretability because they do not consider node-to-node relationships between graphs. To remedy these problems, this study proposes a model that learns a mapping matrix through node-level embeddings of two graphs, providing better interpretability of the GED approximation while minimizing information loss during the learning process. Experimental results show that the proposed model consistently outperforms existing models.
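
To show why a node mapping makes a GED estimate interpretable, the sketch below computes the edit cost induced by a fixed node mapping between two small labeled graphs. Any such mapping describes a concrete edit path, so its cost is an upper bound on the exact GED. This is not the proposed model: the unit edit costs, the undirected unlabeled edges, and the function name `induced_edit_cost` are assumptions for illustration.

```python
def induced_edit_cost(labels1, edges1, labels2, edges2, mapping):
    """Edit cost of turning graph 1 into graph 2 under a given node mapping.

    mapping[u] is the node of graph 2 matched to node u of graph 1, or None
    if u is deleted. The mapping is assumed to be injective, edges are
    undirected without duplicates, and every edit costs 1.
    """
    cost = 0
    used = set()
    for u, label in enumerate(labels1):
        v = mapping[u]
        if v is None:
            cost += 1                      # node deletion
        else:
            used.add(v)
            if label != labels2[v]:
                cost += 1                  # node relabeling
    cost += len(labels2) - len(used)       # node insertions

    target_edges = {frozenset(e) for e in edges2}
    preserved = 0
    for u, v in edges1:
        mu, mv = mapping[u], mapping[v]
        if mu is not None and mv is not None and frozenset((mu, mv)) in target_edges:
            preserved += 1
    # Unpreserved edges of graph 1 are deleted; uncovered edges of graph 2 are inserted.
    cost += (len(edges1) - preserved) + (len(target_edges) - preserved)
    return cost


labels_a, edges_a = ["C", "C", "O"], [(0, 1), (1, 2)]
labels_b, edges_b = ["C", "O", "C"], [(0, 2), (2, 1)]
print(induced_edit_cost(labels_a, edges_a, labels_b, edges_b, [0, 2, 1]))
print(induced_edit_cost(labels_a, edges_a, labels_b, edges_b, [0, 1, 2]))
```

A good mapping (the first call) exposes that the two graphs are identical up to node order, while a poor mapping (the second call) overestimates the distance.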

Device Status-Based Adaptive Frame Extraction and Streaming Control System to Block Obscene Videos in Mobile Devices

Jeongho Kang, Minsu Kim, Kwangsue Chung

http://doi.org/10.5626/JOK.2022.49.7.575

As users’ access to video streaming services increases, technology for blocking obscene videos in mobile devices is attracting attention. However, because of their low processing power, mobile devices incur a load in the process of blocking obscene videos. In this paper, we propose a device status-based adaptive frame extraction and streaming control system to block obscene videos in mobile devices. The proposed system extracts frames based on the similarity comparison results between frames and changes in the obscenity of videos. In addition, similarity comparison and frame extraction are controlled according to the device status, and exposure of obscene videos is minimized through mosaic processing. The implementation results confirmed that the proposed system improves the response performance to obscenity changes by about 40% through the adaptive frame extraction technology. In addition, it was confirmed that the load generated in the process of blocking obscene videos is controlled by adaptively extracting frames according to the battery condition of the device.
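
A minimal sketch of similarity-driven frame extraction under a device-status constraint is given below. The abstract gives no formulas, so the frame representation (flat lists of 8-bit grayscale values), the similarity measure, the battery cutoff, and both thresholds are hypothetical; the obscenity-change signal that the system also uses is not modeled.

```python
def frame_similarity(f1, f2):
    """Similarity in [0, 1] from the mean absolute pixel difference
    of two equal-length 8-bit grayscale frames."""
    diff = sum(abs(a - b) for a, b in zip(f1, f2))
    return 1.0 - diff / (255.0 * len(f1))


def select_frames(frames, battery_level):
    """Return indices of frames to pass to the (not shown) analysis engine.

    A frame is extracted only if it differs enough from the last extracted
    frame. With a low battery, the threshold is lowered so that fewer
    frames are extracted. The cutoff and thresholds are assumed values.
    """
    threshold = 0.9 if battery_level >= 0.5 else 0.7
    selected = [0]
    for i in range(1, len(frames)):
        if frame_similarity(frames[selected[-1]], frames[i]) < threshold:
            selected.append(i)
    return selected


# Four tiny two-pixel "frames": a duplicate, a small change, a large change.
frames = [[0, 0], [0, 0], [51, 51], [255, 255]]
print(select_frames(frames, battery_level=0.8))
print(select_frames(frames, battery_level=0.2))
```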

Detecting Design Infringement Using Multi-Modal Visual Data and Auto Encoder based on Convolutional Neural Network

Jeonggeol Kim, Jiyou Seo, Chanjae Lee, Seongmin Jo, Seungmin Kim, Seokmin Yoon, Young Yoon

http://doi.org/10.5626/JOK.2022.49.2.137

Recently, it has become very difficult to distinguish between counterfeit products and authentic goods, and the volume of these forgeries is increasing at an alarming rate. Prompt detection of these counterfeit products is challenging since only humans can identify these forgeries through trained expertise. In this paper, given a photograph and a design drawing, we use convolutional neural networks and auto-encoders to detect the possible infringement of design rights without disassembling or damaging the suspected items. We have developed an easy-to-expand system that supports the constant addition of new goods to be examined. We present the results of our system tested with a set of authentic and forged goods.

Ensemble of Sentence Interaction and Graph Based Models for Document Pair Similarity Estimation

Seonghwan Choi, Donghyun Son, Hochang Lee

http://doi.org/10.5626/JOK.2021.48.11.1184

Deriving the similarity between two documents, such as news articles, is one of the most important factors in clustering documents. Sequence similarity models, one of the existing deep learning-based approaches to document clustering, do not reflect the entire context of documents. To address this issue, this paper uses interaction-based and graph-based approaches to construct document pair similarity models suitable for news clustering. This paper proposes four interaction-based models that measure the similarity between two documents by aggregating similarity information from the interaction of sentences. The experimental results demonstrated that two of these four proposed models outperformed SVM and HAN. Ablation studies were conducted on the graph-based model through experiments on the depth of the model’s neural network and its input features. Through error analysis and an ensemble of interaction-based and graph-based models, this paper showed that these two approaches could be complementary due to the differences in their prediction tendencies.

A Streaming Control System for Real-time Blocking of Obscene Videos in Mobile Devices

Jeongho Kang, Minsu Kim, Kwangsue Chung

http://doi.org/10.5626/JOK.2021.48.8.966

As users’ access to video streaming services increases, technology for real-time blocking of obscene videos in mobile devices is drawing attention. However, a mobile device incurs a load during the blocking process due to its low processing power. In this paper, we propose a streaming control system for real-time blocking of obscene videos in mobile devices. The proposed system extracts frames from the video and analyzes their obscenity through an obscenity analysis engine. In addition, the load is minimized by determining the frame extraction method in consideration of obscenity changes and similarity comparison results between frames, and obscene video is blocked by performing video mosaic processing. The implementation results confirmed that the proposed system could minimize the load generated on a mobile device and user exposure to the obscene part.

An Embedding Technique for Weighted Graphs using LSTM Autoencoders

Minji Seo, Ki Yong Lee

http://doi.org/10.5626/JOK.2021.48.1.13

Graph embedding is the representation of graphs as vectors in a low-dimensional space. Recently, research on graph embedding using deep learning technology has been conducted. However, most research to date has focused mainly on the topology of nodes, and there are few studies on graph embedding for weighted graphs, which have arbitrary weights on the edges between the nodes. Therefore, in this paper, we propose a new graph embedding technique for weighted graphs. Given weighted graphs to be embedded, the proposed technique first extracts node-weight sequences that exist inside the graphs, and then encodes each node-weight sequence into a fixed-length vector using an LSTM (Long Short-Term Memory) autoencoder. Finally, for each graph, the proposed technique combines the encoding vectors of node-weight sequences extracted from the graph to generate one final embedding vector. The embedding vectors of the weighted graphs obtained by the proposed technique can be used for measuring the similarity between weighted graphs or classifying weighted graphs. Experiments on synthetic and real datasets consisting of groups of similar weighted graphs showed that the proposed technique provided more than 94% accuracy in finding similar weighted graphs.
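
The first step, extracting node-weight sequences, might look like the sketch below. The abstract does not specify how sequences are chosen, so enumerating simple paths with a fixed number of edges is an assumption, as are the adjacency-dictionary input format and the name `node_weight_sequences`. The LSTM autoencoder and the final combination step are not shown.

```python
def node_weight_sequences(adj, length):
    """Return every simple path with `length` edges as an alternating
    [node, weight, node, ...] sequence.

    adj maps each node to a dict of {neighbor: edge weight}.
    """
    sequences = []

    def walk(path_nodes, seq):
        if len(path_nodes) == length + 1:
            sequences.append(list(seq))
            return
        last = path_nodes[-1]
        for neighbor, weight in sorted(adj[last].items()):
            if neighbor not in path_nodes:      # keep the path simple
                walk(path_nodes + [neighbor], seq + [weight, neighbor])

    for start in sorted(adj):
        walk([start], [start])
    return sequences


# A small undirected weighted path graph a - b - c.
graph = {
    "a": {"b": 0.5},
    "b": {"a": 0.5, "c": 2.0},
    "c": {"b": 2.0},
}
for seq in node_weight_sequences(graph, length=2):
    print(seq)
```

Each resulting sequence interleaves node identifiers and edge weights, which is the kind of input a sequence encoder can turn into a fixed-length vector.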

LEXAI : Legal Document Similarity Analysis Service using Explainable AI

Juho Bai, Seog Park

http://doi.org/10.5626/JOK.2020.47.11.1061

Recently, in keeping with advances in deep learning, studies on applying deep learning to specialized fields have diversified. Semantic search for legal documents is an essential part of the legal field. However, such search is difficult to provide outside of services built on expert systems because it requires professional knowledge of the relevant field. It is also challenging to establish an automated retrieval environment for semantically similar legal documents because the cost of hiring professional human resources is high. While existing retrieval services provide an environment based on expert systems and statistical systems, the proposed method adopts a deep learning approach with a classification task. We propose a database system structure that supports searching for legal documents with high semantic similarity using an explainable neural network. The features of these proposed methods show the performance of developing and verifying visual similarity assessment methods for semantic relevance among similar documents.


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr