Search : [ keyword: autoencoder (오토인코더) ] (14)

Phishing Webpage Detection using URL and HTML Graphs based on a Multimodal AutoEncoder Ensemble

Jun-Ho Yoon, Seok-Hun Choi, Hae-Jung Kim, Seok-Jun Buu

http://doi.org/10.5626/JOK.2025.52.6.461

As the internet continues to evolve, phishing attacks are increasingly targeting users, highlighting the need for effective detection methods. Traditional approaches focus on analyzing URL character sequences; however, phishing URLs often mimic legitimate patterns and have a short lifespan, limiting detection accuracy. To address this, we propose a multimodal ensemble-based phishing detection method that leverages both URL strings and HTML graph data. Character-level URL sequences are processed using a Convolutional AutoEncoder (CAE), while HTML DOM structures are converted into graph formats and analyzed with a Graph Convolutional AutoEncoder (GCAE). The extracted latent vectors are integrated via a Transformer layer to classify phishing webpages. The proposed model improves detection performance by up to 18.91 percentage points in F1 Score compared to existing methods, and case analysis reveals the interrelationship between URL and HTML features.
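The two input encodings described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: `DomGraphBuilder`, `html_to_graph`, `url_to_char_ids`, the code-point encoding, and the padding length are all assumptions made for illustration, and the CAE, GCAE, and Transformer stages are omitted.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class DomGraphBuilder(HTMLParser):
    """Collects DOM elements as nodes and parent-child links as edges."""

    def __init__(self):
        super().__init__()
        self.nodes = []   # node id -> tag name
        self.edges = []   # (parent id, child id)
        self._stack = []  # ids of the currently open elements

    def handle_starttag(self, tag, attrs):
        node_id = len(self.nodes)
        self.nodes.append(tag)
        if self._stack:
            self.edges.append((self._stack[-1], node_id))
        if tag not in VOID_TAGS:
            self._stack.append(node_id)

    def handle_endtag(self, tag):
        # Pop back to the most recent open element with this tag name.
        for i in range(len(self._stack) - 1, -1, -1):
            if self.nodes[self._stack[i]] == tag:
                del self._stack[i:]
                break


def html_to_graph(html):
    """Return (nodes, edges) for the DOM tree of an HTML string."""
    builder = DomGraphBuilder()
    builder.feed(html)
    builder.close()
    return builder.nodes, builder.edges


def url_to_char_ids(url, max_len=16):
    """Map each character to its code point, truncated or zero-padded to max_len."""
    ids = [ord(c) for c in url[:max_len]]
    return ids + [0] * (max_len - len(ids))


# Hypothetical page and URL, used only to exercise the helpers.
nodes, edges = html_to_graph("<html><body><form><input></form><a></a></body></html>")
char_ids = url_to_char_ids("a.b", max_len=5)
```

In a full pipeline, the edge list would feed a graph autoencoder and the character ids a convolutional one; here they are only the raw inputs.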

A VQG Framework for Accurate and Diverse Question Generation

Hee-Yeon Choi, Dong-Wan Choi

http://doi.org/10.5626/JOK.2025.52.1.62

Visual Question Generation (VQG) aims to generate questions based on a given image, optionally using additional information such as answers or answer types. A VQG system should be able to generate diverse questions for a single image while keeping them relevant to the image and its additional information. However, models that focus heavily on relevance to the image may overfit to the dataset, which limits diversity, while those that emphasize diversity may generate questions less related to the input. Balancing these two aspects is therefore crucial in VQG. To address this challenge, we propose BCVQG (BLIP-CVAE VQG), a system that integrates a pre-trained vision-language model with a Conditional Variational AutoEncoder (CVAE). The effectiveness of the proposed method is validated through quantitative and qualitative evaluations on the VQA2.0 dataset.

A Deep Learning Model for Fire Anomaly Detection in Underground Utility Tunnel based on ConvLSTM Variational AutoEncoder

Joseph Ahn, Hyo-gun Yoon

http://doi.org/10.5626/JOK.2024.51.4.333

Because a failure to detect fire not only escalates disaster management costs but also inflicts significant damage and disruption on citizens' lives and industries, accurate detection of fire anomalies is of paramount importance. There have been several studies on monitoring and managing catastrophic events using AI, IoT, and digital twin technologies. However, the telecommunications environment and the level of sensor maintenance make it difficult for IoT sensors to collect data without loss or noise. This paper proposes a hybrid deep learning model called ConvLSTM-VAE that detects anomalies by considering spatial and temporal information simultaneously and demonstrates robust results even in the presence of noise or data loss. A virtual environment modeled after the underground utility tunnel located in Ochang, Chungcheongbuk-do is constructed to collect fire data using Fire Dynamics Simulator (FDS) software. In the experiment, we compared the proposed model with other time-series anomaly detection models and evaluated its predictive performance. The results show that the precision, recall, accuracy, and F1-score of ConvLSTM-VAE are 0.881579, 0.99505, 0.930693, and 0.934884, respectively, which is far superior to the predictive performance of the other models.
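As a quick consistency check on the reported metrics, the F1-score can be recomputed as the harmonic mean of the reported precision and recall. The function name `f1_score` is chosen here for illustration; only the two input values come from the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


reported_precision = 0.881579
reported_recall = 0.99505

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 4))  # 0.9349
```

The recomputed value agrees with the reported F1-score of 0.934884 to within rounding.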

Hierarchical Latent Representation-based Framework for Automatic Detection of Cybercrime Slang

Yong-Yeon Kim, Byung-Won On

http://doi.org/10.5626/JOK.2023.50.12.1121

Cybercriminals constantly produce and use slang by adding criminal meanings to existing words or by replacing them with similar words for communication. Responding to this requires continuous monitoring and manual work, and deep learning approaches require a large amount of labeled training data. However, collecting such data is difficult because manual labeling takes a lot of time and money and because cybercrime, by its nature, is conducted in secret. To address these limitations, we develop an autoencoder-based framework and propose a method that effectively detects contextual cybercrime slang and neologisms through hierarchical latent vector similarity comparisons. Experiments using a cybercrime post dataset showed that the framework had an accuracy of up to 99.1% at a similarity threshold of 0.5.
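A thresholded similarity comparison between latent vectors might look like the sketch below. The abstract does not name the similarity measure or say which side of the threshold marks a match, so cosine similarity, the `>=` direction, the function names, and the example vectors are all assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def matches_known_slang(candidate_vec, slang_vec, threshold=0.5):
    """Assumed rule: a candidate matches when similarity reaches the threshold."""
    return cosine_similarity(candidate_vec, slang_vec) >= threshold


# Hypothetical latent vectors, not taken from the paper.
known_slang = [1.0, 0.0]
print(matches_known_slang([1.0, 1.0], known_slang))  # True  (similarity about 0.707)
print(matches_known_slang([0.0, 1.0], known_slang))  # False (similarity 0.0)
```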

Detecting Design Infringement Using Multi-Modal Visual Data and Auto Encoder based on Convolutional Neural Network

Jeonggeol Kim, Jiyou Seo, Chanjae Lee, Seongmin Jo, Seungmin Kim, Seokmin Yoon, Young Yoon

http://doi.org/10.5626/JOK.2022.49.2.137

Recently, it has become very difficult to distinguish counterfeit products from authentic goods, and the volume of these forgeries is increasing at an alarming rate. Prompt detection of these counterfeit products is challenging since only humans can identify these forgeries through trained expertise. In this paper, given a photograph and a design drawing, we use convolutional neural networks and auto-encoders to detect possible infringement of design rights without disassembling or damaging the suspected items. We have developed an easy-to-expand system that supports the constant addition of new goods to be examined. We present the results of our system tested with a set of authentic and forged goods.

Deep Neural Networks and End-to-End Learning for Audio Compression

Daniela N. Rim, Inseon Jang, Heeyoul Choi

http://doi.org/10.5626/JOK.2021.48.8.940

Recent advances in end-to-end deep learning have encouraged the exploration of tasks dealing with highly structured data using unified deep network models. Designing such models for compressing audio signals has been a challenge because they need discrete representations, which are not easy to train with end-to-end backpropagation. In this paper, we present an end-to-end deep learning approach that combines recurrent neural networks (RNNs) within the training strategy of variational autoencoders (VAEs) with a binary representation of the latent space. We apply a reparametrization trick for the Bernoulli distribution for the discrete representations, which allows smooth backpropagation. In addition, our approach enables the separation of the encoder and decoder, which is necessary for compression tasks. To the best of our knowledge, this is the first end-to-end learning approach for a single audio compression model with RNNs, and our model achieves a Signal to Distortion Ratio (SDR) of 20.53 dB.
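For readers unfamiliar with the metric, one common definition of SDR is ten times the base-10 logarithm of the ratio between the energy of the reference signal and the energy of the reconstruction error. The sketch below uses that definition; the paper may use a different variant, and the function name and sample values are hypothetical.

```python
import math


def sdr_db(reference, estimate):
    """SDR in dB: 10 * log10(signal energy / error energy).

    Assumes equal-length inputs and a non-silent reference signal.
    """
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if error_energy == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / error_energy)


reference = [1.0, -1.0, 1.0, -1.0]  # hypothetical audio samples
estimate = [0.9, -0.9, 0.9, -0.9]   # hypothetical decoder output
print(round(sdr_db(reference, estimate), 2))  # 20.0
```

Under this definition, an SDR of about 20 dB means the error energy is roughly one hundredth of the signal energy.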

Effect of Denoising Autoencoder in the view of Item Popularity Bias

Jinhong Kim, Jae-woong Lee, Jongwuk Lee

http://doi.org/10.5626/JOK.2021.48.5.575

The denoising autoencoder (DAE) is commonly used in recent recommendation systems. It is a type of autoencoder that is trained with noise added to the input, and it has shown improved performance compared to a plain autoencoder. In this paper, we analyze the effect of noise in terms of item popularity to interpret the training of the DAE. For the analysis, we design the experiment in two ways. First, we observe how the L2-norm of the learned item vectors changes when noise is given to the autoencoder. Second, by giving noise only to items presampled by popularity, we analyze whether the improved performance of the DAE is related to item popularity. The results of the experiment showed that the variance of the item vector norm caused by popularity was reduced by noise, and that the accuracy increased when noise was given to the popular items.
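The two quantities discussed above, input noise and the L2-norm of an item vector, can be sketched as follows. The abstract does not specify the noise type, so masking (dropout-style) noise is an assumption, as are the function names and all example values.

```python
import math
import random


def add_masking_noise(vector, drop_prob, rng):
    """Zero out each entry independently with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else x for x in vector]


def l2_norm(vector):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vector))


rng = random.Random(0)  # fixed seed so the example is repeatable
interactions = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0]  # hypothetical user-item row
noisy_input = add_masking_noise(interactions, drop_prob=0.5, rng=rng)

item_vector = [3.0, 4.0]  # hypothetical learned item embedding
print(noisy_input, l2_norm(item_vector))
```

In a DAE, the model would be trained to reconstruct `interactions` from `noisy_input`; the training loop itself is omitted here.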

An Effective Detection Method of Anomalous Sequences Considering the Occurrence Order and Time Interval of the Elements

Jooyeon Lee, Ki Yong Lee

http://doi.org/10.5626/JOK.2021.48.4.469

Recently, sequence data consisting of elements that occur over time has been generated rapidly in various applications. Although various methods for detecting anomalous sequences among given sequences have been actively studied, most of them consider only the occurrence order of the elements. In this paper, we propose an effective anomalous sequence detection method that considers not only the occurrence order of the elements but also the time interval between the elements. The proposed method uses a model that combines two autoencoders. The first is an LSTM autoencoder, which learns the features of the occurrence order of elements, and the second is a graph autoencoder, which learns the features of the time interval between the elements. After training is complete, each sequence is input to the trained model and reconstructed. If the occurrence order and time interval of elements in the reconstructed sequence differ greatly from those in the original sequence, the sequence is determined to be anomalous. Through various experiments using synthetic data, we confirmed that the proposed method detects anomalous sequences more effectively than the method that uses an RNN autoencoder to learn the occurrence order of the elements, the methods that use a single LSTM autoencoder, and the method that does not use a deep learning model.
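The final decision step, comparing a sequence with its reconstruction, could be sketched as below. The two autoencoders are not modeled. The error measures, both thresholds, and all names are assumptions for illustration, and the sketch assumes non-empty, equal-length inputs.

```python
def order_mismatch(elements, recon_elements):
    """Fraction of positions whose element differs after reconstruction."""
    diffs = sum(1 for a, b in zip(elements, recon_elements) if a != b)
    return diffs / len(elements)


def interval_error(intervals, recon_intervals):
    """Mean absolute difference between original and reconstructed time gaps."""
    total = sum(abs(a - b) for a, b in zip(intervals, recon_intervals))
    return total / len(intervals)


def is_anomalous(elements, intervals, recon_elements, recon_intervals,
                 order_threshold=0.3, interval_threshold=1.0):
    """Flag a sequence when either reconstruction error exceeds its threshold."""
    return (order_mismatch(elements, recon_elements) > order_threshold
            or interval_error(intervals, recon_intervals) > interval_threshold)


# Hypothetical sequence: four elements and the three gaps between them.
elements = ["A", "B", "C", "D"]
intervals = [1.0, 2.0, 1.0]

# A close reconstruction is not flagged.
print(is_anomalous(elements, intervals, ["A", "B", "C", "D"], [1.1, 1.9, 1.0]))  # False
# A reconstruction with swapped elements is flagged.
print(is_anomalous(elements, intervals, ["A", "C", "B", "D"], [1.0, 2.0, 1.0]))  # True
```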

Autoencoder-based Learning Contribution Measurement Method for Training Data Selection

Yuna Jeong, Myunggwon Hwang, Wonkyung Sung

http://doi.org/10.5626/JOK.2021.48.2.195

Despite recent significant performance improvements, the iterative process of machine-learning algorithms makes development and utilization difficult and time-consuming. In this paper, we present a data-selection method that reduces the time required by providing an approximate solution. First, data are mapped to feature vectors in latent space based on an Autoencoder, with high weight given to data with high learning contribution that are relatively difficult to learn. Finally, data are ranked and selected based on weight and used for training. Experimental results showed that the proposed method selected data that achieved higher performance than random sampling.
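The rank-and-select step can be illustrated with a short sketch. How the paper derives each weight from the latent space is not reproduced here; the weights below are hypothetical inputs, and `select_by_weight` is a name chosen for illustration.

```python
def select_by_weight(weights, k):
    """Return the indices of the k samples with the highest weight, best first."""
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return ranked[:k]


# Hypothetical learning-contribution weights for four training samples.
weights = [0.2, 0.9, 0.5, 0.7]
print(select_by_weight(weights, k=2))  # [1, 3]
```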

An Embedding Technique for Weighted Graphs using LSTM Autoencoders

Minji Seo, Ki Yong Lee

http://doi.org/10.5626/JOK.2021.48.1.13

Graph embedding is the representation of graphs as vectors in a low-dimensional space. Recently, research on graph embedding using deep learning technology has been conducted. However, most research to date has focused mainly on the topology of nodes, and there are few studies on graph embedding for weighted graphs, which have arbitrary weights on the edges between the nodes. Therefore, in this paper, we propose a new graph embedding technique for weighted graphs. Given weighted graphs to be embedded, the proposed technique first extracts node-weight sequences that exist inside the graphs, and then encodes each node-weight sequence into a fixed-length vector using an LSTM (Long Short-Term Memory) autoencoder. Finally, for each graph, the proposed technique combines the encoding vectors of node-weight sequences extracted from the graph to generate one final embedding vector. The embedding vectors of the weighted graphs obtained by the proposed technique can be used for measuring the similarity between weighted graphs or classifying weighted graphs. Experiments on synthetic and real datasets consisting of groups of similar weighted graphs showed that the proposed technique provided more than 94% accuracy in finding similar weighted graphs.
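The extract, encode, and combine steps can be outlined with a minimal sketch. Everything specific in it is an assumption: sequences are taken as simple paths of a fixed number of edges, `placeholder_encode` is a trivial stand-in for the LSTM autoencoder, and the per-sequence vectors are combined by averaging.

```python
def node_weight_sequences(adjacency, walk_length=2):
    """Enumerate simple paths with walk_length edges as [node, weight, node, ...]."""
    sequences = []

    def extend(path_nodes, sequence):
        if len(path_nodes) == walk_length + 1:
            sequences.append(sequence)
            return
        for neighbor, weight in adjacency.get(path_nodes[-1], []):
            if neighbor not in path_nodes:
                extend(path_nodes + [neighbor], sequence + [weight, neighbor])

    for start in adjacency:
        extend([start], [start])
    return sequences


def placeholder_encode(sequence):
    """Stand-in for the LSTM autoencoder: a fixed-length summary of the weights."""
    weights = sequence[1::2]
    return [sum(weights) / len(weights), max(weights), min(weights)]


def embed_graph(adjacency, walk_length=2):
    """Average the per-sequence vectors into one graph embedding."""
    vectors = [placeholder_encode(s)
               for s in node_weight_sequences(adjacency, walk_length)]
    if not vectors:
        return []
    return [sum(v[d] for v in vectors) / len(vectors)
            for d in range(len(vectors[0]))]


# Hypothetical directed weighted graph: a -(1.0)-> b -(3.0)-> c
graph = {"a": [("b", 1.0)], "b": [("c", 3.0)], "c": []}
print(node_weight_sequences(graph))  # [['a', 1.0, 'b', 3.0, 'c']]
print(embed_graph(graph))            # [2.0, 3.0, 1.0]
```

Embeddings produced this way could then be compared with any vector similarity measure to find similar weighted graphs.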


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr