Digital Library: Search Results
A Survey of Advantages of Self-Supervised Learning Models in Visual Recognition Tasks
Euihyun Yoon, Hyunjong Lee, Donggeon Kim, Joochan Park, Jinkyu Kim, Jaekoo Lee
http://doi.org/10.5626/JOK.2024.51.7.609
Recently, the field of supervised learning-based artificial intelligence (AI) has been advancing rapidly. However, supervised learning relies on datasets annotated with correct labels, and obtaining these labels can be costly. To address this issue, self-supervised learning, which can learn general image features without labels, is being actively researched. In this paper, various self-supervised learning models are classified by their learning methods and backbone networks, and their strengths, weaknesses, and performance are compared and analyzed. Image classification tasks were used for the performance comparison, and fine-grained prediction tasks were additionally compared and analyzed to evaluate transfer learning performance. As a result, models that use only positive pairs achieved higher performance than models that use both positive and negative pairs, because they minimize noise. Furthermore, for fine-grained predictions, methods that mask images during learning or use multi-stage models achieved higher performance by additionally learning local (regional) information.
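As a minimal sketch (assuming PyTorch; not code from the surveyed papers), the two families of objectives compared above can be illustrated as an InfoNCE loss, which uses positive and negative pairs, versus a SimSiam/BYOL-style loss, which uses positive pairs only:

```python
# Sketch contrasting the two self-supervised objectives discussed above.
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.1):
    """Positive + negative pairs: the other samples in the batch act as negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature       # (B, B) similarity matrix
    labels = torch.arange(z1.size(0))        # diagonal entries are the positives
    return F.cross_entropy(logits, labels)

def positive_only_loss(p1, z2):
    """Positive pairs only: maximize cosine similarity to a stop-gradient target."""
    return -F.cosine_similarity(p1, z2.detach(), dim=1).mean()

# Toy usage with random embeddings standing in for two augmented views.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(info_nce_loss(z1, z2).item(), positive_only_loss(z1, z2).item())
```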
Pseudo-label Correction using Large Vision-Language Models for Enhanced Domain-adaptive Semantic Segmentation
http://doi.org/10.5626/JOK.2024.51.5.464
Producing semantic segmentation labels for real-world images is very expensive. To address this, unsupervised domain adaptation trains the model using data generated in a virtual environment, where labels can easily be collected or have already been collected, together with unlabeled real-world images. A common problem in unsupervised domain adaptation is that thing classes with similar appearance are easily confused. In this paper, we propose a method that corrects the pseudo-labels of the target data using large vision-language models. Making the pseudo-labels generated for the target images more accurate reduces confusion among thing classes. The proposed method improves the performance of DAFormer by +1.1 mIoU in game-to-real adaptation and by +1.1 mIoU in day-to-night adaptation. For thing classes, the proposed method improves the performance of MIC by +0.6 mIoU in game-to-real adaptation and by +0.7 mIoU in day-to-night adaptation.
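A hypothetical sketch of the pseudo-label correction idea (not the authors' implementation): crops around confused thing-class regions are re-scored with a vision-language model, and the pseudo-label is replaced only when the model strongly prefers a different class. The `vlm_scores` function is a placeholder for a CLIP-style image-text similarity query.

```python
# Hypothetical pseudo-label correction sketch; vlm_scores is a stand-in for a VLM query.
import numpy as np

THING_CLASSES = ["car", "truck", "bus", "train", "motorcycle", "bicycle"]

def vlm_scores(crop, class_names):
    # Placeholder for a vision-language model's image-text similarity scores.
    rng = np.random.default_rng(0)
    s = rng.random(len(class_names))
    return s / s.sum()

def correct_pseudo_labels(image, pseudo_label, boxes, threshold=0.6):
    """boxes: list of (x0, y0, x1, y1, current_class_index) for thing regions."""
    corrected = pseudo_label.copy()
    for x0, y0, x1, y1, cls in boxes:
        crop = image[y0:y1, x0:x1]
        probs = vlm_scores(crop, THING_CLASSES)
        best = int(np.argmax(probs))
        if best != cls and probs[best] > threshold:
            region = corrected[y0:y1, x0:x1]
            region[region == cls] = best   # relabel only the pixels of the confused class
    return corrected
```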
TwinAMFNet : Twin Attention-based Multi-modal Fusion Network for 3D Semantic Segmentation
Jaegeun Yoon, Jiyeon Jeon, Kwangho Song
http://doi.org/10.5626/JOK.2023.50.9.784
Recently, with the increase in accidents caused by misrecognition in autonomous driving, interest in 3D semantic segmentation based on fusion of multi-modal sensors has grown. Accordingly, this study introduces TwinAMFNet, a novel 3D semantic segmentation network based on sensor fusion of RGB cameras and LiDAR. The proposed network consists of a twin neural network that processes RGB images and point-cloud images projected onto a 2D coordinate plane, together with an attention-based fusion module that performs feature-level fusion in the encoder and decoder. The proposed method further improves the classification of objects and their boundaries. As a result, the proposed network achieves approximately 68% mIoU in 3D semantic segmentation, about 4.5% higher than the results reported in existing studies.
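As an illustrative sketch (assuming PyTorch; not TwinAMFNet's actual module), attention-based fusion of the two branches can be expressed as a learned per-channel gate that decides how much of the RGB features and the LiDAR-projection features to keep at each stage:

```python
# Minimal attention-based fusion sketch for two same-shaped feature maps.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),               # global context per channel
            nn.Conv2d(2 * channels, channels, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),                          # attention weight in [0, 1]
        )

    def forward(self, rgb_feat, lidar_feat):
        a = self.gate(torch.cat([rgb_feat, lidar_feat], dim=1))
        return a * rgb_feat + (1 - a) * lidar_feat

# Toy usage: fuse two 64-channel feature maps from the twin branches.
fuse = AttentionFusion(64)
out = fuse(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
print(out.shape)  # torch.Size([1, 64, 32, 32])
```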
GAN considering ERF for High-resolution Map Generation
http://doi.org/10.5626/JOK.2019.46.2.122
This paper proposes a generative adversarial network (GAN) structure suited to high-resolution image translation. To analyze the relation between resolution and classification required for high-resolution image translation, the effective receptive field size of each encoder is calculated and a new connection imbalance is defined. By connecting the encoder and decoder according to the patch size, the total number of layers can be reduced, and the appropriate effective receptive field and parameter efficiency are confirmed through experiments. To address the problem of handling resolution and classification simultaneously in high-resolution image translation, a network structure capable of translating high-resolution satellite images is derived experimentally. Additionally, the validity of this structure, which improves resolution and classification at the same time, is confirmed by comparing and analyzing the receptive fields of the proposed network and existing networks. Finally, the proposed network is quantitatively verified against the existing network using SSIM, an objective image-similarity metric.
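A small sketch of the receptive-field bookkeeping the abstract refers to: the theoretical receptive field of a stack of convolution layers can be computed from each layer's kernel size and stride. (The paper's effective receptive field is an empirical quantity; this only gives the analytical upper bound.)

```python
# Theoretical receptive field of a convolutional encoder, layer by layer.
def receptive_field(layers):
    """layers: list of (kernel_size, stride) tuples, in input-to-output order."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump   # each layer widens the field by (k-1) * current spacing
        jump *= s              # stride compounds the spacing between output pixels
    return rf

# Example: four 4x4 stride-2 convolutions, a typical patch-style encoder.
print(receptive_field([(4, 2)] * 4))  # 46
```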
A Deep Neural Network Architecture for Real-Time Semantic Segmentation on Embedded Board
http://doi.org/10.5626/JOK.2018.45.1.94
We propose Wide Inception ResNet (WIR Net), an optimized neural network architecture for real-time semantic segmentation in autonomous driving. The architecture consists of an encoder that extracts features using residual connections and inception modules, and a decoder that increases resolution using transposed convolutions and low-level feature maps. We further improved accuracy by applying the ELU activation function and optimized the network by reducing the number of layers and increasing the number of filters. Performance was evaluated on an NVIDIA GeForce GTX 1080 and a TX1 board using class and category IoU on the Cityscapes driving dataset. The experimental results show a class IoU of 53.4 and a category IoU of 81.8, with execution speeds of 17.8 fps and 13.0 fps for 640x360 and 720x480 resolution images, respectively, on the TX1 board.
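An illustrative sketch (assuming PyTorch; not the authors' exact WIR Net) of the ingredients named above: an inception-style block with a residual connection and ELU activation for the encoder, and a transposed convolution for upsampling in the decoder.

```python
# Encoder block with inception-style branches, residual connection, and ELU,
# plus a transposed-convolution upsampling step as used in the decoder.
import torch
import torch.nn as nn

class InceptionResBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.branch3 = nn.Conv2d(channels, channels // 2, 3, padding=1)
        self.branch5 = nn.Conv2d(channels, channels // 2, 5, padding=2)
        self.act = nn.ELU(inplace=True)

    def forward(self, x):
        y = torch.cat([self.branch3(x), self.branch5(x)], dim=1)  # inception-style branches
        return self.act(x + y)                                    # residual connection

decoder_up = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)  # doubles resolution

x = torch.randn(1, 64, 45, 80)  # toy feature map
print(decoder_up(InceptionResBlock(64)(x)).shape)  # torch.Size([1, 32, 90, 160])
```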
Investigating the Feature Collection for Semantic Segmentation via Single Skip Connection
http://doi.org/10.5626/JOK.2017.44.12.1282
Since deep convolutional neural networks became prevalent, one important discovery has been that a feature map extracted from a convolutional network before the fully connected layer can be used as a saliency map for object detection. Furthermore, a model can use features from different layers, which have different properties, for more accurate detection. As a model goes deeper, it offers many latent skip connections and feature maps for elaborate object detection. Although many intermediate layers can be used for semantic segmentation through skip connections, the characteristics of each skip connection and the best skip connection for this task remain unclear. Therefore, in this study, we exhaustively examine the skip connections of state-of-the-art deep convolutional networks and investigate the characteristics of the features from each intermediate layer. In addition, this study suggests how to use recent deep neural network models for semantic segmentation and can therefore serve as a cornerstone for later studies with state-of-the-art network models.
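A minimal sketch (assuming PyTorch and torchvision; not the paper's experimental code) of the kind of experiment described above: a feature map is pulled from one intermediate layer through a single skip connection, and the deepest features are upsampled and concatenated with it for segmentation-style prediction.

```python
# Collecting an intermediate feature map via a forward hook and fusing it
# with the deepest features through a single skip connection.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18()
features = {}

def save(name):
    def hook(module, inputs, output):
        features[name] = output
    return hook

model.layer2.register_forward_hook(save("skip"))   # intermediate feature map
model.layer4.register_forward_hook(save("deep"))   # deepest feature map

x = torch.randn(1, 3, 224, 224)
model(x)

deep = F.interpolate(features["deep"], size=features["skip"].shape[-2:],
                     mode="bilinear", align_corners=False)
fused = torch.cat([features["skip"], deep], dim=1)  # single skip-connection fusion
print(fused.shape)  # torch.Size([1, 640, 28, 28])
```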