Search : [ keyword: object detection ] (4)

Semi-Supervised Object Detection for Small Imbalanced Drama Dataset

Dojin Kim, Unsang Park

http://doi.org/10.5626/JOK.2024.51.11.978

Scenes in dramas are typically shot zoomed in on people. As a result, people-centered images predominate in drama data, and class imbalance naturally occurs. This paper addresses the issue of class imbalance in drama data for object detection tasks and proposes various sampling methods to tackle this challenge within the framework of semi-supervised learning. Experimental evaluations demonstrated that the proposed semi-supervised learning approach with specialized sampling methods outperformed traditional supervised and semi-supervised methods. This study underscores the significance of selecting appropriate training data and sampling methods to optimize object detection performance in specialized datasets with unique characteristics.
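As a rough illustration of how sampling can counter class imbalance, the sketch below draws images with probability proportional to the inverse frequency of their rarest class. This is a generic inverse-frequency sampler, not the sampling methods proposed in the paper; the function names and the toy dataset are hypothetical.

```python
import random
from collections import Counter

def image_weights(image_labels):
    """Weight each image by the inverse frequency of its rarest class.

    image_labels: list of lists, the class names annotated in each image.
    Hypothetical helper; assumes every image has at least one label.
    """
    # Count in how many images each class appears.
    freq = Counter(c for labels in image_labels for c in set(labels))
    # Rare classes get large weights, so images containing them are drawn more often.
    return [max(1.0 / freq[c] for c in set(labels)) for labels in image_labels]

def sample_batch(image_labels, batch_size, seed=0):
    """Draw image indices with replacement using the weights above."""
    rng = random.Random(seed)
    weights = image_weights(image_labels)
    return rng.choices(range(len(image_labels)), weights=weights, k=batch_size)

# Toy dataset: "person" dominates, "car" and "cup" are rare.
toy = [["person"], ["person"], ["person"], ["person", "car"], ["cup"]]
weights = image_weights(toy)
batch = sample_batch(toy, batch_size=8)
```

In this toy set the two images containing a rare class each receive four times the weight of a person-only image, so they appear in training batches more often than their raw share of the data.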

Deep Learning-Based Abnormal Event Recognition Method for Detecting Pedestrian Abnormal Events in CCTV Video

Jinha Song, Youngjoon Hwang, Jongho Nang

http://doi.org/10.5626/JOK.2024.51.9.771

As CCTV installations increase, the monitoring workload has grown significantly, and expanding the workforce has reached its limits as a way of addressing it. Intelligent CCTV technology has been developed to overcome this problem, but its performance degrades in various situations. This paper proposes a robust and versatile method for integrated abnormal behavior recognition in CCTV footage that can be applied in multiple situations. The method extracts frame images from videos and uses raw images and heatmap representation images as inputs. It extracts feature vectors through merging methods at both the image and feature vector levels. Based on these vectors, we propose an abnormal behavior recognition method utilizing 2D CNN models, 3D CNN models, LSTM, and Average Pooling. We defined minor classes for performance validation and generated 1,957 abnormal behavior video clips for testing. The proposed method is expected to improve the accuracy of abnormal behavior recognition in CCTV footage, thereby enhancing the efficiency of security and surveillance systems.
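The feature-level part of this pipeline can be sketched in a few lines. The sketch assumes that feature-level merging means concatenating the per-frame vectors from the raw image and the heatmap image, and that a clip-level vector is obtained by averaging over frames. The abstract does not specify the merge operator, so both choices, the function names, and the toy numbers are illustrative assumptions; in practice the per-frame vectors would come from a CNN.

```python
def merge_features(raw_feat, heatmap_feat):
    """Feature-level merge by concatenation (an assumed merge operator)."""
    return list(raw_feat) + list(heatmap_feat)

def temporal_average_pool(frame_feats):
    """Average a sequence of equal-length per-frame vectors into one clip vector."""
    n = len(frame_feats)
    dim = len(frame_feats[0])
    return [sum(f[i] for f in frame_feats) / n for i in range(dim)]

# Two frames, each with a 2-d raw feature and a 1-d heatmap feature (toy numbers).
raw = [[1.0, 2.0], [3.0, 4.0]]
heat = [[0.5], [1.5]]
merged = [merge_features(r, h) for r, h in zip(raw, heat)]
clip_vector = temporal_average_pool(merged)
```

Average pooling discards frame order; an LSTM over the same `merged` sequence would be the order-aware alternative named in the abstract.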

Graph Convolution Network Based Feature Map Fusion Method for Multi Scale Object Detection

Jaegi Hwang, Seongju Kang, Kwangsue Chung

http://doi.org/10.5626/JOK.2022.49.8.627

Feature Pyramid Network (FPN) is a feature map fusion technique used to solve the multi-scale problem of object detection. However, because FPN fuses feature maps by focusing on adjacent resolutions, semantic information contained in non-adjacent layers is diluted. This paper proposes a graph convolution network (GCN)-based feature map fusion technique for multi-scale object detection. The proposed GCN-based method dynamically fuses feature map information from all layers according to learnable adjacency matrix weights. The adjacency matrix weights are generated by a multi-scale attention mechanism to adaptively reflect the scale information of the object. The feature map fusion is performed through a matrix multiplication between the adjacency matrix and a feature node matrix. The proposed method was verified to improve multi-scale object detection performance on the PASCAL-VOC benchmark dataset compared to the existing FPN method.
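The fusion step described above amounts to a matrix product between an adjacency matrix and a feature node matrix. A minimal sketch, assuming each pyramid level has been reduced to a single feature vector (one row of `X`) and that the adjacency weights come from a row-wise softmax over learnable scores; both assumptions and all names are illustrative and not taken from the paper:

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(scores, X):
    """Fuse node features as F = A X, with A = row-softmax(scores).

    scores: N x N adjacency scores (learnable in a real model, fixed here).
    X: N x C feature node matrix, one row per pyramid level.
    """
    A = [softmax(row) for row in scores]
    n_cols = len(X[0])
    return [[sum(A[i][k] * X[k][j] for k in range(len(X))) for j in range(n_cols)]
            for i in range(len(A))]

# Three levels, two channels. Equal scores give uniform adjacency weights,
# so every fused node becomes the mean of all levels.
X = [[1.0, 0.0], [2.0, 3.0], [3.0, 6.0]]
scores = [[0.0, 0.0, 0.0] for _ in range(3)]
fused = fuse(scores, X)
```

Because every row of the adjacency matrix has an entry for every level, each fused level can draw on non-adjacent levels directly rather than only on its neighbors.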

Backbone Network for Object Detection with Multiple Dilated Convolutions and Feature Summation

Vani Natalia Kuntjono, Seunghyun Ko, Yang Fang, Geunsik Jo

http://doi.org/10.5626/JOK.2018.45.8.786

Advances in CNNs have led to a trend of using very deep convolutional neural networks with more than 100 layers, not only for object detection but also for image segmentation and object classification. However, deep CNNs require substantial resources and are therefore unsuitable for users with limited resources or real-time requirements. In this paper, we propose a new backbone network for object detection with multiple dilated convolutions and feature summation. Feature summation eases the flow of gradients and minimizes the loss of spatial information caused by convolution. By using multiple dilated convolutions, we can widen the receptive field of individual neurons without adding more parameters. Furthermore, by using a shallow neural network as the backbone, our network can be trained and used in an environment with limited resources and without pre-training on the ImageNet dataset. Experiments show that our network achieves 71% and 38.2% accuracy on the Pascal VOC and MS COCO datasets, respectively.
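The claim that dilation widens the receptive field without adding parameters can be checked with a small 1-D sketch. A kernel with k taps and dilation d spans k + (k - 1)(d - 1) inputs while still holding only k weights. Summing the outputs of parallel dilated branches is an assumption about what "feature summation" means here; the function names are hypothetical and the code is not the paper's architecture.

```python
def effective_kernel_size(k, dilation):
    """Number of input positions spanned by a k-tap kernel with this dilation."""
    return k + (k - 1) * (dilation - 1)

def dilated_conv1d(x, kernel, dilation):
    """Valid 1-D dilated convolution (cross-correlation, no padding)."""
    span = effective_kernel_size(len(kernel), dilation)
    return [sum(w * x[i + j * dilation] for j, w in enumerate(kernel))
            for i in range(len(x) - span + 1)]

def summed_branches(x, kernel, dilations):
    """Apply the same kernel at several dilations and sum the outputs.

    Each branch is center-cropped to the shortest one so that summed
    elements refer to the same input position (assumes an odd kernel length).
    """
    outs = [dilated_conv1d(x, kernel, d) for d in dilations]
    n = min(len(o) for o in outs)
    cropped = [o[(len(o) - n) // 2:(len(o) - n) // 2 + n] for o in outs]
    return [sum(c[i] for c in cropped) for i in range(n)]

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
kernel = [1.0, 1.0, 1.0]             # 3 weights regardless of dilation
y1 = dilated_conv1d(x, kernel, 1)    # each output spans 3 inputs
y2 = dilated_conv1d(x, kernel, 2)    # each output spans 5 inputs
y_sum = summed_branches(x, kernel, [1, 2])
```

Both branches use the same three weights, yet the dilation-2 branch covers five input positions per output, which is the trade the abstract describes.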


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr