Search : [ keyword: object recognition ] (4)

A Survey on Methods for Image Description

Subin Ok, Daeho Lee

http://doi.org/10.5626/JOK.2023.50.3.210

Image description, which has received much attention with the development of deep learning, combines computer vision methods that identify the contents of an image with natural language processing methods that generate descriptive sentences. Image description techniques are used in many applications, including services for visually impaired people. In this paper, we summarize image description methods in three categories: template-based methods, visual/semantic similarity search-based methods, and deep learning-based methods, and compare their performance. Through this comparison, we aim to provide useful information on the basic architectures, advantages, limitations, and performance of the models. We survey the deep learning-based methods in particular detail because their performance is significantly better than that of the other methods. Through this process, we aim to organize the overall landscape of image description techniques. For each study, we compare the METEOR and BLEU scores on the commonly used Flickr30K and MS COCO datasets; where scores are not reported, we examine the test images and the sentences generated for them.
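To make the BLEU comparison mentioned in this abstract concrete, here is a minimal sketch of a sentence-level BLEU score. It is a simplification under stated assumptions: a single reference sentence, whitespace tokenization, n-grams up to length 2, and no smoothing. The function names `ngrams` and `simple_bleu` are hypothetical and are not taken from the surveyed papers, which typically report corpus-level BLEU with multiple references.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=2):
    """Simplified sentence-level BLEU against a single reference.

    Assumptions: whitespace tokenization, uniform n-gram weights,
    no smoothing (any zero n-gram precision yields a score of 0).
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)
```

A generated caption that matches the reference word for word scores 1.0, while a caption that is a correct but shorter prefix of the reference is reduced only by the brevity penalty.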

A Combined Model of Outline Feature Map and CNN for Detection of People at the Beach

Gwiseong Moon, Yoon Kim

http://doi.org/10.5626/JOK.2019.46.1.31

As water safety accidents occur every year, many intelligent video surveillance systems are being developed to prevent them. In this paper, we propose InsightCNN to accurately detect moving objects in complex images, such as beaches, in intelligent video surveillance systems. First, a basic model was constructed using the 1x1 convolution of the Fully Convolutional Network and the residual block of ResNet. We then added an outline feature map, which shows a key feature of the image, to the initial layer of the basic model. The experimental results demonstrate the superiority of the InsightCNN approach.
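One way to picture feeding an outline feature map into the initial layer is to append it to the input image as an extra channel. The sketch below is only an illustration: the abstract does not say how the outline is computed or how it is merged, so the Sobel gradient magnitude and the channel concatenation are assumptions, and `sobel_outline` and `add_outline_channel` are hypothetical names.

```python
def sobel_outline(gray):
    """Gradient-magnitude map of a 2D grayscale image (list of rows).

    Assumption: a Sobel filter stands in for the outline extractor.
    Border pixels are left at 0 for simplicity.
    """
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column.
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out


def add_outline_channel(rgb):
    """Append an outline channel to an H x W x 3 image, giving H x W x 4."""
    # Average the three color channels to get a grayscale image.
    gray = [[sum(px) / 3.0 for px in row] for row in rgb]
    outline = sobel_outline(gray)
    return [[list(px) + [outline[y][x]] for x, px in enumerate(row)]
            for y, row in enumerate(rgb)]
```

The four-channel result could then be passed to a first convolutional layer that accepts four input channels instead of three.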

Object Recognition in Low Resolution Images using a Convolutional Neural Network and an Image Enhancement Network

Injae Choi, Jeongin Seo, Hyeyoung Park

http://doi.org/10.5626/JOK.2018.45.8.831

Recently, the development of deep learning technologies such as convolutional neural networks has greatly improved the performance of object recognition in images. However, object recognition still faces many challenges due to large variations in images and the diversity of object categories to be recognized. In particular, studies on object recognition in low-resolution images are still at an early stage and have not shown satisfactory performance. In this paper, we propose an image enhancement neural network to improve object recognition performance on low-resolution images. We also use the enhanced images to train an object recognition model based on convolutional neural networks so that recognition performance remains robust to resolution changes. To verify the effectiveness of the proposed method, we conducted computational experiments on object recognition in a low-resolution environment using the CIFAR-10 and CIFAR-100 databases. We confirmed that the proposed method can greatly improve recognition performance on low-resolution images while maintaining stable performance on the original-resolution images.

Active Vision from Image-Text Multimodal System Learning

Jin-Hwa Kim, Byoung-Tak Zhang

http://doi.org/

In image classification, recent CNNs rival human performance. However, they remain limited in more general recognition tasks. Herein we deal with indoor images that contain too much information to be processed directly and that require information reduction before recognition. To reduce the amount of data processing, variational inference or variational Bayesian methods are typically suggested for object detection. However, these methods suffer from the difficulty of marginalizing over the given space. In this study, we propose an image-text integrated recognition system using active vision based on Spatial Transformer Networks. The system attempts to efficiently sample a partial region of a given image based on the given language information. Our experimental results demonstrate a significant improvement over traditional approaches. We also discuss the results of a qualitative analysis of the sampled images, the model's characteristics, and its limitations.
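The idea of sampling a partial region of an image, as in Spatial Transformer Networks, can be sketched with an affine sampling grid and bilinear interpolation. This is a minimal illustration, not the authors' system: the transform `theta` is passed in by hand here, whereas in the described approach it would presumably be predicted by a network conditioned on the language input. The function name `affine_sample` and the convention that normalized coordinates -1 and 1 map to the corner pixels are assumptions.

```python
import math


def affine_sample(image, theta, out_h, out_w):
    """Sample an out_h x out_w patch from a 2D image with an affine grid.

    theta is a 2x3 matrix mapping normalized output coordinates in [-1, 1]
    to normalized input coordinates. Samples outside the image count as 0.
    """
    h, w = len(image), len(image[0])

    def pixel(y, x):
        return image[y][x] if 0 <= y < h and 0 <= x < w else 0.0

    patch = []
    for i in range(out_h):
        # Normalized output row coordinate in [-1, 1].
        v = 2.0 * i / (out_h - 1) - 1.0 if out_h > 1 else 0.0
        row = []
        for j in range(out_w):
            u = 2.0 * j / (out_w - 1) - 1.0 if out_w > 1 else 0.0
            # Affine map to normalized input coordinates.
            xs = theta[0][0] * u + theta[0][1] * v + theta[0][2]
            ys = theta[1][0] * u + theta[1][1] * v + theta[1][2]
            # Convert to pixel coordinates (corners aligned to -1 and 1).
            px = (xs + 1.0) * (w - 1) / 2.0
            py = (ys + 1.0) * (h - 1) / 2.0
            x0, y0 = math.floor(px), math.floor(py)
            dx, dy = px - x0, py - y0
            # Bilinear interpolation over the four neighboring pixels.
            val = ((1 - dx) * (1 - dy) * pixel(y0, x0)
                   + dx * (1 - dy) * pixel(y0, x0 + 1)
                   + (1 - dx) * dy * pixel(y0 + 1, x0)
                   + dx * dy * pixel(y0 + 1, x0 + 1))
            row.append(val)
        patch.append(row)
    return patch
```

With the identity transform `[[1, 0, 0], [0, 1, 0]]` and an output of the same size, the patch reproduces the input; scaling by 0.5 with a translation of 0.5 selects the lower-right quarter of the normalized coordinate range.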


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr