Visual Scene Understanding with Contexts

Donghyeop Shin, Incheol Kim

http://doi.org/10.5626/JOK.2018.45.12.1279

In this paper, we address visual scene understanding as the problem of generating a scene graph and an image caption from an input image. A scene graph is a formal knowledge representation that expresses the objects in an image and the relationships among them, whereas an image caption is a natural language sentence describing the scene captured in the image. To address the problem effectively, we propose a novel deep neural network model, CSUN (Context-based Scene Understanding Network), which generates the two representations in a complementary way by exchanging useful contexts between them. The proposed model consists of three layers: object detection, relationship detection, and caption generation. Each layer makes use of the appropriate context to accomplish its own task. To evaluate the performance of the proposed model, we conduct various experiments on a large-scale benchmark dataset, Visual Genome. Through these experiments, we demonstrate that our model, by exploiting these contexts, achieves significant improvements in accuracy over state-of-the-art models.
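
To make the two output representations concrete, the following is a minimal illustrative sketch in Python of a scene graph stored as objects plus subject-predicate-object relationships, alongside a caption for the same scene. It is not the CSUN implementation: the class names, fields, bounding-box values, and the example scene are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class SceneObject:
    """A detected object: a class label and a bounding box (x, y, width, height)."""
    label: str
    box: Tuple[int, int, int, int]


@dataclass
class SceneGraph:
    """Objects are the nodes; relationships are directed, labeled edges."""
    objects: List[SceneObject] = field(default_factory=list)
    # Each relationship is (subject index, predicate, object index).
    relationships: List[Tuple[int, str, int]] = field(default_factory=list)

    def add_object(self, label: str, box: Tuple[int, int, int, int]) -> int:
        """Store an object and return its index for use in relationships."""
        self.objects.append(SceneObject(label, box))
        return len(self.objects) - 1

    def relate(self, subject: int, predicate: str, obj: int) -> None:
        """Add a directed edge from the subject node to the object node."""
        self.relationships.append((subject, predicate, obj))

    def triples(self) -> List[Tuple[str, str, str]]:
        """Return the relationships as readable (subject, predicate, object) labels."""
        return [
            (self.objects[s].label, p, self.objects[o].label)
            for s, p, o in self.relationships
        ]


# Hypothetical scene: the labels and boxes below are made up for illustration.
graph = SceneGraph()
man = graph.add_object("man", (40, 30, 120, 260))
horse = graph.add_object("horse", (20, 150, 300, 220))
graph.relate(man, "riding", horse)

# The caption is the natural language counterpart of the same scene.
caption = "A man is riding a horse."
print(graph.triples(), caption)
```

In this sketch the graph and the caption are simply stored side by side; the abstract's point is that the model generates them jointly, with each task supplying context to the other.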


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal
