Journal of KIISE

Search : [ author: Donghyeop Shin ] (1)

In this paper, we address a visual scene understanding problem: generating a scene graph and an image caption for a given input image. A scene graph is a formal knowledge representation that expresses the objects in an image and the relationships among them, whereas an image caption is a natural language sentence describing the scene captured in the image. To address this problem effectively, we propose a novel deep neural network model, CSUN (Context-based Scene Understanding Network), which generates the two representations in a complementary way by exchanging useful contexts between them. The proposed model consists of three layers (object detection, relationship detection, and caption generation), each of which uses appropriate context to accomplish its own task. To evaluate the performance of the proposed model, we conduct various experiments on a large-scale benchmark dataset, Visual Genome. Through these experiments, we demonstrate that our model, by using useful contexts, achieves significant improvements in accuracy over state-of-the-art models.
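
To make the two output representations concrete, the following is a minimal illustrative sketch of how a scene graph and a caption for the same image might be held in memory. It is not the CSUN model or its data format: the class names, the box layout, and the example objects are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SceneObject:
    """A detected object; the (x, y, width, height) box layout is an assumption."""
    name: str
    box: tuple


@dataclass
class SceneGraph:
    """Objects as nodes, relationships as (subject index, predicate, object index) edges."""
    objects: list = field(default_factory=list)
    relations: list = field(default_factory=list)

    def add_object(self, name, box):
        # Store the object and return its index so relations can refer to it.
        self.objects.append(SceneObject(name, box))
        return len(self.objects) - 1

    def add_relation(self, subject_idx, predicate, object_idx):
        self.relations.append((subject_idx, predicate, object_idx))

    def triples(self):
        # Resolve indices to object names for a readable view of the graph.
        return [
            (self.objects[s].name, p, self.objects[o].name)
            for s, p, o in self.relations
        ]


# Hypothetical example: one image described in both representations.
graph = SceneGraph()
man = graph.add_object("man", (10, 20, 50, 120))
horse = graph.add_object("horse", (5, 60, 140, 110))
graph.add_relation(man, "riding", horse)

caption = "A man is riding a horse."
```

Here `graph.triples()` gives the structured view of the scene as subject, predicate, object triples, while `caption` holds the free-text view of the same content.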


ISSN : 2383-630X(Print)
ISSN : 2383-6296(Electronic)
KCI Accredited Journal

Editorial Office

Tel. +82-2-588-9240
Fax. +82-2-521-1352
E-mail. chwoo@kiise.or.kr


Visual Scene Understanding with Contexts
