Search results for author 허유정 (Yu-Jung Heo): 2 papers

Efficient Compositional Translation Embedding for Visual Relationship Detection

Yu-Jung Heo, Eun-Sol Kim, Woo Suk Choi, Kyoung-Woon On, Byoung-Tak Zhang

http://doi.org/10.5626/JOK.2022.49.7.544

Scene graphs are widely used to express high-order visual relationships between the objects in an image. To generate scene graphs automatically, we propose an algorithm that detects visual relationships between objects and predicts each relationship as a predicate. Inspired by TransR, a well-known knowledge graph embedding method, we present CompTransR, an algorithm that i) defines latent relational subspaces that reflect the compositional nature of visual relationships and ii) encodes predicate representations by applying translational constraints between the object representations in each subspace. Our proposed model not only reduces computational complexity but also outperforms previous state-of-the-art methods on the predicate detection task across three benchmark datasets: VRD, VG200, and VrR-VG. We also show that scene graphs can be applied to image-caption retrieval, one of the high-level visual reasoning tasks, and that the scene graphs generated by our model improve retrieval performance.
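The translation constraint that CompTransR inherits from TransR can be sketched as follows. This is a minimal illustration of plain TransR scoring, not the authors' CompTransR: each predicate r has a projection matrix M_r into its own subspace and a translation vector r, a (subject, predicate, object) triple fits when M_r s + r ≈ M_r o, and predicate detection picks the predicate with the smallest distance. The helper names and all embedding values below are hypothetical toy data; a real model learns them, and this sketch does not model the compositional subspaces that the abstract credits with reducing complexity.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]  # rows = subspace dims, columns = object-embedding dims


def project(m_r: Matrix, v: Sequence[float]) -> Vector:
    """Map an object embedding into a predicate's subspace: M_r @ v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m_r]


def transr_distance(subj: Sequence[float], obj: Sequence[float],
                    rel: Sequence[float], m_r: Matrix) -> float:
    """TransR-style score ||M_r s + r - M_r o||^2 (lower = better fit)."""
    s_r = project(m_r, subj)
    o_r = project(m_r, obj)
    return sum((s + r - o) ** 2 for s, r, o in zip(s_r, rel, o_r))


def predict_predicate(subj: Sequence[float], obj: Sequence[float],
                      predicates: Dict[str, Tuple[Vector, Matrix]]) -> str:
    """Predicate detection: choose the predicate whose translation best links subject to object."""
    return min(predicates,
               key=lambda name: transr_distance(subj, obj, *predicates[name]))


# Toy, hand-picked embeddings (hypothetical; a trained model would learn these).
subj = [1.0, 0.0]
obj = [1.0, 2.0]
predicates = {
    "on": ([0.0, 2.0], [[1.0, 0.0], [0.0, 1.0]]),
    "next to": ([0.5], [[1.0, 0.0]]),  # 1-D subspace keeping only the first coordinate
    "under": ([0.0, -2.0], [[1.0, 0.0], [0.0, 1.0]]),
}

for name, (rel, m_r) in predicates.items():
    print(name, transr_distance(subj, obj, rel, m_r))
print("predicted:", predict_predicate(subj, obj, predicates))
```

Note that predicates may live in subspaces of different sizes ("next to" uses a 1-D one here). In plain TransR every predicate carries its own M_r, so parameters grow with the number of predicates; that per-predicate cost is what a shared or compositional subspace design would aim to reduce.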

Analyzing and Solving GuessWhat?!

Sang-Woo Lee, Cheolho Han, Yujung Heo, Wooyoung Kang, Jaehyun Jun, Byoung-Tak Zhang

http://doi.org/10.5626/JOK.2018.45.1.30

GuessWhat?! is a game played on an image by two machine players, a questioner and an answerer. The answerer is assigned a target object that is hidden from the questioner; the questioner asks questions that the answerer answers with yes, no, or N/A, and finally the questioner chooses the object it believes is the target. GuessWhat?! has received much attention in deep learning and artificial intelligence as a testbed for cutting-edge research on the interplay between computer vision and dialogue systems. In this study, we discuss the objective function and characteristics of the GuessWhat?! game. In addition, we propose a simple rule-based solver for GuessWhat?!. Although humans need four or five questions on average to solve this problem, the proposed method outperforms state-of-the-art deep learning methods with only two questions and exceeds human performance with five questions.
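The game protocol can be illustrated with a small simulation. The abstract does not describe the authors' rule, so the questioner below is a hypothetical greedy rule, not the proposed solver: objects carry assumed attribute annotations, each question has the form "Is its <attribute> <value>?", and the questioner asks about the attribute value that splits the remaining candidates most evenly. N/A answers are left out for simplicity. Because each balanced yes/no answer halves the candidate set, k questions can separate up to 2^k candidates, which hints at why a few well-chosen questions can suffice.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Question = Tuple[str, str]  # hypothetical form: "Is its <attribute> <value>?"


def answer(target: Dict, question: Question) -> str:
    """Answerer: knows the target object and replies yes/no (N/A is omitted here)."""
    attr, value = question
    return "yes" if target["attrs"].get(attr) == value else "no"


def choose_question(candidates: List[Dict]) -> Optional[Question]:
    """Greedy rule: ask about the attribute value that splits the candidates most evenly."""
    n = len(candidates)
    counts = Counter(item for o in candidates for item in o["attrs"].items())
    useful = {q: c for q, c in counts.items() if 0 < c < n}  # must rule something out
    if not useful:
        return None
    # Ties are broken alphabetically so the game is deterministic.
    return min(useful, key=lambda q: (abs(2 * useful[q] - n), q))


def play(objects: List[Dict], target: Dict, max_questions: int) -> Tuple[int, int]:
    """Run one game; return (guessed object id, number of questions asked)."""
    candidates = list(objects)
    asked = 0
    while asked < max_questions and len(candidates) > 1:
        question = choose_question(candidates)
        if question is None:
            break
        is_yes = answer(target, question) == "yes"
        asked += 1
        attr, value = question
        # Keep only the candidates consistent with the answer.
        candidates = [o for o in candidates
                      if (o["attrs"].get(attr) == value) == is_yes]
    return candidates[0]["id"], asked  # guess the first remaining candidate


# Toy scene with hypothetical annotations.
objects = [
    {"id": 0, "attrs": {"category": "person", "side": "left"}},
    {"id": 1, "attrs": {"category": "person", "side": "right"}},
    {"id": 2, "attrs": {"category": "dog", "side": "left"}},
    {"id": 3, "attrs": {"category": "car", "side": "right"}},
]
print(play(objects, objects[1], max_questions=5))
```

With four objects, the questioner first asks about the category "person" and then about the side "left", isolating the target in two questions. Real GuessWhat?! images often contain many objects of the same category, so a practical rule would also need spatial or relational questions.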

Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr