Search : [ keyword: Scene Graph Generation ] (2)

C3DSG: A 3D Scene Graph Generation Model Using Point Clouds of Indoor Environment

Hojun Baek, Incheol Kim

http://doi.org/10.5626/JOK.2023.50.9.758

Designing an effective deep neural network model that generates 3D scene graphs from point clouds requires resolving three challenging issues: 1) how to extract effective geometric features from point clouds, 2) which non-geometric features to use complementarily for recognizing 3D spatial relationships between two objects, and 3) which spatial reasoning mechanism to apply. To address these issues, we propose a novel deep neural network model for generating 3D scene graphs from point clouds of indoor environments. The proposed model combines geometric features of the 3D point cloud, extracted with Point Transformer, with various non-geometric features, such as linguistic features and relative-comparison features, that help predict the 3D spatial relationship between objects. In addition, the model employs a new NE-GAT graph neural network module that applies attention to both object nodes and the edges connecting them, effectively deriving spatial context between objects. A variety of experiments on the 3DSSG benchmark dataset demonstrate the effectiveness and superiority of the proposed model.
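To make the node-edge attention idea concrete, the following is a minimal PyTorch sketch of a layer in the spirit of NE-GAT: attention coefficients are computed from the source node, the target node, and the edge feature together, so spatial context flows through both nodes and edges. The class name, layer structure, and message function are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch of node-edge attention (NE-GAT-style), not the
# authors' code. Assumes node features come from, e.g., Point Transformer
# and edge features encode cues such as relative comparisons.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NodeEdgeAttentionLayer(nn.Module):  # hypothetical name
    def __init__(self, node_dim: int, edge_dim: int, out_dim: int):
        super().__init__()
        self.w_node = nn.Linear(node_dim, out_dim, bias=False)
        self.w_edge = nn.Linear(edge_dim, out_dim, bias=False)
        # attention scored over [source | edge | target] triples
        self.attn = nn.Linear(3 * out_dim, 1, bias=False)

    def forward(self, x, edge_index, edge_attr):
        # x: (N, node_dim) object-node features
        # edge_index: (2, E) source/target node indices
        # edge_attr: (E, edge_dim) edge (pairwise) features
        src, dst = edge_index
        h = self.w_node(x)          # (N, out_dim)
        e = self.w_edge(edge_attr)  # (E, out_dim)
        score = self.attn(torch.cat([h[src], e, h[dst]], dim=-1)).squeeze(-1)
        score = F.leaky_relu(score)
        # normalize attention over the incoming edges of each target node
        alpha = torch.zeros_like(score)
        for node in dst.unique():
            mask = dst == node
            alpha[mask] = F.softmax(score[mask], dim=0)
        # messages combine neighbor node features with edge features
        msg = alpha.unsqueeze(-1) * (h[src] + e)
        out = torch.zeros_like(h).index_add_(0, dst, msg)
        return F.elu(out)
```

Updated node states from such a layer could then feed a per-edge classifier that predicts the 3D spatial predicate for each object pair.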

Efficient Compositional Translation Embedding for Visual Relationship Detection

Yu-Jung Heo, Eun-Sol Kim, Woo Suk Choi, Kyoung-Woon On, Byoung-Tak Zhang

http://doi.org/10.5626/JOK.2022.49.7.544

Scene graphs are widely used to express high-order visual relationships between objects in an image. To generate scene graphs automatically, we propose an algorithm that detects visual relationships between objects and predicts each relationship as a predicate. Inspired by the well-known knowledge graph embedding method TransR, we present the CompTransR algorithm, which i) defines latent relational subspaces that reflect the compositional nature of visual relationships and ii) encodes predicate representations by applying translational constraints between the object representations in each subspace. The proposed model not only reduces computational complexity but also surpasses previous state-of-the-art performance on predicate detection across three benchmark datasets: VRD, VG200, and VrR-VG. We further show that scene graphs can be applied to image-caption retrieval, a high-level visual reasoning task, and that the scene graphs generated by our model improve retrieval performance.
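As a rough illustration of the TransR-style mechanism behind this approach, the sketch below projects subject and object representations into a relational subspace and scores a predicate by how well the translation subj + r ≈ obj holds there. All names and dimensions are hypothetical; in particular, a single shared projection stands in for the paper's multiple compositional subspaces.

```python
# Minimal TransR-style predicate scorer (illustrative assumptions only,
# not the CompTransR release). Lower translation error => better fit.
import torch
import torch.nn as nn

class TransRScorer(nn.Module):  # hypothetical name
    def __init__(self, obj_dim: int, rel_dim: int, num_predicates: int):
        super().__init__()
        # one translation vector per predicate
        self.rel_emb = nn.Embedding(num_predicates, rel_dim)
        # projection into a relational subspace; sharing one projection
        # across predicates (rather than one matrix each) is what keeps
        # the computational cost low in this simplified sketch
        self.proj = nn.Linear(obj_dim, rel_dim, bias=False)

    def score(self, subj, obj, predicate_ids):
        # subj, obj: (B, obj_dim) visual object representations
        s_r = self.proj(subj)            # project into the subspace
        o_r = self.proj(obj)
        r = self.rel_emb(predicate_ids)  # (B, rel_dim)
        # negative L2 translation error: higher score = more plausible
        return -torch.norm(s_r + r - o_r, p=2, dim=-1)

# usage: rank all predicates for one detected object pair
scorer = TransRScorer(obj_dim=512, rel_dim=128, num_predicates=70)
subj, obj = torch.randn(1, 512), torch.randn(1, 512)
all_preds = torch.arange(70)
scores = scorer.score(subj.expand(70, -1), obj.expand(70, -1), all_preds)
best_predicate = scores.argmax()  # index of the most plausible predicate
```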

