Understanding Video Semantic Structure with Spatiotemporal Graph Random Walk
Hoyeoung Yun, Minseo Kim, Eun-Sol Kim
http://doi.org/10.5626/JOK.2024.51.9.801
Understanding a long video requires identifying the various semantic units present in the video and interpreting the complex relationships among them. Conventional approaches use CNN- or transformer-based models to encode contextual information for short clips and then model the temporal relationships among those clips. However, such approaches struggle to capture complex relationships among the smaller semantic units within a clip. In this paper, we represent video inputs as a spatiotemporal graph with objects as vertices and relative space-time information between objects as edges, which explicitly expresses the relationships among these semantic units. Additionally, we propose a novel method that represents major semantic units as compositions of smaller units, using high-order relationship information obtained by spatiotemporal random walks on the graph. Through experiments on the CATER dataset, which involves complex actions of multiple objects, we demonstrate that our approach captures semantic units effectively.
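The graph construction and random walk described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy detections, the `build_graph` and `random_walk` helpers, the `max_dt` window, and the uniform transition rule are all assumptions made for illustration. Vertices are (object, frame) pairs, and each edge stores the relative offset (dx, dy, dt) between two objects.

```python
import random
from collections import defaultdict

# Hypothetical detections (object_id, frame, x, y); illustrative only, not CATER data.
detections = [
    ("cone", 0, 1.0, 2.0),
    ("ball", 0, 3.0, 2.0),
    ("cone", 1, 1.5, 2.0),
    ("ball", 1, 3.0, 1.0),
    ("cone", 2, 2.0, 2.0),
    ("ball", 2, 3.0, 0.0),
]

def build_graph(dets, max_dt=1):
    """Vertices are (object_id, frame); edges hold relative (dx, dy, dt).

    Assumption: two vertices are connected when their frames differ
    by at most max_dt.
    """
    graph = defaultdict(dict)
    for name_a, t_a, x_a, y_a in dets:
        for name_b, t_b, x_b, y_b in dets:
            src, dst = (name_a, t_a), (name_b, t_b)
            if src == dst or abs(t_b - t_a) > max_dt:
                continue
            graph[src][dst] = (x_b - x_a, y_b - y_a, t_b - t_a)
    return graph

def random_walk(graph, start, length, rng):
    """Uniform random walk (an assumed transition rule).

    Returns the visited vertices and the edge features along the walk.
    """
    path, feats = [start], []
    node = start
    for _ in range(length):
        neighbors = sorted(graph[node])
        if not neighbors:
            break
        nxt = rng.choice(neighbors)
        feats.append(graph[node][nxt])
        path.append(nxt)
        node = nxt
    return path, feats

graph = build_graph(detections)
path, feats = random_walk(graph, ("cone", 0), length=3, rng=random.Random(0))
```

A walk of length k yields a sequence of k relative space-time offsets, which is one simple way to expose higher-order relationships among objects; how such walks are aggregated into semantic-unit representations is specific to the paper and not shown here.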

Journal of KIISE
- ISSN : 2383-630X(Print)
- ISSN : 2383-6296(Electronic)
- KCI Accredited Journal
Editorial Office
- Tel. +82-2-588-9240
- Fax. +82-2-521-1352
- E-mail. chwoo@kiise.or.kr