Utilizing External Knowledge in Natural Language Video Localization
Daneul Kim, Daechul Ahn, Jonghyun Choi
http://doi.org/10.5626/JOK.2022.49.12.1097
State-of-the-art Natural Language Video Localization (NLVL) models are mostly trained on existing labels. Both full supervision and weak supervision require costly annotations, which makes them impractical for real-world NLVL problems. In this study, we therefore propose External Knowledge-based Natural Language Video Localization (EK-NLVL), a framework that generates pseudo-supervision with a captioning model: the model produces sentences from the given frames, and these sentences are summarized to ground the video event. We further propose a data augmentation scheme that uses the pre-trained multi-modal representation learning model CLIP to filter sentences by their visual alignment, yielding higher-quality pseudo-sentences for augmentation. We also propose a new model, the Query-Attentive on Segmentations Network (QAS), which effectively uses external knowledge for the NLVL task. Experiments on the Charades-STA dataset demonstrated the efficacy of our method compared to existing models.
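The abstract does not state how CLIP scores are turned into a filter. The following is a minimal sketch of one plausible reading of "visual-aligned sentence filtering", not the authors' implementation. It assumes frame and sentence embeddings have already been produced by CLIP's image and text encoders (replaced here by toy 3-dimensional vectors), averages the frame embeddings of a segment, and keeps candidate pseudo-sentences whose cosine similarity to that average meets a threshold. The function names and the threshold value are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def segment_embedding(frame_embeddings: List[Sequence[float]]) -> List[float]:
    """Average per-frame embeddings (assumed to come from CLIP's image encoder)."""
    if not frame_embeddings:
        raise ValueError("a segment needs at least one frame")
    dim = len(frame_embeddings[0])
    n = len(frame_embeddings)
    return [sum(v[i] for v in frame_embeddings) / n for i in range(dim)]


def filter_pseudo_sentences(
    segment_emb: Sequence[float],
    sentence_embeddings: Dict[str, Sequence[float]],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[Tuple[str, float]]:
    """Keep sentences whose similarity to the segment meets the threshold, best first."""
    scored = [
        (sentence, cosine_similarity(segment_emb, emb))
        for sentence, emb in sentence_embeddings.items()
    ]
    kept = [(s, score) for s, score in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept


if __name__ == "__main__":
    # Toy stand-ins for CLIP embeddings of two frames and three candidate sentences.
    frames = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
    candidates = {
        "a person opens a door": [1.0, 0.1, 0.0],
        "a person is cooking": [0.0, 0.0, 1.0],
        "someone walks through a doorway": [0.7, 0.3, 0.1],
    }
    for sentence, score in filter_pseudo_sentences(segment_embedding(frames), candidates):
        print(f"{score:.3f}  {sentence}")
```

With these toy vectors, the cooking sentence is orthogonal to the segment and is dropped, while the two door-related sentences are kept. Using a fixed threshold rather than a top-k cut is itself an assumption; either would fit the abstract's description.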

Journal of KIISE
- ISSN : 2383-630X(Print)
- ISSN : 2383-6296(Electronic)
- KCI Accredited Journal
Editorial Office
- Tel. +82-2-588-9240
- Fax. +82-2-521-1352
- E-mail. chwoo@kiise.or.kr