Utilizing External Knowledge in Natural Language Video Localization

Daneul Kim, Daechul Ahn, Jonghyun Choi

http://doi.org/10.5626/JOK.2022.49.12.1097

State-of-the-art Natural Language Video Localization (NLVL) models are mostly trained on existing labels. Both full supervision and weak supervision require costly annotations, which makes them impractical for real-world NLVL problems. In this study, we therefore propose External Knowledge-based Natural Language Video Localization (EK-NLVL), a framework that generates pseudo-supervision with a captioning model: the model generates sentences from the given frames and summarizes them to ground the video event. We further propose a data augmentation method that uses the pre-trained multi-modal representation learning model CLIP to filter for visually aligned sentences, so that the resulting pseudo-sentences can provide better-quality augmentation. We also propose a new model, Query-Attentive on Segmentations Network (QAS), which effectively uses external knowledge for the NLVL task. Experiments on the Charades-STA dataset demonstrate the efficacy of our method compared with existing models.
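
The abstract does not give implementation details for the visual-aligned sentence filtering step, so the following is only a minimal sketch of one plausible reading: candidate pseudo-sentences are scored against a frame by cosine similarity between embeddings, and low-scoring sentences are dropped. It assumes the embeddings were already produced by a CLIP-like image and text encoder; the function names, the toy vectors, and the threshold value are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_pseudo_sentences(
    frame_embedding: Sequence[float],
    candidates: List[Tuple[str, Sequence[float]]],
    threshold: float = 0.5,  # hypothetical cut-off, not a value from the paper
) -> List[Tuple[str, float]]:
    """Keep candidate sentences whose embedding is close to the frame embedding.

    `candidates` pairs each sentence with its (assumed precomputed) text
    embedding. Returns (sentence, score) pairs, best match first.
    """
    scored = [
        (sentence, cosine_similarity(frame_embedding, embedding))
        for sentence, embedding in candidates
    ]
    kept = [(sentence, score) for sentence, score in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for real encoder outputs.
    frame = [1.0, 0.0, 0.0]
    captions = [
        ("a person opens a door", [0.9, 0.1, 0.0]),
        ("a dog runs on the beach", [0.0, 1.0, 0.0]),
        ("a person walks to a door", [0.6, 0.6, 0.0]),
    ]
    for sentence, score in filter_pseudo_sentences(frame, captions, threshold=0.8):
        print(f"{score:.3f}  {sentence}")
```

In a real pipeline the embeddings would come from the image and text encoders of a pre-trained model, and the threshold would need to be tuned on held-out data; a top-k selection per frame is an equally reasonable alternative to a fixed cut-off.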



Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr