Journal of KIISE

Search : [ author: Daechul Ahn ] (1)

State-of-the-art Natural Language Video Localization (NLVL) models mostly rely on existing labels for training. Both full supervision and weak supervision require costly annotations, which makes them impractical for real-world NLVL problems. In this study, we therefore propose External Knowledge-based Natural Language Video Localization (EK-NLVL), a framework that generates pseudo-supervision with a captioning model: the model generates sentences from the given frames and summarizes them to ground the video event. Moreover, we propose a data augmentation method that uses the pre-trained multi-modal representation learning model CLIP for visual-aligned sentence filtering, so that the generated pseudo-sentences provide higher-quality augmentation. We also propose a new model, Query-Attentive on Segmentations Network (QAS), which effectively uses external knowledge for the NLVL task. Experiments on the Charades-STA dataset demonstrated the efficacy of our method compared to existing models.
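The abstract does not spell out how the visual-aligned sentence filtering is carried out, so the following is only a minimal sketch of one plausible reading: candidate pseudo-sentences are kept when their embedding is close enough, by cosine similarity, to the embedding of the video segment. The function names, the `threshold` value, and the toy three-dimensional vectors are all assumptions made for illustration; in practice the embeddings would come from CLIP's image and text encoders, and the authors' actual criterion may differ.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_pseudo_sentences(
    segment_embedding: Sequence[float],
    candidates: Sequence[Tuple[str, Sequence[float]]],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[Tuple[str, float]]:
    """Keep candidate sentences whose embedding is visually aligned with the segment.

    Returns (sentence, score) pairs with score >= threshold, best match first.
    """
    kept = []
    for sentence, embedding in candidates:
        score = cosine_similarity(segment_embedding, embedding)
        if score >= threshold:
            kept.append((sentence, score))
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept


if __name__ == "__main__":
    # Toy vectors standing in for CLIP image/text embeddings (assumption).
    segment = [1.0, 0.0, 0.0]
    candidates = [
        ("a person opens a door", [0.9, 0.1, 0.0]),
        ("a dog runs on a beach", [0.0, 1.0, 0.0]),
        ("someone walks through a doorway", [0.6, 0.8, 0.0]),
    ]
    for sentence, score in filter_pseudo_sentences(segment, candidates):
        print(f"{score:.3f}  {sentence}")
```

A fixed threshold is the simplest choice for such a sketch; keeping the top-k sentences per segment would be an equally reasonable alternative under the same assumptions.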


ISSN : 2383-630X(Print)
ISSN : 2383-6296(Electronic)
KCI Accredited Journal

Editorial Office

Tel. +82-2-588-9240
Fax. +82-2-521-1352
E-mail. chwoo@kiise.or.kr


Utilizing External Knowledge in Natural Language Video Localization
