
Explainable Video Search System using Token Space-based Representation

Jiyeol Park, Dooyoung Kim, Youngjoong Ko

http://doi.org/10.5626/JOK.2024.51.12.1068

Query-video retrieval is the task of finding the video most relevant to a query entered by the user. Existing studies address it by representing the query and the video in a suitable latent vector space. However, they compute the relevance between the query and the video simply as the dot product of the two vectors, which carries no interpretable meaning and offers no explainability. In this paper, we propose a model that converts the query and the video into embeddings located in a token-based space, so that videos can be searched like documents and scored by semantic similarity. Experimental results on the MSVD dataset show that the final proposed model improves Recall@1, Recall@5, and Recall@10 over the baseline. Furthermore, the proposed model is approximately 3.33 times faster than CLIP4Clip, and when BM25 is applied with minimal modifications, the speedup reaches about 208.11 times. Additionally, qualitative evaluation shows that the tokens extracted from videos are about as relevant as subtitles, which supports the explainability of the proposed structure.
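To illustrate why a token-based representation allows a video to be searched like a document, the following is a minimal sketch of standard Okapi BM25 scoring over bags of tokens. It is not the authors' implementation: the function name `bm25_scores`, the parameter values `k1=1.2` and `b=0.75`, and the toy token lists standing in for tokens extracted from videos are all assumptions made for illustration, and the abstract does not specify which modifications to BM25 the paper applies.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs, k1=1.2, b=0.75):
    """Score each token list in `docs` against `query_tokens` with Okapi BM25.

    k1 and b are common default values, assumed here for illustration.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of token lists that contain each token.
    df = Counter(tok for d in docs for tok in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for tok in query_tokens:
            if tok not in tf:
                continue  # tokens absent from the document contribute nothing
            idf = math.log(1 + (n_docs - df[tok] + 0.5) / (df[tok] + 0.5))
            norm = tf[tok] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[tok] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical token lists standing in for tokens extracted from videos.
video_tokens = {
    "v1": ["man", "guitar", "playing", "stage"],
    "v2": ["cat", "jumping", "sofa"],
    "v3": ["woman", "cooking", "kitchen", "pan"],
}
query = ["man", "playing", "guitar"]

ids = list(video_tokens)
scores = bm25_scores(query, [video_tokens[i] for i in ids])
ranked = sorted(zip(ids, scores), key=lambda pair: pair[1], reverse=True)

# The overlapping tokens act as a human-readable reason for each match.
matched = {i: sorted(set(query) & set(video_tokens[i])) for i in ids}
print(ranked[0][0], matched[ranked[0][0]])
```

Unlike a dot product between dense vectors, each term of the score in this sketch is tied to a named token, so the tokens shared by the query and the video can be shown to the user as the reason for the ranking.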






Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr