
Spatio-Temporal Modeling via Adaptive Frequency Filtering for Video Action Recognition

Minji Kim, Taehoon Kim, Jonghyeon Seon, Bohyung Han

http://doi.org/10.5626/JOK.2024.51.12.1078

Modeling long-term spatio-temporal dependencies in video data is challenging because the local receptive fields of CNNs make it difficult to capture global context. To address this problem, we propose an efficient global spatio-temporal modeling method that can be easily integrated into existing CNN models. Our approach uses the Discrete Cosine Transform (DCT) to map information into the frequency domain, where two adaptive filtering paths operate in a complementary way: one removes redundant frequencies while preserving essential information, and the other amplifies frequencies that are important for spatio-temporal modeling. We also introduce DynamicMNIST, a lightweight dataset in which digits exhibit behaviors such as shifting, rotating, and scaling. Evaluations on three public benchmarks and on DynamicMNIST show that the proposed module improves action recognition performance across different CNN models while adding only a small number of parameters and little computational cost.
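
To make the frequency-domain filtering idea more concrete, here is a minimal NumPy sketch, assuming a single feature map of shape (C, T, H, W). It is not the authors' module. The top-k magnitude rule in the hypothetical `prune_frequencies`, the `gain` array standing in for learned enhancement weights, and the averaging used to fuse the two paths are all illustrative assumptions. The sketch applies an orthonormal 3-D DCT over time, height, and width, runs the two paths on the coefficients, and maps the result back with the inverse DCT.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix D (rows = frequencies), so D @ D.T == I."""
    k = np.arange(n)[:, None]  # frequency index
    i = np.arange(n)[None, :]  # sample index
    d = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    d[0, :] /= np.sqrt(2.0)    # rescale the DC row to unit norm
    return d

def dct3(x, mats):
    """Forward 3-D DCT over the (T, H, W) axes of a (C, T, H, W) array."""
    dt, dh, dw = mats
    return np.einsum("at,bh,cw,nthw->nabc", dt, dh, dw, x, optimize=True)

def idct3(f, mats):
    """Inverse 3-D DCT: the matrices are orthonormal, so apply their transposes."""
    dt, dh, dw = mats
    return np.einsum("at,bh,cw,nabc->nthw", dt, dh, dw, f, optimize=True)

def prune_frequencies(f, keep_ratio):
    """Path 1 (hypothetical rule): per channel, keep the keep_ratio fraction of
    coefficients with the largest magnitude and zero the rest as redundant."""
    c = f.shape[0]
    mag = np.abs(f).reshape(c, -1)
    k = max(1, int(keep_ratio * mag.shape[1]))
    thresh = np.sort(mag, axis=1)[:, -k][:, None, None, None]
    return np.where(np.abs(f) >= thresh, f, 0.0)

def adaptive_frequency_filter(x, keep_ratio=0.25, gain=None):
    """Filter a (C, T, H, W) feature map in the DCT domain with two paths."""
    _, t, h, w = x.shape
    mats = (dct_matrix(t), dct_matrix(h), dct_matrix(w))
    f = dct3(x, mats)

    pruned = prune_frequencies(f, keep_ratio)

    # Path 2 (assumption): reweight coefficients by (1 + gain). In a trained
    # model the gain would be learned; a zero gain leaves f unchanged.
    if gain is None:
        gain = np.zeros((t, h, w))
    enhanced = f * (1.0 + gain)

    # Fuse the two paths by averaging (an assumption), then invert the DCT.
    return idct3(0.5 * (pruned + enhanced), mats)

rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 7, 7))  # (C, T, H, W) feature map
out = adaptive_frequency_filter(feat, keep_ratio=0.25)
print(out.shape)  # (4, 8, 7, 7)
```

Because the DCT matrices are orthonormal, the inverse transform is simply their transpose. With `keep_ratio=1.0` and a zero `gain`, both paths pass every coefficient through and the function returns its input unchanged, which makes a convenient sanity check.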

Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal