Effect Scene Detection using Multimodal Deep Learning Models

Jeongseon Lim, Mikyung Han, Hyunjin Yoon

http://doi.org/10.5626/JOK.2018.45.12.1250

A conventional movie can be converted into a 4D movie by identifying its effect scenes. To automate this process, we propose a multimodal deep learning model that detects effect scenes using both the visual and audio features of a movie. We classify effect and non-effect scenes using an audio-based Convolutional Recurrent Neural Network (CRNN) model and a video-based Long Short-Term Memory (LSTM) and Multilayer Perceptron (MLP) model, and we fuse the two modalities at the feature level. In addition, based on our observation that effects typically occur during non-dialog scenes, we detect non-dialog scenes using an audio-based Convolutional Neural Network (CNN) model. The prediction scores of the audio-visual effect scene classification model and the audio-based non-dialog classification model are then combined. Finally, we detect sequences of effect scenes across the entire movie using the prediction score of each input window. Experiments on real-world 4D movies demonstrate that the proposed multimodal deep learning model outperforms unimodal models in effect scene detection accuracy.
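The last two steps of the abstract (combining the two prediction scores, then detecting effect-scene sequences from per-window scores) can be illustrated with a minimal sketch. The abstract does not state the fusion rule or the decision rule, so everything here is an assumption made for illustration: the weighted average in `fuse_scores`, the `weight`, `threshold`, and `min_windows` parameters, and the function names themselves are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def fuse_scores(effect_scores: List[float],
                non_dialog_scores: List[float],
                weight: float = 0.5) -> List[float]:
    """Combine two per-window score lists with a weighted average.

    Assumption: the paper's actual combination rule is not given in the
    abstract; a weighted average is used here only as a stand-in.
    """
    if len(effect_scores) != len(non_dialog_scores):
        raise ValueError("score lists must have the same length")
    return [weight * e + (1.0 - weight) * n
            for e, n in zip(effect_scores, non_dialog_scores)]


def detect_effect_segments(scores: List[float],
                           threshold: float = 0.5,
                           min_windows: int = 1) -> List[Tuple[int, int]]:
    """Group consecutive windows scoring >= threshold into segments.

    Returns (start, end) window-index pairs with end exclusive.
    Assumption: thresholding and run-grouping are illustrative choices,
    not the paper's stated procedure.
    """
    segments: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i  # a new run of effect windows begins
        elif start is not None:
            if i - start >= min_windows:
                segments.append((start, i))
            start = None
    # Close a run that extends to the last window.
    if start is not None and len(scores) - start >= min_windows:
        segments.append((start, len(scores)))
    return segments


if __name__ == "__main__":
    # Made-up scores for six consecutive input windows.
    effect = [0.9, 0.8, 0.2, 0.7, 0.9, 0.1]
    non_dialog = [0.7, 0.6, 0.4, 0.1, 0.9, 0.3]
    print(detect_effect_segments(fuse_scores(effect, non_dialog)))
```

In this sketch the fourth window has a high effect score but a low non-dialog score, so the fused score falls below the threshold and the window is excluded. This mirrors the abstract's observation that effects typically occur during non-dialog scenes, although the actual weighting used by the authors may differ.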


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal
