Effect Scene Detection using Multimodal Deep Learning Models
Jeongseon Lim, Mikyung Han, Hyunjin Yoon
http://doi.org/10.5626/JOK.2018.45.12.1250
A conventional movie can be converted into a 4D movie by identifying its effect scenes. To automate this process, we propose a multimodal deep learning model that detects effect scenes using both the visual and audio features of a movie. We classify effect and non-effect scenes using an audio-based Convolutional Recurrent Neural Network (CRNN) model and a video-based Long Short-Term Memory (LSTM) and Multilayer Perceptron (MLP) model. We also implement feature-level fusion. In addition, based on our observation that effects typically occur during non-dialog scenes, we detect non-dialog scenes using an audio-based Convolutional Neural Network (CNN) model. The prediction scores of the audio-visual effect scene classification model and the audio-based non-dialog classification model are then combined. Finally, we detect sequences of effect scenes across the entire movie using the prediction score of each input window. Experiments on real-world 4D movies demonstrate that the proposed multimodal deep learning model outperforms unimodal models in effect scene detection accuracy.
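
The last two steps of the abstract (combining the two prediction scores, then detecting effect scene sequences from per-window scores) can be illustrated with a minimal Python sketch. The abstract does not state how the scores are combined or how sequences are extracted, so everything here is an assumption made for illustration: the weighted average, the `weight` and `threshold` parameters, and the function names `fuse_scores` and `detect_segments` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def fuse_scores(effect: List[float], non_dialog: List[float],
                weight: float = 0.5) -> List[float]:
    """Combine per-window scores with a weighted average.

    ASSUMPTION: a weighted average is one plausible score-level
    combination; the paper's actual rule is not given in the abstract.
    """
    if len(effect) != len(non_dialog):
        raise ValueError("score lists must cover the same windows")
    return [weight * e + (1.0 - weight) * n
            for e, n in zip(effect, non_dialog)]


def detect_segments(scores: List[float],
                    threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Group consecutive windows scoring at or above `threshold`.

    Returns inclusive (first_window, last_window) index pairs.
    ASSUMPTION: simple thresholding stands in for the paper's
    sequence detection step.
    """
    segments: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i          # a new effect segment begins here
        elif start is not None:
            segments.append((start, i - 1))  # segment ended at previous window
            start = None
    if start is not None:          # segment runs to the final window
        segments.append((start, len(scores) - 1))
    return segments


# Hypothetical per-window scores, not data from the paper.
effect_scores = [0.9, 0.8, 0.2, 0.1, 0.7]
non_dialog_scores = [0.9, 0.6, 0.4, 0.1, 0.9]
fused = fuse_scores(effect_scores, non_dialog_scores)
segments = detect_segments(fused)
```

With these made-up inputs, windows 0 to 1 and window 4 exceed the threshold after fusion, so two candidate effect segments are reported. In practice the weight and threshold would need to be tuned on validation data.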
Journal of KIISE
- ISSN : 2383-630X(Print)
- ISSN : 2383-6296(Electronic)
- KCI Accredited Journal