Search: [ author: Ui Nyoung Yoon ] (3)

CNN-based Speech Emotion Recognition Model Applying Transfer Learning and Attention Mechanism

Jung Hyun Lee, Ui Nyoung Yoon, Geun-Sik Jo

http://doi.org/10.5626/JOK.2020.47.7.665

Existing studies on speech-based emotion recognition can be divided into those that use a single voice feature and those that use multiple voice features. Approaches based on a single voice feature have difficulty reflecting complex properties of the voice such as loudness, overtone structure, and vocal range. Approaches based on multiple voice features have mostly relied on machine learning, whose emotion recognition accuracy is lower than that of deep learning-based methods. To resolve these problems, we propose a speech emotion recognition model based on a CNN (Convolutional Neural Network) that uses the Mel-Spectrogram and Mel-Frequency Cepstral Coefficients (MFCC) as voice features. The proposed model applies transfer learning and an attention mechanism to improve training speed and accuracy, and achieves 77.65% emotion recognition accuracy, outperforming the comparison works.
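The abstract does not specify the exact form of the attention mechanism, but a common choice for pooling a sequence of spectral feature frames (e.g., MFCC frames) into a single utterance-level vector is additive attention with softmax weights. The sketch below is illustrative only; the weight vector `w` and feature dimensions are assumptions, not the paper's architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(features, w):
    """Collapse a (time, dim) feature sequence into one (dim,) vector.

    Each frame gets a scalar score, the scores are softmax-normalized
    into attention weights, and the frames are summed with those weights.
    """
    scores = features @ w        # (time,) one score per frame
    weights = softmax(scores)    # attention weights, sum to 1
    return weights @ features    # (dim,) weighted sum over time

rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 64))  # e.g. 100 MFCC frames, 64-dim each
w = rng.normal(size=64)             # hypothetical learned score vector
pooled = attention_pool(feats, w)   # (64,) utterance-level descriptor
```

In a full model, `w` would be learned jointly with the CNN, and the pooled vector would feed the emotion classifier head.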

The Cut Transition Detection Model Using the SSD Method

Sungmin Park, Ui Nyoung Yoon, Geun-Sik Jo

http://doi.org/10.5626/JOK.2020.47.7.655

Shot boundary detection is continually studied as an essential technique for analyzing video content. In this paper, we propose an end-to-end learning model using the SSD (Single Shot MultiBox Detector) method to resolve the shortcomings of existing research and to identify the exact location of cut transitions. We apply the SSD concepts of the multi-scale feature map and default boxes to predict multiple cut transitions, and combine image concatenation, an image comparison method, with the model to reinforce the feature information of cut transitions. The proposed model achieved 88.7% and 98.0% accuracy on the re-labeled ClipShots and TRECVID 2007 datasets, respectively, outperforming the latest research, and it localized cut transitions closer to the ground truth than existing deep learning models.
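In SSD-style detection, default boxes are matched to ground-truth targets by overlap before training. Applied to cut transitions, the boxes become frame intervals and the overlap is a temporal IoU. The sketch below shows this matching step under assumed interval representations and a 0.5 threshold; it is not the paper's implementation.

```python
def temporal_iou(a, b):
    """IoU between two frame intervals given as (start, end) pairs."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def match_default_boxes(default_boxes, cuts, threshold=0.5):
    """Label each default box positive if it overlaps a ground-truth cut
    transition with IoU at or above the threshold."""
    return [any(temporal_iou(box, cut) >= threshold for cut in cuts)
            for box in default_boxes]

# Default boxes tiled over the frame axis (a multi-scale model would tile
# boxes of several widths over several feature-map resolutions).
boxes = [(0, 8), (8, 16), (16, 24)]
cuts = [(9, 15)]                       # one annotated cut transition
labels = match_default_boxes(boxes, cuts)  # [False, True, False]
```

Positive boxes would then receive both a classification target and an offset-regression target toward the exact cut location, mirroring SSD's box regression.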

Efficient CNNs with Channel Attention and Group Convolution for Facial Expression Recognition

MyeongOh Lee, Ui Nyoung Yoon, Seunghyun Ko, Geun-Sik Jo

http://doi.org/10.5626/JOK.2019.46.12.1241

Recently, convolutional neural networks have been actively studied for recognizing emotions from facial expressions. In this paper, we propose an efficient convolutional neural network that addresses the model complexity of the deep convolutional neural networks used for facial expression recognition. To reduce model complexity, we use group convolution and depth-wise separable convolution, which reduce the number of parameters and the computational cost. We also enhance feature reuse and channel information by using skip connections for feature aggregation and channel attention. Our method achieves 70.32% and 85.23% accuracy on the FER2013 and RAF-single datasets, respectively, with four times fewer parameters (0.39 million and 0.41 million) than the existing model.



Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr