Digital Library [Search Result]
CNN-based Speech Emotion Recognition Model Applying Transfer Learning and Attention Mechanism
Jung Hyun Lee, Ui Nyoung Yoon, Geun-Sik Jo
http://doi.org/10.5626/JOK.2020.47.7.665
Existing speech-based emotion recognition studies can be divided into those using a single voice feature and those using multiple voice features. Using a single voice feature makes it difficult to reflect complex properties of the voice such as loudness, overtone structure, and vocal range. Studies using multiple voice features are mostly based on machine learning, and their emotion recognition accuracy is lower than that of deep-learning-based studies. To resolve this problem, we propose a speech emotion recognition model based on a CNN (Convolutional Neural Network) that uses the Mel-Spectrogram and Mel-Frequency Cepstral Coefficients (MFCC) as voice features. The proposed model applies transfer learning and attention to improve learning speed and accuracy, and achieves 77.65% emotion recognition accuracy, outperforming the comparison works.
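The two voice features used as CNN input above can be sketched with plain numpy. This is an illustrative reconstruction, not the authors' code; the frame size, hop length, and filter counts are hypothetical defaults.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # triangular filters with centers evenly spaced on the mel scale
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=40):
    # windowed power spectrum per frame, then mel-filterbank projection
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = signal[start:start + n_fft] * np.hanning(n_fft)
        frames.append(np.abs(np.fft.rfft(frame)) ** 2)
    spec = np.array(frames).T                         # (n_fft//2+1, time)
    return mel_filterbank(n_mels, n_fft, sr) @ spec   # (n_mels, time)

def mfcc(signal, n_mfcc=13, **kw):
    # log-mel energies decorrelated by an (unnormalized) DCT-II
    logmel = np.log(mel_spectrogram(signal, **kw) + 1e-10)
    n = logmel.shape[0]
    basis = np.cos(np.pi * np.outer(np.arange(n_mfcc), np.arange(n) + 0.5) / n)
    return basis @ logmel
```

Both outputs are 2-D time-frequency maps, which is what makes them natural image-like inputs for a CNN.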
The Cut Transition Detection Model Using the SSD Method
Sungmin Park, Ui Nyoung Yoon, Geun-Sik Jo
http://doi.org/10.5626/JOK.2020.47.7.655
Shot boundary detection is constantly studied as an essential technique for analyzing video content. In this paper, we propose an end-to-end learning model using the SSD (Single Shot MultiBox Detector) method to resolve the shortcomings of existing research and to identify the exact location of each cut transition. We apply the SSD concepts of the multi-scale feature map and default boxes to predict multiple cut transitions, and combine image concatenation, one of the image comparison methods, with the model to reinforce the feature information of the cut transitions. The proposed model achieved 88.7% and 98.0% accuracy on the re-labeled ClipShots and TRECVID 2007 datasets, respectively, outperforming the latest research. Additionally, it detected ranges closer to the ground truth than the existing deep learning models.
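The two ideas named in the abstract, image concatenation of adjacent frames and SSD-style default boxes, can be sketched as follows. This is a hypothetical illustration (the real model works on learned feature maps, and the scales shown are made up), adapted here to the 1-D frame axis since cut transitions are temporal.

```python
import numpy as np

def concat_frame_pairs(frames):
    # frames: (N, H, W, C) -> (N-1, H, W, 2C); each adjacent pair is
    # stacked channel-wise so the detector can compare consecutive frames
    return np.concatenate([frames[:-1], frames[1:]], axis=-1)

def default_boxes_1d(seq_len, scales=(2, 4, 8)):
    # (center, width) anchors along the frame axis, one set per scale,
    # mirroring SSD's default boxes over a multi-scale feature map
    boxes = []
    for s in scales:
        for start in range(0, seq_len, s):
            boxes.append((start + s / 2.0, float(s)))
    return boxes
```

The model then classifies and regresses each anchor, so several cut transitions in one clip can be predicted in a single forward pass.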
GS-RANSAC : An Error Filtering Algorithm for Homography Estimation based on Geometric Similarities of Feature Points
Kiheun Song, Myung-Duk Hong, Geun-Sik Jo
http://doi.org/10.5626/JOK.2020.47.3.283
Augmented Reality (AR) generates information by displaying augmented objects on real-world objects. AR essentially requires calculating the coordinates of the augmented objects, for which homography estimation between two images is generally used. In homography estimation, the RANSAC (Random Sample Consensus) algorithm is used to select the four most appropriate pairs of feature points extracted from the two images. However, the conventional RANSAC algorithm cannot guarantee the geometric similarity of the inter-image locations of the randomly selected feature points. To resolve this problem, we propose an algorithm that evaluates the geometric similarity of the inter-image locations of the feature points. The proposed algorithm draws a quadrilateral through the feature points on each image, then determines whether the quadrilaterals are similar in terms of vertex order and the range of their internal angles. The experimental results show that the proposed algorithm decreases the failure rate by 8.55% and displays the augmented objects more accurately than conventional RANSAC. Using the proposed algorithm, we improved the accuracy of augmented object coordinates in AR.
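The similarity test described above, checking vertex order and internal angles of the two quadrilaterals, can be sketched like this. The function names and the angle tolerance are hypothetical; the abstract does not give the actual thresholds.

```python
import numpy as np

def cross2(u, v):
    # z-component of the 2-D cross product
    return u[0] * v[1] - u[1] * v[0]

def is_consistent_order(quad):
    # cross products of consecutive edges share a sign when the four
    # vertices are listed in a consistent (convex) order
    signs = []
    for i in range(4):
        e1 = quad[(i + 1) % 4] - quad[i]
        e2 = quad[(i + 2) % 4] - quad[(i + 1) % 4]
        signs.append(cross2(e1, e2))
    return all(s > 0 for s in signs) or all(s < 0 for s in signs)

def internal_angles(quad):
    # internal angle in degrees at each of the four vertices
    angles = []
    for i in range(4):
        v1 = quad[i - 1] - quad[i]
        v2 = quad[(i + 1) % 4] - quad[i]
        cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
        angles.append(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
    return np.array(angles)

def geometrically_similar(q1, q2, angle_tol=30.0):
    # accept the RANSAC sample only if both quadrilaterals have a
    # consistent vertex order and their internal angles agree within tol
    if not (is_consistent_order(q1) and is_consistent_order(q2)):
        return False
    return bool(np.all(np.abs(internal_angles(q1) - internal_angles(q2)) <= angle_tol))
```

A sample of four correspondences that fails this test is discarded before the homography is ever fitted, which is how geometrically implausible hypotheses get filtered out early.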
Efficient CNNs with Channel Attention and Group Convolution for Facial Expression Recognition
MyeongOh Lee, Ui Nyoung Yoon, Seunghyun Ko, Geun-Sik Jo
http://doi.org/10.5626/JOK.2019.46.12.1241
Recently, studies using convolutional neural networks have been actively conducted to recognize emotions from facial expressions. In this paper, we propose an efficient convolutional neural network that solves the model complexity problem of the deep convolutional neural networks used to recognize emotions in facial expressions. To reduce the complexity of the model, we used group convolution and depth-wise separable convolution, which reduce the number of parameters and the computational cost. We also enhanced the reuse of features and channel information by using skip connections for feature connection and channel attention. Our method achieved 70.32% and 85.23% accuracy on the FER2013 and RAF-single datasets with four times fewer parameters (0.39 million, 0.41 million) than the existing model.
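Why group and depth-wise separable convolutions shrink the model can be seen from their parameter counts. The helpers below are illustrative arithmetic (biases omitted, channel sizes hypothetical), not the paper's architecture.

```python
def conv_params(c_in, c_out, k):
    # standard k x k convolution: every input channel feeds every output channel
    return c_in * c_out * k * k

def group_conv_params(c_in, c_out, k, groups):
    # each group convolves c_in/groups channels to c_out/groups channels
    return groups * (c_in // groups) * (c_out // groups) * k * k

def depthwise_separable_params(c_in, c_out, k):
    # one k x k filter per input channel, then a 1 x 1 pointwise mix
    return c_in * k * k + c_in * c_out
```

For example, a 3x3 layer from 64 to 128 channels costs 73,728 parameters as a standard convolution but only 8,768 as a depth-wise separable one, roughly an 8x reduction, which is how such substitutions bring a model down to a fraction of its parameter count.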
Automatic Transformation of Korean Fonts using Unbalanced U-net and Generative Adversarial Networks
Pangjia, Seunghyun Ko, Yang Fang, Geun-sik Jo
http://doi.org/10.5626/JOK.2019.46.1.15
In this paper, we study the typography transfer problem: transferring a source font to an analogous font with a specified style. To solve it, we treat the problem as an image-to-image translation problem and propose an unbalanced U-net architecture based on the Generative Adversarial Network (GAN). Unlike the traditional balanced U-net architecture, the proposed architecture consists of two subnets: (1) an unbalanced U-net responsible for transferring a specified font style to another while maintaining semantic and structural information; and (2) an adversarial net. Our model uses a compound loss function that includes an L1 loss, a constant loss, and a binary GAN loss to facilitate generating the desired target fonts. Experiments demonstrate that, compared with a balanced U-net, our proposed network leads to a more stable training loss, converges faster in the cheat loss, and avoids falling into a degradation problem in the generation loss.
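The compound loss named above can be sketched in numpy. The relative weights and the reading of the constant loss as an encoder-consistency term are assumptions for illustration; the abstract does not specify them.

```python
import numpy as np

def binary_gan_loss(d_out, target, eps=1e-10):
    # binary cross-entropy on the discriminator's sigmoid output
    return -np.mean(target * np.log(d_out + eps)
                    + (1.0 - target) * np.log(1.0 - d_out + eps))

def generator_loss(fake, real, enc_real, enc_fake, d_fake,
                   l1_w=100.0, const_w=15.0):
    l1 = np.mean(np.abs(fake - real))            # pixel-level L1 loss
    const = np.mean((enc_real - enc_fake) ** 2)  # constant (encoder-consistency) loss
    # "cheat" term: the generator is rewarded when D labels fakes as real
    cheat = binary_gan_loss(d_fake, np.ones_like(d_fake))
    return l1_w * l1 + const_w * const + cheat
```

Summing the three terms lets the L1 loss pin down glyph shape, the constant loss keep the content encoding stable, and the GAN term push the output toward the target style distribution.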

Journal of KIISE
- ISSN : 2383-630X(Print)
- ISSN : 2383-6296(Electronic)
- KCI Accredited Journal
Editorial Office
- Tel. +82-2-588-9240
- Fax. +82-2-521-1352
- E-mail. chwoo@kiise.or.kr