Digital Library [Search Result]
Facial Emotion Recognition Data Augmentation using Generative Adversarial Network
http://doi.org/10.5626/JOK.2021.48.4.398
Facial emotion recognition, a field of computer vision, has recently shown meaningful results through various neural networks. However, the major facial emotion recognition datasets suffer from "class imbalance," a factor that degrades the accuracy of deep learning models, and numerous studies have sought to address it. In this paper, we propose "RDGAN," a data augmentation model for facial emotion recognition that uses a GAN to resolve the class imbalance of the FER2013 and RAF_single datasets. Unlike prior studies, RDGAN adds an expression discriminator to an image-to-image translation model between existing images, so that the network generates images appropriate to each class. Datasets augmented with RDGAN showed average performance improvements of 4.805%p and 0.857%p on FER2013 and RAF_single, respectively, compared to the unaugmented datasets.
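The bookkeeping behind this kind of class-balancing augmentation can be sketched in a few lines: count each class and ask the generator for enough synthetic images to match the majority class. The label distribution below is a toy example, not the real FER2013 counts, and `augmentation_targets` is a hypothetical helper, not part of RDGAN.

```python
from collections import Counter

def augmentation_targets(labels):
    """Synthetic samples needed per class to match the majority class.

    A GAN-based augmenter (such as RDGAN in the paper) would then be
    asked to generate this many images for each minority class.
    """
    counts = Counter(labels)
    majority = max(counts.values())
    return {cls: majority - n for cls, n in counts.items()}

# Toy, deliberately imbalanced label distribution:
labels = ["happy"] * 8 + ["sad"] * 3 + ["fear"] * 1
print(augmentation_targets(labels))  # {'happy': 0, 'sad': 5, 'fear': 7}
```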
Anomaly Detection by a Surveillance System through the Combination of C3D and Object-centric Motion Information
Seulgi Park, Myungduk Hong, Geunsik Jo
http://doi.org/10.5626/JOK.2021.48.1.91
Existing deep learning-based anomaly detection for closed-circuit television (CCTV) videos has detected anomalies using only an object's action value. As a result, the action value can be difficult to extract depending on the situation, and the available information diminishes over time. Since abnormalities in CCTV videos arise from several factors, such as frame complexity and time-series information, detection based only on an object's action value is limited. To solve this problem, in this paper we designed a new deep learning-based anomaly detection model that combines C3D with optical flow to exploit various object-centric feature values. The proposed model was evaluated on the UCF-Crime dataset, and the experimental results achieved an area under the curve (AUC) of 76.44. Compared to previous studies, it worked more effectively on fast-moving videos such as explosions. We conclude that an anomaly detection model should use feature values and time-series information that reflect the various aspects of an object's behavior.
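The fusion step described above can be illustrated with a minimal sketch: summarize the optical flow inside an object's region as a motion statistic and concatenate it with a clip-level C3D descriptor. The feature dimensions and the two helper functions are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def motion_magnitude(flow):
    """Mean optical-flow magnitude inside an object's bounding box.

    `flow` is an (H, W, 2) array of per-pixel (dx, dy) displacements,
    e.g. a crop of a dense optical-flow field.
    """
    return float(np.mean(np.hypot(flow[..., 0], flow[..., 1])))

def fuse_features(c3d_feat, flow_feat):
    """Concatenate a C3D action descriptor with object-centric motion
    statistics into one vector for the downstream anomaly scorer."""
    return np.concatenate([c3d_feat, flow_feat])

# Uniform motion of (3, 4) px per pixel -> magnitude 5 everywhere:
flow = np.zeros((4, 4, 2))
flow[..., 0] = 3.0
flow[..., 1] = 4.0
print(motion_magnitude(flow))  # 5.0

fused = fuse_features(np.ones(512), np.array([motion_magnitude(flow)]))
print(fused.shape)  # (513,)
```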
CNN-based Speech Emotion Recognition Model Applying Transfer Learning and Attention Mechanism
Jung Hyun Lee, Ui Nyoung Yoon, Geun-Sik Jo
http://doi.org/10.5626/JOK.2020.47.7.665
Existing speech-based emotion recognition studies can be divided into those that use a single voice feature and those that use a variety of voice features. A single feature cannot reflect complex properties of the voice such as loudness, overtone structure, and vocal range. Among studies that use various features, most are based on classical machine learning, whose emotion recognition accuracy is relatively lower than that of deep learning-based studies. To resolve these problems, we propose a speech emotion recognition model based on a Convolutional Neural Network (CNN) that uses the Mel-spectrogram and Mel-Frequency Cepstral Coefficients (MFCC) as voice features. The proposed model applies transfer learning and an attention mechanism to improve training speed and accuracy, and achieved 77.65% emotion recognition accuracy, showing higher performance than the comparison works.
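Both features mentioned above are built on the mel scale, which warps frequency to match perceived pitch. As a small self-contained sketch (the standard HTK-style conversion, independent of the paper's feature extractor), the filter center frequencies of a mel filterbank can be computed like this:

```python
import math

def hz_to_mel(f_hz):
    """HTK-style mel-scale conversion used when building mel-spectrogram
    and MFCC features."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_edges(f_min, f_max, n_bands):
    """Edge/center frequencies (Hz) of n_bands triangular mel filters,
    spaced uniformly on the mel scale between f_min and f_max."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]

# 40 mel bands over 0-8 kHz (a common, but here assumed, configuration):
edges = mel_band_edges(0.0, 8000.0, 40)
print(round(edges[0], 1), round(edges[-1], 1))  # 0.0 8000.0
```

Note how the edges cluster at low frequencies: uniform steps in mel correspond to progressively wider bands in Hz, which is exactly the perceptual compression the Mel-spectrogram exploits.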
The Cut Transition Detection Model Using the SSD Method
Sungmin Park, Ui Nyoung Yoon, Geun-Sik Jo
http://doi.org/10.5626/JOK.2020.47.7.655
Shot boundary detection is constantly being studied as an essential technique for analyzing video content. In this paper, we propose an end-to-end learning model that uses the SSD (Single Shot MultiBox Detector) approach to resolve the shortcomings of existing research and identify the exact location of each cut transition. We apply the SSD concepts of the multi-scale feature map and default boxes to predict multiple cut transitions, and combine image concatenation, an image comparison method, with the model to reinforce the feature information of the cut transitions. The proposed model achieved 88.7% and 98.0% accuracy on the re-labeled ClipShots and TRECVID 2007 datasets, respectively, outperforming the latest research, and it detected ranges closer to the ground truth than existing deep learning models.
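Carrying SSD's default-box idea over to cut detection means tiling candidate intervals along the frame axis instead of boxes over a 2D image; a detection head then scores each interval as transition/non-transition. The sketch below only enumerates such anchors; the scales and the enumeration scheme are illustrative assumptions, not the paper's configuration.

```python
def default_boxes(n_frames, scales=(2, 4, 8)):
    """1D analogue of SSD default boxes: at every frame position,
    one candidate (start, end) interval per scale, clipped to the clip.

    A cut-transition head would classify and refine each interval.
    """
    boxes = []
    for center in range(n_frames):
        for s in scales:
            start = max(0, center - s // 2)
            end = min(n_frames, center + s // 2)
            boxes.append((start, end))
    return boxes

boxes = default_boxes(16)
print(len(boxes))  # 48  (16 positions x 3 scales)
```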
GS-RANSAC : An Error Filtering Algorithm for Homography Estimation based on Geometric Similarities of Feature Points
Kiheun Song, Myung-Duk Hong, Geun-Sik Jo
http://doi.org/10.5626/JOK.2020.47.3.283
Augmented Reality (AR) generates information by displaying augmented objects on real-world objects. AR must compute the coordinates of the augmented objects, for which a homography estimation between two images is generally used. In homography estimation, the RANSAC (Random Sample Consensus) algorithm selects the four most appropriate pairs of feature points extracted from the two images. However, the conventional RANSAC algorithm cannot guarantee the geometric similarity of the randomly selected feature points' locations across the two images. To resolve this problem, we propose an algorithm that evaluates the geometric similarity of feature-point locations between images. The proposed algorithm draws a tetragon through the feature points on each image, then determines whether the two tetragons are similar in vertex order and in the range of their internal angles. The experimental results show that the proposed algorithm decreases the failure rate by 8.55% and displays the augmented objects more accurately compared with conventional RANSAC, improving the accuracy of augmented object coordinates in AR.
Efficient CNNs with Channel Attention and Group Convolution for Facial Expression Recognition
MyeongOh Lee, Ui Nyoung Yoon, Seunghyun Ko, Geun-Sik Jo
http://doi.org/10.5626/JOK.2019.46.12.1241
Recently, studies using convolutional neural networks to recognize emotions from facial expressions have been actively conducted. In this paper, we propose an efficient convolutional neural network that addresses the model complexity of the deep convolutional networks used for facial expression recognition. To reduce complexity, we use group convolution and depth-wise separable convolution, which reduce the number of parameters and the computational cost. We also enhance the reuse of features and channel information by using skip connections for feature aggregation and channel attention. Our method achieved 70.32% and 85.23% accuracy on the FER2013 and RAF-single datasets with roughly four times fewer parameters (0.39 million and 0.41 million) than the existing model.
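Where the parameter savings come from can be seen with simple counting: grouping restricts each filter to a channel slice, and depth-wise separable convolution factors a k×k convolution into a per-channel spatial step plus a 1×1 pointwise step. The layer sizes below are hypothetical examples, not the proposed network's.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution with grouped channels
    (bias omitted). groups=1 is a standard convolution."""
    return (c_in // groups) * k * k * c_out

def depthwise_separable_params(c_in, c_out, k):
    """Depth-wise k x k conv (one filter per input channel) followed by
    a 1 x 1 pointwise conv - the factorization used to cut parameters."""
    return conv_params(c_in, c_in, k, groups=c_in) + conv_params(c_in, c_out, 1)

# Example layer: 128 -> 256 channels, 3 x 3 kernel
std = conv_params(128, 256, 3)                  # 294912
sep = depthwise_separable_params(128, 256, 3)   # 1152 + 32768 = 33920
print(std, sep, round(std / sep, 1))            # 294912 33920 8.7
```

For this example layer the factorized form needs almost 9x fewer weights, which is how the network stays at roughly 0.4 million parameters.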
Journal of KIISE
- ISSN : 2383-630X(Print)
- ISSN : 2383-6296(Electronic)
- KCI Accredited Journal
Editorial Office
- Tel. +82-2-588-9240
- Fax. +82-2-521-1352
- E-mail. chwoo@kiise.or.kr