Search : [ keyword: convolutional neural network (합성곱 신경망) ] (19)

CNN-based Speech Emotion Recognition Model Applying Transfer Learning and Attention Mechanism

Jung Hyun Lee, Ui Nyoung Yoon, Geun-Sik Jo

http://doi.org/10.5626/JOK.2020.47.7.665

Existing speech-based emotion recognition studies can be divided into those that use a single voice feature value and those that use a variety of voice feature values. Studies that use a single voice feature value have difficulty reflecting the complex factors of the voice, such as loudness, overtone structure, and range of voices. Among studies that use various voice feature values, a large number are based on machine learning, and their emotion recognition accuracy is relatively lower than that of deep learning-based studies. To resolve these problems, we propose a speech emotion recognition model based on a CNN (Convolutional Neural Network) that uses the Mel-Spectrogram and Mel Frequency Cepstral Coefficient (MFCC) as voice feature values. The proposed model applies transfer learning and attention to improve learning speed and accuracy, and achieved 77.65% emotion recognition accuracy, showing higher performance than the comparison works.

The Cut Transition Detection Model Using the SSD Method

Sungmin Park, Ui Nyoung Yoon, Geun-Sik Jo

http://doi.org/10.5626/JOK.2020.47.7.655

Shot boundary detection is constantly being studied as an essential technique for analyzing video content. In this paper, we propose an end-to-end learning model using the SSD (Single Shot Multibox Detector) method to resolve the shortcomings of existing research and to identify the exact location of the cut transition. We applied the SSD concepts of the Multi-Scale Feature Map and the Default box to predict multiple cut transitions, and combined Image Concatenation, one of the image comparison methods, with the model to reinforce the feature information of the cut transitions. In a comparison with the latest research, the proposed model showed 88.7% and 98.0% accuracy on the re-labeled ClipShots and TRECVID 2007 datasets, respectively. Additionally, it detected a range closer to the correct answer than the existing deep learning model.

Study and Application of RSSI-based Wi-Fi Channel Detection Using CNN and Frequency Band Characteristics

Junhyun Park, Hyungho Byun, Chong-Kwon Kim

http://doi.org/10.5626/JOK.2020.47.3.335

For mobile devices, Wi-Fi channel scanning is essential both to initiating an internet connection, which enables access to a variety of services, and to maintaining stable link quality through periodic monitoring. However, inefficient Wi-Fi operation, in which all channels are scanned regardless of whether an access point (AP) exists, wastes resources and leads to performance degradation. In this paper, we present a fast and accurate Wi-Fi channel detection method that uses a convolutional neural network (CNN) to learn the dynamic frequency band characteristics of signal strengths collected via a low-power antenna. Experiments were conducted to demonstrate the channel detection accuracy for different AP combination scenarios. Furthermore, we analyzed the expected performance gain if the suggested method were to assist the scanning operation of legacy Wi-Fi.

A Visual Analytics Technique for Analyzing the Cause and Influence of Traffic Congestion

Mingyu Pi, Hanbyul Yeon, Hyesook Son, Yun Jang

http://doi.org/10.5626/JOK.2020.47.2.195

In this paper, we present a technique to analyze the causes of traffic congestion based on traffic flow theory. We extracted vehicle flows from traffic data, such as GPS trajectory and Vehicle Detector data, and identified vehicle flow changes by utilizing entropy from information theory. Then, we extracted cumulative vehicle count curves (N-curves) that can quantify the vehicle flows in the congestion area. According to traffic flow theory, unique N-curve patterns can be observed depending on the congestion type. We built a convolutional neural network classifier that classifies N-curves into four different congestion patterns. Analyzing the cause and influence of congestion is difficult and requires considerable experience and knowledge. Accordingly, we present a visual analytics system that can efficiently perform a series of processes to analyze the cause and influence of traffic congestion. Through case studies, we evaluated the ability of our system to analyze the cause of traffic congestion.
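The two quantities named in this abstract can be illustrated with a minimal sketch. It assumes the N-curve is the running total of per-interval vehicle counts and that entropy is the Shannon entropy of a count distribution (for example, vehicles split across outgoing directions). The function names and the input layout are hypothetical, and the abstract does not state how the authors actually compute either quantity.

```python
from itertools import accumulate
import math


def n_curve(counts_per_interval):
    """Cumulative vehicle count per time interval (assumed N-curve definition)."""
    return list(accumulate(counts_per_interval))


def shannon_entropy(counts):
    """Shannon entropy in bits of a count distribution (hypothetical usage)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    # Zero counts contribute nothing to the entropy, so they are skipped.
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
```

Under these assumptions, a flat stretch in the N-curve marks intervals in which no vehicle passed, and a change in entropy between time windows would signal a change in how the flow is distributed.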

Semi-automatic Expansion for a Chatting Corpus Based on a K-means Clustering Method And Similarity Measure

Jaehyun An, Youngjoong Ko

http://doi.org/10.5626/JOK.2019.46.5.440

In this paper, we propose a semi-automatic method for expanding a chatting corpus using a large amount of utterance data from movie subtitles and drama scripts. To expand the chatting corpus, the proposed system uses a previously constructed chatting corpus and a similarity measure. If the similarity calculated between the previously constructed chatting corpus and an input utterance is greater than a threshold value set in the experiment, the input utterance is selected as a new chatting utterance, that is, as a correct chatting pair. We used morpheme-unit word embeddings and a convolutional neural network to efficiently calculate the similarity of the utterance embeddings. To improve the speed of the semi-automatic expansion process, we reduce the amount of computation by clustering the chatting corpus with the K-means clustering algorithm. Experimental results showed that the precision, recall, and F1 score of the proposed system were 61.28%, 53.19%, and 56.94%, respectively, which were 5.16%p, 6.09%p, and 5.73%p higher than those of the baseline system. The term frequency and the speed of our system were also about a hundred times faster.
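The selection step described in this abstract might look roughly like the following sketch. It assumes utterances are already embedded as plain float vectors, that the centroids and cluster memberships come from an earlier K-means run, and that cosine similarity is the similarity measure. All names here are hypothetical, and the CNN-based embedding used by the authors is not reproduced.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def squared_distance(u, v):
    """Squared Euclidean distance, the usual K-means assignment criterion."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def select_utterance(query, centroids, clusters, threshold):
    """Return (accepted, best_similarity) for one candidate utterance embedding.

    centroids: centroid vectors (assumed output of a prior K-means run)
    clusters:  clusters[i] holds the corpus embeddings assigned to centroids[i]
    """
    # Scan only the cluster whose centroid is closest to the query,
    # instead of comparing against the whole corpus.
    nearest = min(range(len(centroids)),
                  key=lambda i: squared_distance(query, centroids[i]))
    best = max((cosine(query, emb) for emb in clusters[nearest]), default=0.0)
    return best > threshold, best
```

Restricting the comparison to one cluster is what cuts the computation: the cost per input utterance drops from the corpus size to the number of centroids plus the size of a single cluster.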

Elastic Multiple Parametric Exponential Linear Units for Convolutional Neural Networks

Daeho Kim, Jaeil Kim

http://doi.org/10.5626/JOK.2019.46.5.469

The activation function plays a major role in determining the depth and non-linearity of neural networks. Since the introduction of Rectified Linear Units for deep neural networks, many variants have been proposed. For example, Exponential Linear Units (ELU) lead to faster learning by pushing the mean of the activations closer to zero, and Elastic Rectified Linear Units (EReLU) change the slope randomly for better model generalization. In this paper, we propose Elastic Multiple Parametric Exponential Linear Units (EMPELU) as a generalized form of ELU and EReLU. EMPELU changes the slope for the positive part of the function argument randomly within a moderate range during training, and the negative part can take the form of various types of activation functions through parameter learning. EMPELU improved the accuracy and generalization performance of convolutional neural networks in the object classification task (CIFAR-10/100) more than well-known activation functions.
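A scalar sketch of an activation with the two properties described in this abstract is given below. The exact functional form is an assumption and is not taken from the abstract: the positive part is scaled by a slope k drawn uniformly from [1 - eps, 1 + eps] during training (k = 1 otherwise), and the negative part is alpha * (exp(beta * x) - 1), where alpha and beta stand in for learned parameters. The names `empelu`, `alpha`, `beta`, and `eps` are hypothetical.

```python
import math
import random


def empelu(x, alpha=1.0, beta=1.0, eps=0.2, training=False, rng=random):
    """Illustrative EMPELU-style activation for a single float.

    Assumed form (not confirmed by the abstract):
      x > 0  -> k * x, with k ~ U(1 - eps, 1 + eps) in training, k = 1 otherwise
      x <= 0 -> alpha * (exp(beta * x) - 1), alpha and beta being learnable
    """
    if x > 0:
        # Random slope only while training; deterministic identity at inference.
        k = rng.uniform(1.0 - eps, 1.0 + eps) if training else 1.0
        return k * x
    return alpha * (math.exp(beta * x) - 1.0)
```

Under this assumed form, alpha = beta = 1 gives the ELU shape on the negative side, and alpha = 0 gives a ReLU-like zero output there, which is one way a single parameterized function can cover several existing activations.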

Sentence Similarity Prediction based on Siamese CNN-Bidirectional LSTM with Self-attention

Mintae Kim, Yeongtaek Oh, Wooju Kim

http://doi.org/10.5626/JOK.2019.46.3.241

We present a deep learning model for measuring semantic similarity between sentences. In general, most models for measuring similarity use word-level or morpheme-level embedding. However, applying either word-level or morpheme-level embedding results in higher model complexity due to the large size of the dictionary. To solve this problem, we propose a Siamese CNN-Bidirectional LSTM model that uses phonemes instead of words or morphemes and combines long short-term memory (LSTM) with 1D convolutional neural networks whose various window lengths bind phonemes together. For evaluation, we compared our model with Manhattan LSTM (MaLSTM), which shows good performance in measuring similarity between similar questions in the Naver Q&A dataset (similar to Kaggle Quora Question Pair).

Multi-sense Word Embedding to Improve Performance of a CNN-based Relation Extraction Model

Sangha Nam, Kijong Han, Eun-kyung Kim, Sunggoo Kwon, Yoosung Jung, Key-Sun Choi

http://doi.org/10.5626/JOK.2018.45.8.816

The relation extraction task is to classify the relation between two entities in an input sentence, and it is important in natural language processing and knowledge extraction. Many studies have designed relation extraction models using a distant supervision method. Recently, deep learning-based relation extraction models, such as CNN or RNN, have become mainstream. However, existing studies do not solve the homograph problem of the word embedding used as the model input. As a result, model training proceeds with a single embedding value for homographs that have different meanings; that is, the relation extraction model is trained without accurately grasping the meaning of a word. In this paper, we propose a relation extraction model using multi-sense word embedding. To learn the multi-sense word embedding, we used a word sense disambiguation module based on the CoreNet concept, and the relation extraction model used CNN and PCNN models to learn key words in sentences.

Object Recognition in Low Resolution Images using a Convolutional Neural Network and an Image Enhancement Network

Injae Choi, Jeongin Seo, Hyeyoung Park

http://doi.org/10.5626/JOK.2018.45.8.831

Recently, the development of deep learning technologies such as convolutional neural networks has greatly improved the performance of object recognition in images. However, object recognition still faces many challenges due to large variations in images and the diversity of object categories to be recognized. In particular, studies on object recognition in low-resolution images are still at an early stage and have not shown satisfactory performance. In this paper, we propose an image enhancement neural network to improve object recognition performance on low-resolution images. We also use the enhanced images to train an object recognition model based on convolutional neural networks so that recognition performance remains robust to resolution changes. To verify the efficiency of the proposed method, we conducted computational experiments on object recognition in a low-resolution environment using the CIFAR-10 and CIFAR-100 databases. We confirmed that the proposed method can greatly improve recognition performance on low-resolution images while maintaining stable performance on the original-resolution images.


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr