Journal of KIISE

Search results for author: Injung Kim (김인중), 5 articles

Phoneme Segmentation in Speech Signals Using CTC-based Speech Recognition Model and Low-level Features

Choonghyeon Lee, Sungjae Kim, Injung Kim

http://doi.org/10.5626/JOK.2023.50.4.337

In this paper, we propose a method that segments a speech signal into phoneme intervals using multi-level features. Most deep-learning-based speech recognition models estimate phoneme locations from high-level features extracted by deep neural networks. However, while high-level features are effective for phoneme classification, low-level features are more effective for phoneme segmentation because they better reflect local positional information. The proposed method first detects phonemes in the speech signal using high-level features and then estimates phoneme boundaries using low-level features. Compared with a baseline model that relies on high-level features, the proposed method reduced the mean absolute error of phoneme boundary estimation by 95.8%, from 0.34 sec to 0.01 sec, on the HESD dataset, and by 76.5%, from 0.17 sec to 0.04 sec, on the NUS-48E dataset. In a visualization analysis, the proposed method also estimated phoneme boundaries more accurately than the baseline model.
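
The two-stage idea (detect phonemes with high-level outputs, then place boundaries with low-level features) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes greedy per-frame CTC labels with blank index 0, treats each run of a non-blank label as a detected phoneme, and places each boundary at the frame between two detections where a low-level feature vector (for example, a spectral frame) changes most. The function names, the blank index, the 10 ms frame shift, and the "largest feature change" criterion are all hypothetical choices.

```python
import math
from typing import List, Sequence, Tuple

BLANK = 0           # assumed index of the CTC blank label
FRAME_SHIFT = 0.01  # assumed 10 ms hop between feature frames, in seconds


def coarse_segments(frame_labels: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Stage 1: group greedy CTC labels into (label, start, end) detections.

    end is exclusive; blank frames are skipped, and consecutive frames with
    the same non-blank label form one detected phoneme.
    """
    labels = list(frame_labels)
    segments: List[Tuple[int, int, int]] = []
    start = None
    for t, label in enumerate(labels + [BLANK]):  # trailing blank closes the last run
        if start is not None and label != labels[start]:
            segments.append((labels[start], start, t))
            start = None
        if start is None and label != BLANK:
            start = t
    return segments


def refine_boundaries(segments: List[Tuple[int, int, int]],
                      features: Sequence[Sequence[float]]) -> List[int]:
    """Stage 2: place each boundary using low-level features.

    Between two consecutive detections, choose the frame t that maximizes
    ||features[t] - features[t-1]||; t is the first frame of the next phoneme.
    """
    boundaries = []
    for (_, _, prev_end), (_, next_start, _) in zip(segments, segments[1:]):
        candidates = range(prev_end, next_start + 1)
        boundaries.append(max(candidates,
                              key=lambda t: math.dist(features[t], features[t - 1])))
    return boundaries


def boundary_mae(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Mean absolute boundary error in seconds (frame indices times FRAME_SHIFT)."""
    errors = [abs(p - r) for p, r in zip(predicted, reference)]
    return sum(errors) * FRAME_SHIFT / len(errors)
```

CTC outputs tend to be peaky, so a detection often covers only a frame or two; searching the whole gap between detections for the largest low-level change is one simple way to act on the abstract's observation that low-level features carry better positional information.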

C++ based Deep Learning Open Source Framework WICWIU.v3 that Supports Natural Language and Time-series Data Processing

Junseok Oh, Chanhyo Lee, Okkyun Koo, Injung Kim

http://doi.org/10.5626/JOK.2023.50.4.313

WICWIU is the first open-source deep learning framework developed by a Korean university. In this work, we developed WICWIU.v3, which adds features for natural language and time-series data processing. WICWIU is designed for the C++ environment, supports GPU-based parallel processing, and offers excellent readability and extensibility, allowing users to add new features easily. In addition to the image processing models supported by WICWIU.v1 and v2, such as convolutional neural networks (CNNs) and generative adversarial networks (GANs), WICWIU.v3 provides classes and functions for natural language and time-series data processing, including recurrent neural networks (RNNs) such as long short-term memory (LSTM) networks and gated recurrent units (GRUs), attention modules, and Transformers. We validated the newly added functions for natural language and time-series data by implementing a machine translator and a text generator with WICWIU.v3.
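
For reference, the core computation behind attention modules and Transformers, scaled dot-product attention softmax(QKᵀ/√d)V, is sketched below in plain Python. This is an illustrative sketch only: it does not use or reproduce WICWIU's C++ classes, and the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Compute softmax(q k^T / sqrt(d)) v row by row.

    q: n_queries x d, k: n_keys x d, v: n_keys x d_v.
    """
    d = len(k[0])
    d_v = len(v[0])
    out = []
    for query in q:
        # Scaled similarity between this query and every key.
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(d) for key in k]
        weights = softmax(scores)
        # Output row = attention-weighted average of the value rows.
        out.append([sum(w * row[c] for w, row in zip(weights, v)) for c in range(d_v)])
    return out
```

Dividing by √d keeps the dot products from growing with the key dimension, which would otherwise push the softmax toward one-hot weights.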

Vehicle Image Data Augmentation by GAN-based Viewpoint Transformation

Hangyel Sun, Myeonghee Lee, Charmgil Hong, Injung Kim

http://doi.org/10.5626/JOK.2021.48.8.885

We introduce a novel GAN-based image synthesis method that transforms vehicle images captured from arbitrary viewpoints into images taken from a specific viewpoint. Training a vehicle image recognizer requires a large number of vehicle images taken from a specific viewpoint. In practice, however, such training data are difficult to collect, especially for newly released vehicles. We therefore propose augmenting vehicle image data by converting vehicle images taken from arbitrary viewpoints into images from a specific viewpoint. The proposed method first transforms a vehicle image from an arbitrary viewpoint into a top-front view image using DRGAN, then enhances the image quality with DeblurGAN, and finally improves the resolution with SRGAN. The experimental results demonstrated that the proposed method successfully converted images taken within 45 degrees to the left and right into top-front view images and was effective in improving image quality and resolution.

C++ based General-purpose Open Source Deep Learning Framework, WICWIU

Chunmyong Park, Jeewoong Kim, Yunho Kee, Jihyeon Kim, Seonggyeol Yoon, Eunseo Choi, Injung Kim

http://doi.org/10.5626/JOK.2019.46.3.253

In this paper, we introduce WICWIU, the first open-source deep learning framework developed at a Korean university. WICWIU provides a variety of operators and modules, together with a network structure that can represent an arbitrary computational graph. These features are sufficient to compose widely used deep learning models such as Inception, ResNet, and DenseNet. WICWIU also supports massively parallel GPU computing, which significantly accelerates neural network training. Because its entire API is provided in C++, it is easily accessible to C++ developers. Being built on C++, WICWIU also has an advantage over Python-based frameworks in memory and performance optimization, which makes it easier to customize for resource-constrained environments. WICWIU is readable and extensible because it consists of C++ code with consistent APIs. Its Korean documentation makes it particularly suitable for Korean developers. WICWIU is released under the Apache 2.0 license, so it can be used free of charge for any research or commercial purpose.
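
What it means for a network structure to represent an arbitrary computational graph can be illustrated with a tiny reverse-mode differentiation sketch: each node records its inputs and local derivatives, and gradients flow back in reverse topological order, so a node consumed by several operations (as with ResNet skip connections or DenseNet concatenations) accumulates all of its gradient contributions. It is written in Python for brevity and is a hypothetical illustration, not WICWIU's C++ API; the `Node` class and its methods are assumptions.

```python
from typing import List, Set, Tuple


class Node:
    """A scalar value in a computational graph (hypothetical illustration)."""

    def __init__(self, value: float, parents: Tuple[Tuple["Node", float], ...] = ()):
        self.value = value
        self.parents = parents  # pairs of (input node, local derivative d(self)/d(input))
        self.grad = 0.0

    def __add__(self, other: "Node") -> "Node":
        return Node(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other: "Node") -> "Node":
        return Node(self.value * other.value, ((self, other.value), (other, self.value)))

    def backward(self) -> None:
        """Reverse-mode differentiation over the graph that produced this node."""
        order: List["Node"] = []
        visited: Set[int] = set()

        def visit(node: "Node") -> None:
            # Depth-first search; a node is appended only after all its inputs,
            # which yields a topological order even when inputs are shared.
            if id(node) in visited:
                return
            visited.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            # Chain rule; += accumulates gradients for nodes with several consumers.
            for parent, local in node.parents:
                parent.grad += local * node.grad
```

For example, with `x = Node(3.0)` and `y = Node(4.0)`, the graph `f = x * y + x` uses `x` twice, and `f.backward()` leaves `x.grad` at 5.0 (that is, y + 1) and `y.grad` at 3.0.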

T-Commerce Sale Prediction Using Deep Learning and Statistical Model

Injung Kim, Kihyun Na, Sohee Yang, Jaemin Jang, Yunjong Kim, Wonyoung Shin, Deokjung Kim

http://doi.org/10.5626/JOK.2017.44.8.803

T-commerce is a converged technology service through which users can make purchases via data broadcasting on bidirectional digital TVs. To maximize revenue given the limited number of channels and the limited variety of goods, broadcast programs must be organized to maximize expected sales, taking into account the selling power of each product in each time slot. To this end, this paper proposes a method to predict the sales of a product when it is assigned to each time slot. The proposed method predicts the sales of a product in a time slot from the week of the year and the weather of the target day. In addition, it incorporates a statistical prediction model based on singular value decomposition (SVD) to mitigate the sparsity problem caused by bias in the sales records. In experiments on the sales data of W-shopping, a T-commerce company, the proposed method achieved a normalized mean absolute error (NMAE) of 0.12 between predicted and actual sales, confirming its effectiveness. The proposed method has been deployed in W-shopping's T-commerce system and is used for broadcast program scheduling.
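
One way an SVD-based statistical model can ease sparsity is low-rank matrix completion, sketched below. The abstract does not describe the paper's actual formulation, so everything here is an assumption: sales are arranged hypothetically as a product × time-slot matrix in which never-observed cells are `None`, missing cells are filled by iterating a rank-k SVD reconstruction (computed by power iteration with deflation) until the filled values stabilize, and NMAE is taken as the sum of absolute errors divided by the sum of absolute actual sales, which is one common definition.

```python
import math
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def top_singular_triplet(a: Matrix, iters: int = 100) -> Tuple[float, List[float], List[float]]:
    """Leading singular value/vectors of a by power iteration on a^T a."""
    rows, cols = len(a), len(a[0])
    v = [1.0] * cols  # fixed start vector; a random one is more robust in general
    for _ in range(iters):
        u = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        v = [sum(a[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            return 0.0, [0.0] * rows, [0.0] * cols
        v = [x / norm for x in v]
    u = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    if sigma > 0.0:
        u = [x / sigma for x in u]
    return sigma, u, v


def low_rank(a: Matrix, k: int) -> Matrix:
    """Rank-k approximation sum_r sigma_r u_r v_r^T, built by deflation."""
    residual = [row[:] for row in a]
    approx = [[0.0] * len(a[0]) for _ in a]
    for _ in range(k):
        sigma, u, v = top_singular_triplet(residual)
        for i in range(len(a)):
            for j in range(len(a[0])):
                term = sigma * u[i] * v[j]
                approx[i][j] += term
                residual[i][j] -= term
    return approx


def svd_impute(sales: List[List[Optional[float]]], k: int = 1, iters: int = 100) -> Matrix:
    """Fill None cells of a (hypothetical) product x time-slot sales matrix.

    Missing cells start at their row's observed mean and are repeatedly
    replaced by the rank-k reconstruction; observed cells are never changed.
    """
    filled = []
    for row in sales:
        observed = [x for x in row if x is not None]
        mean = sum(observed) / len(observed) if observed else 0.0
        filled.append([mean if x is None else float(x) for x in row])
    for _ in range(iters):
        approx = low_rank(filled, k)
        filled = [[approx[i][j] if x is None else float(x) for j, x in enumerate(row)]
                  for i, row in enumerate(sales)]
    return filled


def nmae(predicted: List[float], actual: List[float]) -> float:
    """NMAE, assumed here to be sum|pred - actual| / sum|actual|."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / sum(abs(a) for a in actual)
```

The low-rank assumption lets products and time slots with few records borrow strength from similar rows and columns; the rank k would need to be chosen on held-out data.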

Journal of KIISE

ISSN: 2383-630X (Print)
ISSN: 2383-6296 (Electronic)
KCI Accredited Journal

Editorial Office

Tel. +82-2-588-9240
Fax. +82-2-521-1352
E-mail. chwoo@kiise.or.kr
