Improving the Quality of Generating Imbalance Data in GANs through an Exhaustive Contrastive Learning Method

Hyeonjun Shin, Sangbaek Lee, Kyuchul Lee

http://doi.org/10.5626/JOK.2023.50.4.295

As deep learning algorithms have improved, they are increasingly used to solve real-world problems. Data that reflect the real world are often imbalanced, depending on how frequently events occur or how difficult the data are to collect. Data whose classes contain unequal numbers of samples are called imbalanced data, and the minority classes, which have relatively little data, are particularly difficult to learn with deep learning algorithms. Recently, Generative Adversarial Nets (GANs) have been applied to data augmentation, and self-supervised pre-training has been proposed for minority class learning. However, because the class information of the imbalanced data is used when training the generative model, the minority classes are learned poorly and the quality of the generated data suffers. To solve this problem, this paper proposes a similarity-based exhaustive contrastive learning method. The proposed method is evaluated quantitatively using the Frechet Inception Distance (FID) and the Inception Score (IS). Compared with the existing method, the proposed method improved the FID by 16.32 and the IS by 0.38.
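
For readers unfamiliar with the metric, FID is the Fréchet distance between two Gaussians fitted to Inception features of real and generated images, and lower values indicate closer distributions. The sketch below is not the authors' evaluation code: it shows only the one-dimensional special case of that distance, on made-up feature values, and uses the population variance, whereas practical FID implementations work on multivariate features with covariance matrices. The function name `frechet_distance_1d` is hypothetical.

```python
import math
from statistics import fmean, pvariance
from typing import Sequence


def frechet_distance_1d(real: Sequence[float], generated: Sequence[float]) -> float:
    """Frechet distance between two 1-D Gaussians fitted to the samples.

    d^2 = (mu_r - mu_g)^2 + var_r + var_g - 2 * sqrt(var_r * var_g)
    The full FID replaces the variances with covariance matrices and the
    last term with a matrix square root inside a trace.
    """
    mu_r, mu_g = fmean(real), fmean(generated)
    var_r, var_g = pvariance(real), pvariance(generated)
    return (mu_r - mu_g) ** 2 + var_r + var_g - 2.0 * math.sqrt(var_r * var_g)


# Toy feature values (illustrative only, not taken from the paper).
real_features = [0.0, 2.0]
generated_features = [3.0, 5.0]
print(frechet_distance_1d(real_features, generated_features))
```

Identical sample sets give a distance of zero; shifting the generated samples away from the real ones increases it.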

PatentQ&A: Proposal of Patent Q&A Neural Search System Using Transformer Model

Yoonmin Lee, Taewook Hwang, Sangkeun Jung, Hyein Seo, Yoonhyung Roh

http://doi.org/10.5626/JOK.2023.50.4.306

Recent neural search enables semantic search beyond statistical methods and returns accurate results even for queries containing typos. This paper proposes a neural patent Q&A search system that provides the answer closest to the user's intent when members of the general public without patent expertise search for patent information using everyday terms. A patent dataset was constructed from patent customer consultation data posted on the Korean Intellectual Property Office website. Patent-KoBERT (Triplet) and Patent-KoBERT (CrossEntropy), fine-tuned on the patent dataset, were used to extract questions similar to the user's question and to re-rank them. In the experiment, the Mean Reciprocal Rank (MRR) and Mean Average Precision (MAP) were both 0.96, confirming that the answers most similar to the intent of the user's input were selected well.
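
The two ranking metrics reported above can be illustrated with a short sketch. It assumes each query is represented as a list of booleans marking whether the result at each rank is relevant; this representation, the function names, and the toy data are illustrative assumptions, not the paper's evaluation setup.

```python
from typing import List


def mean_reciprocal_rank(ranked_relevance: List[List[bool]]) -> float:
    """Average over queries of 1/rank of the first relevant result (0 if none)."""
    total = 0.0
    for flags in ranked_relevance:
        for rank, relevant in enumerate(flags, start=1):
            if relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_relevance)


def mean_average_precision(ranked_relevance: List[List[bool]]) -> float:
    """Average over queries of the mean precision at each relevant rank."""
    ap_sum = 0.0
    for flags in ranked_relevance:
        hits = 0
        precision_sum = 0.0
        for rank, relevant in enumerate(flags, start=1):
            if relevant:
                hits += 1
                precision_sum += hits / rank
        ap_sum += precision_sum / hits if hits else 0.0
    return ap_sum / len(ranked_relevance)


# Three toy queries whose single relevant answer sits at rank 1, 2, and 3.
toy_results = [
    [True, False, False],
    [False, True, False],
    [False, False, True],
]
print(mean_reciprocal_rank(toy_results))
print(mean_average_precision(toy_results))
```

When every query has exactly one relevant answer, as in the toy data, the two metrics coincide.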

C++ based Deep Learning Open Source Framework WICWIU.v3 that Supports Natural Language and Time-series Data Processing

Junseok Oh, Chanhyo Lee, Okkyun Koo, Injung Kim

http://doi.org/10.5626/JOK.2023.50.4.313

WICWIU is the first open-source deep learning framework developed by a Korean university. In this work, we developed WICWIU.v3, which adds features for natural language and time-series data processing. WICWIU is designed for C++ environments, supports GPU-based parallel processing, and has excellent readability and extensibility, allowing users to easily add new features. Whereas WICWIU.v1 and v2 focus on image processing models, such as convolutional neural networks (CNN) and generative adversarial networks (GAN), WICWIU.v3 additionally provides classes and functions for natural language and time-series data processing, such as recurrent neural networks (RNN), including LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) networks, attention modules, and Transformers. We validated the newly added functions for natural language and time-series data by implementing a machine translator and a text generator with WICWIU.v3.

Lexical Substitution Using a Replaced Token Detection Model

Seunghyun Ji, Soowon Lee

http://doi.org/10.5626/JOK.2023.50.4.321

Substitutes are words that can replace a word in a sentence without changing the meaning of the sentence. The task of finding them, known as lexical substitution, can be applied to various natural language processing tasks, such as data augmentation. Traditional methods for lexical substitution may generate unnatural substitutes. To solve this problem, we propose a new method of lexical substitution. Our method samples sentences containing the target word from a corpus, feeds these sentences to a substitute generator based on pretrained BERT, and excludes unacceptable candidates with a replaced token detection model. When evaluated on the open corpus provided by the National Institute of Korean Language and the Natmal synonym dictionary, our method extracts more accurate substitutes than traditional methods. In addition, the replaced token detection model proposed for lexical substitution performs better in our experiment at excluding unacceptable candidates than a model trained on the CoLA dataset, which could also be considered for this purpose.

Improving the Performance of Knowledge Tracing Models using Quantized Correctness Embeddings

Yoonjin Im, Jaewan Moon, Eunseong Choi, Jongwuk Lee

http://doi.org/10.5626/JOK.2023.50.4.329

Knowledge tracing is the task of monitoring learners' knowledge proficiency based on their interaction records. Although deep neural network-based models are used flexibly for this task, existing methods disregard the difficulty of each question and perform poorly for learners who answer an easy question incorrectly or a hard question correctly. In this paper, we propose quantizing the learners' response information based on question difficulty so that knowledge tracing models can learn both the response and the difficulty of the question, thereby improving performance. We design a method that can effectively discriminate between negative samples on questions with a high correct-answer rate and positive samples on questions with a low correct-answer rate. To this end, we use sinusoidal positional encoding (SPE), which can maximize the distance between embedding representations in the latent space. Experiments show that the AUC is improved by up to 17.89% in the target section compared to the existing method.
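
The idea of combining a quantized response with sinusoidal positional encoding can be sketched as follows. The encoding function follows the standard Transformer formulation; the helper `quantize_response`, the number of bins, and the way correct and incorrect responses are mapped to indices are hypothetical choices made for illustration and are not taken from the paper.

```python
import math
from typing import List


def sinusoidal_encoding(position: int, dim: int) -> List[float]:
    """Standard sinusoidal encoding: sin on even indices, cos on odd indices."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    vec: List[float] = []
    for i in range(0, dim, 2):
        angle = position / (10000 ** (i / dim))
        vec.append(math.sin(angle))
        vec.append(math.cos(angle))
    return vec


def quantize_response(correct: bool, correct_rate: float, n_bins: int) -> int:
    """Hypothetical quantization: bucket the question's correct-answer rate,
    then offset correct responses so they use a separate index range."""
    bucket = min(int(correct_rate * n_bins), n_bins - 1)
    return bucket + (n_bins if correct else 0)


# A wrong answer on an easy question and a right answer on a hard question
# map to different indices and therefore to different embeddings.
wrong_on_easy = quantize_response(False, 0.95, n_bins=10)
right_on_hard = quantize_response(True, 0.05, n_bins=10)
embedding_a = sinusoidal_encoding(wrong_on_easy, dim=8)
embedding_b = sinusoidal_encoding(right_on_hard, dim=8)
print(wrong_on_easy, right_on_hard, math.dist(embedding_a, embedding_b))
```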

Phoneme Segmentation in Speech Signals Using CTC-based Speech Recognition Model and Low-level Features

Choonghyeon Lee, Sungjae Kim, Injung Kim

http://doi.org/10.5626/JOK.2023.50.4.337

In this paper, we propose a method to segment a speech signal into phoneme intervals using multi-level features. Most deep learning-based speech recognition models estimate the location of phonemes based on high-level features extracted by deep neural networks. However, while high-level features are effective for phoneme classification, low-level features are more effective for phoneme segmentation since they reflect local positional information better. The proposed method first detects phonemes from speech signals using high-level features and then estimates phoneme boundaries using low-level features. In comparison with a baseline model that relies on high-level features, the mean absolute error of phoneme boundary estimation decreased by 95.8% from 0.34 sec to 0.01 sec for the HESD dataset, and decreased by 76.5% from 0.17 sec to 0.04 sec for the NUS-48E dataset. In visualization analysis, the proposed method estimated phoneme boundaries more accurately than the baseline model.
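
As a rough illustration of how the reported figures relate, the sketch below computes a mean absolute boundary error and the relative decrease between a baseline and a proposed error. The function names and the toy boundary times are assumptions for illustration; the paper's exact evaluation protocol (for example, how predicted and reference boundaries are aligned) is not described in the abstract.

```python
from typing import Sequence


def boundary_mae(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Mean absolute error in seconds between aligned boundary times."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def relative_decrease(baseline: float, proposed: float) -> float:
    """Percentage by which the error decreased relative to the baseline."""
    return (baseline - proposed) / baseline * 100.0


# Toy boundaries in seconds (illustrative only).
print(boundary_mae([0.10, 0.52], [0.12, 0.50]))

# Rounded NUS-48E values from the abstract: 0.17 sec -> 0.04 sec.
print(round(relative_decrease(0.17, 0.04), 1))
```

Applied to the rounded NUS-48E values, this reproduces the reported 76.5%. The HESD percentage cannot be reproduced exactly from the two-decimal values, presumably because it was computed from unrounded errors.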

A Deep Learning based Speech Quality Enhancement Scheme Using Environmental Sound Classification and Location Information

Byung Hee Kang, Dong Kun Noh

http://doi.org/10.5626/JOK.2023.50.4.344

In the field of speech processing, deep learning has made great advances by improving the precision of speech recognition. One of these advances, speech enhancement, is a technique that improves speech recognition by separating the voice from the noise in input that mixes a speaking voice with noise. It is used in AI speakers and smartphones to facilitate human-to-human communication and to enable clean voice data collection for robots and text-to-speech. However, conventional speech enhancement techniques that use only a single model are not effective in eliminating the noise specific to each environment. To effectively eliminate environment-specific noise, this paper proposes a deep learning model that combines acoustic scene classification with the use of location information to enable optimal environment-specific speech enhancement. The experiment confirms that this technique achieves a high improvement in voice quality at low computational cost in various environments compared to the existing technique.

Rehearsal with Stored Latent Vectors for Incremental Learning Over GANs

Hye-Min Jeong, Dong-Wan Choi

http://doi.org/10.5626/JOK.2023.50.4.351

Unlike for humans, sequentially learning multiple tasks is difficult for deep learning models. This problem affects not only discriminative models but also generative models, such as GANs. The Generative Replay method, which is frequently used in GAN continual learning, trains on new tasks together with images generated by the GAN from the previous task, but it does not generate good images for CIFAR10, which is a relatively challenging dataset. One could instead consider a rehearsal-based method that stores a portion of the real data, but such a method cannot store a large number of images in limited memory because real images are high-dimensional. In this paper, we propose LactoGAN and LactoGAN+, continual learning methods that store the latent vectors that are the inputs of GANs rather than storing real images as existing rehearsal-based approaches do. As a result, more image knowledge can be stored in the same memory, yielding better results than the existing GAN continual learning methods.

Performance Improvement of Distributed Parallel Graph Data Processing in InfiniBand Networks

Hyeongjong Kim, Myeong-Seon Gil, Yang-Sae Moon

http://doi.org/10.5626/JOK.2023.50.4.359

Graph data, which emphasizes the relationships between objects, is widely used to discover new rules and perform association analysis that cannot be done with relational databases. However, its complex structure and massive size limit high-speed processing. In this paper, we propose PIGraph (Pregel and InfiniBand-based Graph processing engine) to improve the processing performance of graph data. PIGraph is an advanced graph processing engine based on Pregel, which is a representative graph processing model. PIGraph supports a distributed parallel structure using InfiniBand and RDMA (Remote Direct Memory Access) technology to reduce the management complexity of distributed graph processing. In particular, PIGraph improves the processing performance of graph data by optimizing RDMA communication with segment-based transmissions. Experimental results show that PIGraph improves the processing time by up to 190% compared to Apache Giraph.
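
The Pregel model that PIGraph builds on is vertex-centric: in each superstep, every vertex that received messages updates its value and sends messages to its neighbors, and the computation stops when no messages remain. The single-process sketch below illustrates this with the classic maximum-value propagation example. It is not PIGraph's implementation: it does not model distribution, InfiniBand, RDMA, or segment-based transmission, and the function name and toy graph are illustrative assumptions.

```python
from typing import Dict, List


def pregel_max_value(graph: Dict[str, List[str]], values: Dict[str, int]) -> Dict[str, int]:
    """Propagate the maximum vertex value through the graph in supersteps.

    `graph` maps every vertex to its out-neighbors (all vertices must be keys).
    """
    values = dict(values)  # do not mutate the caller's dictionary

    # Superstep 0: every vertex sends its value to its out-neighbors.
    inbox: Dict[str, List[int]] = {v: [] for v in graph}
    for vertex, neighbors in graph.items():
        for neighbor in neighbors:
            inbox[neighbor].append(values[vertex])

    # Later supersteps: run until no vertex has pending messages.
    while any(inbox.values()):
        next_inbox: Dict[str, List[int]] = {v: [] for v in graph}
        for vertex, messages in inbox.items():
            if not messages:
                continue  # no messages: the vertex stays halted
            best = max(messages)
            if best > values[vertex]:
                values[vertex] = best
                for neighbor in graph[vertex]:
                    next_inbox[neighbor].append(best)
        inbox = next_inbox
    return values


toy_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
toy_values = {"a": 3, "b": 6, "c": 2}
print(pregel_max_value(toy_graph, toy_values))
```

In a distributed engine, the per-superstep message exchange is the part that crosses the network, which is where communication optimizations of the kind the abstract describes would apply.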


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr