Digital Library [ Search Result ]
Adversarial Training with Contrastive Learning in NLP
Daniela N. Rim, DongNyeong Heo, Heeyoul Choi
http://doi.org/10.5626/JOK.2025.52.1.52
Adversarial training has been extensively studied in natural language processing (NLP) to make models robust so that semantically similar inputs yield similar outcomes. However, since language has no objective measure of semantic similarity, previous works rely on an external pre-trained NLP model to enforce this similarity, introducing an extra training stage with huge memory consumption. This work proposes adversarial training with contrastive learning (ATCL), which trains a language processing model adversarially while exploiting the benefits of contrastive learning. The core idea is to apply linear perturbations in the embedding space of the input via the fast gradient method (FGM) and to train the model to keep the original and perturbed representations close via contrastive learning. We apply ATCL to language modeling and neural machine translation tasks and show improvements in the quantitative (perplexity and BLEU) scores. Furthermore, simulations show that ATCL achieves good qualitative results at the semantic level for both tasks without using a pre-trained model.
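A minimal sketch of the two ingredients described above, assuming a PyTorch encoder whose input embeddings are differentiable; the names (`fgm_perturb`, `contrastive_loss`, `epsilon`, `tau`) and the InfoNCE-style formulation are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def fgm_perturb(embeds, loss, epsilon=1.0):
    """Fast gradient method: add a small linear perturbation to the
    input embeddings in the direction of the loss gradient."""
    grad, = torch.autograd.grad(loss, embeds, retain_graph=True)
    return embeds + epsilon * grad / (grad.norm(dim=-1, keepdim=True) + 1e-12)

def contrastive_loss(clean, perturbed, tau=0.1):
    """InfoNCE-style loss pulling each clean sentence representation toward
    its perturbed counterpart and away from other sentences in the batch."""
    clean = F.normalize(clean, dim=-1)          # (batch, dim)
    perturbed = F.normalize(perturbed, dim=-1)  # (batch, dim)
    logits = clean @ perturbed.t() / tau        # (batch, batch) similarities
    targets = torch.arange(clean.size(0), device=clean.device)
    return F.cross_entropy(logits, targets)
```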
New Transformer Model to Generate Molecules for Drug Discovery
Yu-Bin Hong, Kyungjun Lee, DongNyeong Heo, Heeyoul Choi
http://doi.org/10.5626/JOK.2023.50.11.976
Among various generative models, recurrent neural network (RNN) based models have achieved state-of-the-art performance in the drug generation task. To overcome the long-term dependency problem that RNNs suffer from, Transformer-based models were proposed for the task. However, the Transformer models performed worse than the RNN models in drug generation, and we believe this is because the Transformer models were over-parameterized and suffered from over-fitting. To avoid this problem, we propose a new Transformer model in which the large decoder is replaced with simple feed-forward layers. Experiments confirmed that the proposed model outperformed the previous state-of-the-art baseline on the major evaluation metrics while preserving a similar level of performance on the remaining metrics. Furthermore, when we applied our model to generate candidate molecules against the SARS-CoV-2 (COVID-19) virus, the generated molecules were more effective than commercially available drugs such as Paxlovid, Molnupiravir, and Remdesivir.
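A sketch of the architectural idea, assuming a PyTorch token-level molecule generator: a Transformer encoder over the prefix with the decoder stack replaced by a small feed-forward head that predicts the next token. All layer sizes and names are placeholders, not the paper's configuration, and positional encoding is omitted for brevity.

```python
import torch
import torch.nn as nn

class FFHeadGenerator(nn.Module):
    """Causal Transformer encoder plus a feed-forward output head
    (standing in for the removed decoder stack)."""
    def __init__(self, vocab_size, d_model=256, nhead=8, num_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Sequential(               # simple feed-forward layers
            nn.Linear(d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, vocab_size))

    def forward(self, tokens):                   # tokens: (batch, seq)
        seq_len = tokens.size(1)
        causal = torch.triu(torch.full((seq_len, seq_len), float("-inf"),
                                       device=tokens.device), diagonal=1)
        h = self.encoder(self.embed(tokens), mask=causal)
        return self.head(h)                      # next-token logits
```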
Mini-Batching with Similar-Length Sentences to Quickly Train NMT Models
Daniela N. Rim, Richard Kimera, Heeyoul Choi
http://doi.org/10.5626/JOK.2023.50.7.614
The Transformer model has revolutionized Natural Language Processing tasks such as Neural Machine Translation. Many efforts have been made to study the Transformer architecture to increase its efficiency and accuracy. One potential area for improvement is the computation spent on padding (empty) tokens, which the Transformer processes only to discard later, creating an unnecessary computational burden. To tackle this, we propose an algorithm that sorts translation sentence pairs by length before batching and builds mini-batches from similar-length sentences, which minimizes wasted computation. Since full sorting could violate the independent and identically distributed (i.i.d.) data assumption, we sort the data only partially. In experiments, we apply the proposed method to English-Korean and English-Luganda machine translation and show gains in computational time while maintaining performance. Our method is architecture-independent, so it can be easily integrated into any training process with variable-length data.
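A minimal sketch of partial length-based sorting before batching; the `window` size and the `(src, tgt)` sentence-pair format are illustrative assumptions, not the paper's exact settings.

```python
import random

def partially_sorted_batches(pairs, batch_size, window=100):
    """Sort sentence pairs by source length only within local chunks,
    then batch, so similar lengths share a mini-batch while the data
    as a whole stays roughly shuffled."""
    random.shuffle(pairs)
    batches = []
    chunk_size = window * batch_size
    for start in range(0, len(pairs), chunk_size):
        chunk = sorted(pairs[start:start + chunk_size],
                       key=lambda p: len(p[0]))
        batches += [chunk[i:i + batch_size]
                    for i in range(0, len(chunk), batch_size)]
    random.shuffle(batches)   # keep batch order random across the epoch
    return batches
```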
Korean-English Neural Machine Translation Using Korean Alphabet Characteristics and Honorific Expressions
Jeonghui Kim, Jaemu Heo, Joowhan Kim, Heeyoul Choi
http://doi.org/10.5626/JOK.2022.49.11.1017
Recently, deep learning has improved the performance of machine translation, but in most cases it does not reflect the characteristics of the languages involved. In particular, Korean has unique word and expression features that can cause mistranslation. For example, in Google Translate from Korean to English, mistranslations occur when a Korean noun ends with a postposition (josa) in the form of a single consonant. Also, in English-Korean translations, honorific and casual expressions are mixed in the translated results. This is because the alphabetic characteristics and honorifics of the Korean language are not reflected. In this paper, to address these problems, we propose training a model with sub-words composed of letter units (jamo) and unifying honorific and casual expressions in the corpus. The experimental results confirmed that the proposed method resolved the problems mentioned above and achieved a similar or slightly higher BLEU score compared to the existing method and corpus.
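A sketch of the letter-level decomposition this approach relies on, using the standard Unicode arithmetic for precomposed Hangul syllables; how the resulting jamo stream is then fed to a sub-word learner (e.g. BPE) is left out and would follow the paper's setup.

```python
def to_jamo(text):
    """Decompose precomposed Hangul syllables (U+AC00..U+D7A3) into
    letter-level jamo; non-Hangul characters pass through unchanged."""
    out = []
    for ch in text:
        code = ord(ch)
        if 0xAC00 <= code <= 0xD7A3:
            idx = code - 0xAC00
            lead, vowel, tail = idx // 588, (idx % 588) // 28, idx % 28
            out.append(chr(0x1100 + lead))    # initial consonant
            out.append(chr(0x1161 + vowel))   # vowel
            if tail:
                out.append(chr(0x11A7 + tail))  # final consonant, if any
        else:
            out.append(ch)
    return "".join(out)
```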
Building a Parallel Corpus and Training Translation Models Between Luganda and English
Richard Kimera, Daniela N. Rim, Heeyoul Choi
http://doi.org/10.5626/JOK.2022.49.11.1009
Neural machine translation (NMT), which has recently achieved great success, requires large datasets, so it has mostly been premised on high-resource languages. This continually disadvantages low-resource languages such as Luganda, which lack high-quality parallel corpora; even Google Translate did not serve Luganda at the time of this writing. In this paper, we build a parallel corpus of 41,070 sentence pairs for Luganda and English, based on three different open-source corpora. We then train NMT models on the dataset with hyper-parameter search. Experiments gave us a BLEU score of 21.28 from Luganda to English and 17.47 from English to Luganda. Translation examples show the high quality of the translations. We believe our model is the first Luganda-English NMT model. The bilingual dataset we built will be made publicly available.
Improvement of Deep Learning Models to Predict the Knowledge Level of Learners based on the EdNet Data
Seulgi Choi, Youngpyo Kim, Sojung Hwang, Heeyoul Choi
http://doi.org/10.5626/JOK.2021.48.12.1335
As online education grows, the field of AI in Education (AIEd), where artificial intelligence is used for education, is being actively studied. Knowledge Tracing (KT), which predicts a student's knowledge level based on the student's learning record, is a basic task in the AIEd field. However, the dataset remains under-utilized, and research on KT model architectures is limited. In this paper, we propose using a total of 11 features, selected after trying various features related to the problems, and present a new model based on the self-attention mechanism with new query, key, and value constructions, Self-Attentive Knowledge Tracing Extended (SANTE). In experiments, we confirm that the proposed method with the selected features outperforms previous KT models in terms of the AUC value.
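The abstract does not spell out how SANTE forms its new queries, keys, and values; the sketch below only illustrates the common self-attentive knowledge-tracing pattern (query from the next exercise, keys/values from past interactions), with all names and sizes as placeholders and causal masking omitted for brevity.

```python
import torch
import torch.nn as nn

class SelfAttentiveKT(nn.Module):
    """Generic self-attentive knowledge-tracing sketch, not SANTE itself."""
    def __init__(self, num_exercises, d_model=64, nhead=4):
        super().__init__()
        self.ex_emb = nn.Embedding(num_exercises, d_model)
        # one embedding per (exercise, correct/incorrect) interaction
        self.inter_emb = nn.Embedding(num_exercises * 2, d_model)
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.out = nn.Linear(d_model, 1)

    def forward(self, past_ex, past_correct, next_ex):
        # past_ex, past_correct, next_ex: (batch, seq) integer tensors
        kv = self.inter_emb(past_ex * 2 + past_correct)  # keys and values
        q = self.ex_emb(next_ex)                         # queries
        h, _ = self.attn(q, kv, kv)
        return torch.sigmoid(self.out(h))  # probability of a correct answer
```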
Kor-Eng NMT using Symbolization of Proper Nouns
Myungjin Kim, Junyeong Nam, Heeseok Jung, Heeyoul Choi
http://doi.org/10.5626/JOK.2021.48.10.1084
There has been progress in the field of neural machine translation, but sentences containing proper nouns, such as names, new words, and words used only within a specific group, are often translated inaccurately. To handle such cases, this paper uses a Korean-English proper noun dictionary and a symbolization method on top of the recently proposed Transformer translation model. In the proposed method, some of the words in the training sentences are symbolized using the proper noun dictionary, and the translation model is trained on sentences that include the symbolized words. When translating a new sentence, the translation is completed through symbolization, translation, and desymbolization. The proposed method was compared with a model without symbolization, and in some cases the improvement was quantitatively confirmed with BLEU scores. In addition, several translation examples were presented along with commercial service results.
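A minimal sketch of the symbolize-translate-desymbolize pipeline; the placeholder token format `<PN{i}>`, the dictionary structure, and the `translate` call are assumptions for illustration, not the paper's exact implementation.

```python
def symbolize(sentence, noun_dict):
    """Replace known proper nouns with indexed placeholder symbols and
    remember which target-language noun each symbol stands for."""
    mapping = {}
    for i, (src_noun, tgt_noun) in enumerate(noun_dict.items()):
        if src_noun in sentence:
            sym = "<PN{}>".format(i)
            sentence = sentence.replace(src_noun, sym)
            mapping[sym] = tgt_noun
    return sentence, mapping

def desymbolize(translation, mapping):
    """Put the dictionary translations back in place of the symbols."""
    for sym, tgt_noun in mapping.items():
        translation = translation.replace(sym, tgt_noun)
    return translation

# usage (translate is the trained NMT model, assumed to exist):
#   sym_src, mapping = symbolize(src_sentence, noun_dict)
#   output = translate(sym_src)
#   final = desymbolize(output, mapping)
```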
Deep Neural Networks and End-to-End Learning for Audio Compression
Daniela N. Rim, Inseon Jang, Heeyoul Choi
http://doi.org/10.5626/JOK.2021.48.8.940
Recent advances in end-to-end deep learning have encouraged the exploration of tasks dealing with highly structured data using unified deep network models. Designing such models for compressing audio signals has been a challenge because of the need for discrete representations, which are not easy to train with end-to-end backpropagation. In this paper, we present an end-to-end deep learning approach that combines recurrent neural networks (RNNs) with the training strategy of variational autoencoders (VAEs) and a binary representation of the latent space. We apply a reparametrization trick to the Bernoulli distribution for the discrete representations, which allows smooth backpropagation. In addition, our approach enables the separation of the encoder and decoder, which is necessary for compression tasks. To the best of our knowledge, this is the first end-to-end learning approach for a single audio compression model with RNNs, and our model achieves a Signal-to-Distortion Ratio (SDR) of 20.53 dB.
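The abstract does not say which relaxation of the Bernoulli distribution is used; the sketch below shows one common option, a straight-through estimator over sampled binary codes, only to illustrate how discrete latents can still receive gradients.

```python
import torch

def binary_latent(logits, hard=True):
    """Sample a binary code from Bernoulli(sigmoid(logits)) while keeping
    gradients: the forward pass uses the discrete sample, the backward
    pass flows through the continuous probabilities (straight-through)."""
    probs = torch.sigmoid(logits)
    if not hard:
        return probs
    sample = torch.bernoulli(probs)            # discrete 0/1 code
    return sample + probs - probs.detach()     # value == sample, grad == d(probs)
```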
Alpha-Integration Pooling for Convolutional Neural Networks
http://doi.org/10.5626/JOK.2021.48.7.774
Convolutional neural networks (CNNs) have achieved remarkable performance in many applications, especially in image recognition tasks. As a crucial component of CNNs, sub-sampling plays an important role in efficient training and the invariance property, and max-pooling and arithmetic average-pooling are the commonly used sub-sampling methods. Besides these two, however, there are many other pooling types, such as the geometric and harmonic averages. Since it is not easy for algorithms to find the best pooling method, the pooling type is usually predefined, which might not be optimal for different tasks. Like other parameters in deep learning, however, the type of pooling can be driven by the data for a given task. In this paper, we propose α-integration pooling (αI-pooling), which has a trainable parameter α that determines the type of pooling. αI-pooling is a general pooling method that includes max-pooling and arithmetic average-pooling as special cases, depending on the parameter α. Experiments show that αI-pooling outperforms other pooling methods in image recognition tasks. It also turns out that each layer has a different optimal pooling type.
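A sketch of a trainable-α pooling layer over non-negative activations, using the α-integration form f(x) = x^((1-α)/2), so that α = -1 recovers the arithmetic mean and large negative α approaches max-pooling; the paper's exact parameterization may differ, and the α = 1 (geometric mean) case, which needs the log form, is omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AlphaIntegrationPool2d(nn.Module):
    """Pooling as m = f^{-1}(mean(f(x))) with f(x) = x**((1 - alpha) / 2),
    where alpha is learned along with the other network parameters."""
    def __init__(self, kernel_size, alpha_init=-1.0, eps=1e-6):
        super().__init__()
        self.kernel_size = kernel_size
        self.alpha = nn.Parameter(torch.tensor(alpha_init))
        self.eps = eps

    def forward(self, x):
        # assumes non-negative inputs (e.g. after ReLU); avoid alpha == 1
        p = (1.0 - self.alpha) / 2.0
        x = x.clamp(min=self.eps)
        mean_fx = F.avg_pool2d(x.pow(p), self.kernel_size)
        return mean_fx.pow(1.0 / p)
```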
Improvement in Network Intrusion Detection based on LSTM and Feature Embedding
Hyeokmin Gwon, Chungjun Lee, Rakun Keum, Heeyoul Choi
http://doi.org/10.5626/JOK.2021.48.4.418
Network Intrusion Detection Systems (NIDS) are essential tools for network perimeter security: they inspect network traffic packets to detect intrusions. Most existing works have used machine learning techniques to build such systems. While these works demonstrated the effectiveness of various artificial intelligence algorithms, only a few of them utilized the time-series information in network traffic data, and categorical information in the traffic data has not been included in neural network-based approaches. In this paper, we propose network intrusion detection models that use sequential information through a long short-term memory (LSTM) network and categorical information through an embedding technique. We conducted experiments with these models on UNSW-NB15, a comprehensive network traffic dataset. The results confirm that the proposed method improves performance, achieving a binary classification accuracy of 99.72%.
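A sketch of the combination described above, assuming a PyTorch model in which categorical packet fields (e.g. protocol, service) are embedded, concatenated with numeric features, and fed to an LSTM; feature names, cardinalities, and layer sizes are placeholders rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class LSTMIntrusionDetector(nn.Module):
    """LSTM over traffic sequences with embedded categorical fields."""
    def __init__(self, cat_cardinalities, num_numeric, emb_dim=8, hidden=64):
        super().__init__()
        self.embeds = nn.ModuleList(
            [nn.Embedding(card, emb_dim) for card in cat_cardinalities])
        in_dim = num_numeric + emb_dim * len(cat_cardinalities)
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)           # binary: attack vs. normal

    def forward(self, numeric, categorical):
        # numeric: (batch, seq, num_numeric); categorical: (batch, seq, n_cat)
        embs = [emb(categorical[..., i]) for i, emb in enumerate(self.embeds)]
        x = torch.cat([numeric] + embs, dim=-1)
        h, _ = self.lstm(x)
        return torch.sigmoid(self.out(h[:, -1]))  # prediction from last step
```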