Search : [ keyword: Transformer ] (28)

SASRec vs. BERT4Rec: Performance Analysis of Transformer-based Sequential Recommendation Models

Hye-young Kim, Mincheol Yoon, Jongwuk Lee

http://doi.org/10.5626/JOK.2024.51.4.352

Sequential recommender systems extract user interests from behavior logs and use them to recommend items the user is likely to interact with next. SASRec and BERT4Rec are two representative sequential recommendation models. Although many studies use them as baselines, their reported performance is inconsistent across papers because of differences in experimental environments. This study compares and analyzes the performance of SASRec and BERT4Rec on six representative sequential recommendation datasets. The experiments show that the number of user-item interactions has the largest impact on BERT4Rec training, which in turn drives the performance difference between the two models. Furthermore, this study finds that the two learning methods widely used in sequential recommendation settings can have different effects depending on popularity bias and sequence length. These results show that considering dataset characteristics is essential for improving recommendation performance.
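The abstract does not name the two learning methods; in the literature, the usual pair is SASRec's autoregressive next-item objective and BERT4Rec's cloze (masked-item) objective. A minimal sketch, assuming those two objectives, of how training examples are built from one user log:

```python
import random

def autoregressive_examples(seq):
    """SASRec-style: at each position, predict the next item
    from the prefix of the sequence (left-to-right)."""
    return [(seq[:i], seq[i]) for i in range(1, len(seq))]

def cloze_examples(seq, mask_prob=0.5, mask_token="[MASK]", seed=0):
    """BERT4Rec-style: randomly mask items and predict them
    from the full (bidirectional) context."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for pos, item in enumerate(seq):
        if rng.random() < mask_prob:
            masked.append(mask_token)
            targets[pos] = item  # ground-truth item at the masked position
        else:
            masked.append(item)
    return masked, targets

log = ["i1", "i2", "i3", "i4"]
print(autoregressive_examples(log))
print(cloze_examples(log))
```

The key structural difference the paper's analysis turns on: the autoregressive objective yields one target per position from left context only, while the cloze objective lets the model attend to both sides of each masked item.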

Location-Dependent and Task-Oriented Power Allocation in Holographic MIMO: A Transformer-based Approach

Apurba Adhikary, Avi Deb Raha, Monishanker Halder, Mrityunjoy Gain, Ji Su Yoon, Seong Bae Park, Choong Seon Hong

http://doi.org/10.5626/JOK.2024.51.1.93

Future communication networks are expected to provide improved data throughput with minimal beamforming power. The location-dependent and task-oriented resource allocation approach for holographic beamforming improves the channel capacity for users by activating the required number of grids from the holographic grid array. An optimization problem is formulated to maximize the channel capacity while considering the location and task priority of the users. In this study, a Transformer-based approach that allocates the power required to serve the users via holographic beamforming is proposed as the solution to the optimization problem. The simulation results demonstrate that the proposed location-dependent and task-oriented Transformer-based approach effectively allocates power for holographic beamforming to serve the users.

New Transformer Model to Generate Molecules for Drug Discovery

Yu-Bin Hong, Kyungjun Lee, DongNyenog Heo, Heeyoul Choi

http://doi.org/10.5626/JOK.2023.50.11.976

Among various generative models, recurrent neural network (RNN)-based models have achieved state-of-the-art performance in the drug generation task. To overcome the long-term dependency problem that RNNs suffer from, Transformer-based models were proposed for the task. However, the Transformer models performed worse than the RNN models in drug generation, which we believe is because the Transformer models were over-parameterized and prone to over-fitting. To avoid this problem, in this paper we propose a new Transformer model that replaces the large decoder with simple feed-forward layers. Experiments confirmed that our proposed model outperformed the previous state-of-the-art baseline on the major evaluation metrics while preserving a similar level of performance on the other metrics. Furthermore, when we applied our model to generate candidate molecules against the SARS-CoV-2 (COVID-19) virus, the generated molecules were more effective than commercially available drugs such as Paxlovid, Molnupiravir, and Remdesivir.

CLS Token Additional Embedding Method Using GASF and CNN for Transformer based Time Series Data Classification Tasks

Jaejin Seo, Sangwon Lee, Wonik Choi

http://doi.org/10.5626/JOK.2023.50.7.573

Time series data are data collected sequentially over a period of time. They are used for prediction, classification, and outlier detection. Although existing artificial intelligence models in the time series field have mainly been based on recurrent neural networks, recent research is shifting to Transformer-based models. While these Transformer-based models perform well on time series forecasting, they show relatively poor performance on classification tasks. In this paper, we propose an embedding method that adds a special classification token, generated using a Gramian Angular Summation Field and a convolutional neural network, so that time series data can be used as input to Transformers, and we find that pre-training can be leveraged to further improve performance. To show the efficacy of our method, we conducted extensive experiments with 12 different models on the University of California, Riverside dataset. Experimental results show that our proposed model improved average accuracy across the 85 datasets by 1.4% to 21.1%.
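As a rough illustration of the Gramian Angular Summation Field named in the title (the CNN that turns the resulting field into a CLS embedding is omitted here), a minimal sketch assuming the standard min-max rescaling of the series to [-1, 1]:

```python
import math

def gasf(series):
    """Gramian Angular Summation Field: rescale the series to
    [-1, 1], map each value to an angle via arccos, and form the
    matrix cos(phi_i + phi_j) over every pair of time steps.
    Assumes the series is not constant (max > min)."""
    lo, hi = min(series), max(series)
    scaled = [2 * (v - lo) / (hi - lo) - 1 for v in series]
    # clamp guards against floating-point drift outside acos's domain
    phi = [math.acos(max(-1.0, min(1.0, v))) for v in scaled]
    n = len(series)
    return [[math.cos(phi[i] + phi[j]) for j in range(n)] for i in range(n)]

field = gasf([1.0, 2.0, 3.0, 4.0])
for row in field:
    print([round(v, 3) for v in row])
```

The resulting n×n matrix is symmetric and image-like, which is what makes a 2D CNN a natural choice for encoding it into a single token embedding.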

Document-level Machine Translation Data Augmentation Using a Cluster Algorithm and NSP

Dokyoung Kim, Changki Lee

http://doi.org/10.5626/JOK.2023.50.5.401

In recent years, research on document-level machine translation, which understands the context of an entire document to produce natural translations, has been actively conducted. As with sentence-level models, training a document-level machine translation model requires a large amount of data, but building a large document-level parallel corpus is very difficult. Therefore, in this paper, we propose a data augmentation technique that is effective for document-level machine translation, to compensate for the scarcity of document-level parallel corpora. In our experiments, applying the data augmentation technique based on a cluster algorithm and next sentence prediction (NSP) to a context-free sentence-level parallel corpus improved document-level translation performance by 3.0 S-BLEU and 2.7 D-BLEU over the baseline without augmentation.

C++ based Deep Learning Open Source Framework WICWIU.v3 that Supports Natural Language and Time-series Data Processing

Junseok Oh, Chanhyo Lee, Okkyun Koo, Injung Kim

http://doi.org/10.5626/JOK.2023.50.4.313

WICWIU is the first open-source deep learning framework developed by a Korean university. In this work, we developed WICWIU.v3, which adds features for natural language and time-series data processing. WICWIU is designed for the C++ environment, supports GPU-based parallel processing, and offers excellent readability and extensibility, allowing users to easily add new features. In addition to the image processing models of WICWIU.v1 and v2, such as convolutional neural networks (CNN) and generative adversarial networks (GAN), WICWIU.v3 provides classes and functions for natural language and time-series data processing, such as recurrent neural networks (RNN), including LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) networks, attention modules, and Transformers. We validated the newly added functions by implementing a machine translator and a text generator with WICWIU.v3.

PatentQ&A: Proposal of Patent Q&A Neural Search System Using Transformer Model

Yoonmin Lee, Taewook Hwang, Sangkeun Jung, Hyein Seo, Yoonhyung Roh

http://doi.org/10.5626/JOK.2023.50.4.306

Recent neural search has enabled semantic search beyond statistical methods and finds accurate results even in the presence of typos. This paper proposes a neural network-based patent Q&A search system that returns the answer closest to the user's intent when a member of the general public, without patent expertise, searches for patent information using everyday terms. A patent dataset was constructed from patent customer consultation data posted on the Korean Intellectual Property Office website. Patent-KoBERT (Triplet) and Patent-KoBERT (CrossEntropy) were fine-tuned on this dataset to extract questions similar to the user's query and to re-rank them. In experiments, both Mean Reciprocal Rank (MRR) and Mean Average Precision (MAP) reached 0.96, confirming that answers most similar to the intent of the user input were well selected.
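The abstract describes a two-stage pipeline: one fine-tuned encoder retrieves similar stored questions, and a second re-ranks them. A minimal sketch of that retrieve-then-rerank pattern, with hypothetical toy vectors standing in for Patent-KoBERT embeddings and a placeholder score standing in for the cross-entropy re-ranker:

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve_then_rerank(query_vec, candidates, rerank_score, top_k=2):
    """Stage 1 (bi-encoder style): rank stored questions by embedding
    similarity and keep the top_k. Stage 2 (re-ranker style): re-score
    that shortlist with a more expensive pairwise scorer."""
    shortlist = sorted(candidates,
                       key=lambda c: cosine(query_vec, c["vec"]),
                       reverse=True)[:top_k]
    return sorted(shortlist, key=lambda c: rerank_score(query_vec, c),
                  reverse=True)

# Hypothetical FAQ entries; "cross" is a placeholder re-ranker score.
faq = [
    {"q": "How do I renew my patent?",      "vec": [1.0, 0.0], "cross": 0.4},
    {"q": "What are the renewal fees?",     "vec": [0.9, 0.1], "cross": 0.9},
    {"q": "How do I register a trademark?", "vec": [0.0, 1.0], "cross": 0.1},
]
result = retrieve_then_rerank([1.0, 0.05], faq, lambda q, c: c["cross"])
print([c["q"] for c in result])
```

The design point the two fine-tuned models reflect: the cheap first stage prunes the candidate pool so the expensive pairwise scorer only runs on a short list.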

Building a Parallel Corpus and Training Translation Models Between Luganda and English

Richard Kimera, Daniela N. Rim, Heeyoul Choi

http://doi.org/10.5626/JOK.2022.49.11.1009

Recently, neural machine translation (NMT) has achieved great success, but it requires large datasets and is therefore premised on high-resource languages. This continually disadvantages low-resource languages such as Luganda, which lack high-quality parallel corpora; even Google Translate did not serve Luganda at the time of this writing. In this paper, we build a parallel corpus of 41,070 sentence pairs for Luganda and English, based on three different open-source corpora. We then train NMT models on the dataset with hyper-parameter search. Experiments yielded a BLEU score of 21.28 from Luganda to English and 17.47 from English to Luganda, and sample translations show high quality. We believe our model is the first Luganda-English NMT model. The bilingual dataset we built will be made available to the public.

1×1 UWB-based Human Pose Estimation Using Transformer

Seunghyun Kim, Keunhong Chae, Seunghwan Shin, Yusung Kim

http://doi.org/10.5626/JOK.2022.49.4.298

Estimating a human's pose in a specific space from an image is one of the main areas of computer vision and an important technology applicable to various fields such as games, medical care, disaster response, firefighting, and the military. Combined with machine learning, the accuracy of pose estimation has greatly improved. However, the image-based approach has a limitation: it is difficult to estimate pose when part or all of the body is occluded by obstacles or when the lighting is dark. Recently, studies have emerged that estimate human pose using wireless signals, which can penetrate obstacles and are unaffected by brightness. It was previously assumed that two or more pairs of transceivers were required to estimate a specific location from wireless signals. This paper shows that it is possible to estimate human pose and perform body segmentation by applying deep learning to ultra-wideband (UWB) signals collected by a single 1×1 transceiver. We also show that replacing convolutional neural networks with Transformer models yields better performance.

Aspect Summarization for Product Reviews based on Attention-based Aspect Extraction

Jun-Nyeong Jeong, Sang-Young Kim, Seong-Tae Kim, Jeong-Jae Lee, Yuchul Jung

http://doi.org/10.5626/JOK.2021.48.12.1318

Recently, research on machine learning-based summarization of documents such as articles and papers, as well as of online reviews, has been active. In this study, unlike existing work that simply summarizes content, we developed a technique for generating aspect summaries that considers the various aspects of product reviews. We crawled and refined earphone product reviews to build training data, obtaining 40,000 reviews, and manually constructed 4,000 aspect summaries for training and evaluation. In particular, we proposed a model that can summarize aspects using only text data, based on the attention-based aspect extraction (ABAE) technique. To judge the effectiveness of the proposed technique, we ran experiments varying the use of aspect-related words and the masking ratio during training. The model that randomly masked 25% of the aspect-related words achieved the highest performance, with a ROUGE of 0.696 and a BERTScore of 0.879 on the validation set.
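The best-reported setting masks 25% of the aspect-related words during training. A minimal sketch of that masking step alone, with a hypothetical review and aspect vocabulary (the ABAE model that identifies aspect words is not shown):

```python
import random

def mask_aspect_words(tokens, aspect_words, ratio=0.25, seed=7):
    """Randomly replace `ratio` of the aspect-related tokens with
    [MASK]; all other tokens are left untouched."""
    rng = random.Random(seed)
    positions = [i for i, t in enumerate(tokens) if t in aspect_words]
    n_mask = max(1, round(len(positions) * ratio))
    chosen = set(rng.sample(positions, n_mask))
    return ["[MASK]" if i in chosen else t for i, t in enumerate(tokens)]

review = "the bass is deep but the battery drains fast and the fit is loose".split()
aspects = {"bass", "battery", "fit", "sound"}
print(mask_aspect_words(review, aspects))
```

Masking only aspect words, rather than arbitrary tokens, forces the summarizer to reconstruct aspect mentions from context, which is presumably why the 25% setting helps.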


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr