Search : [ author: Youngjoong Ko ] (22)

SyllaBERT: A Syllable-Based Efficient Robust Transformer Model for Real-World Noise and Typographical Errors

Seongwan Park, Yumin Heo, Youngjoong Ko

http://doi.org/10.5626/JOK.2025.52.3.250

Training a Korean language model necessitates the development of a tokenizer specifically designed for the unique features of the Korean language, making this a crucial step in the modeling process. Most current language models utilize morpheme-based or subword-based tokenization. While these approaches work well with clean Korean text data, they are prone to out-of-vocabulary (OOV) issues due to abbreviations and neologisms frequently encountered in real-world Korean data. Moreover, actual Korean text often contains various typos and non-standard expressions, to which traditional morpheme-based or subword-based tokenizers are not sufficiently robust. To tackle these challenges, this paper introduces the SyllaBERT model, which employs syllable-level tokenization to effectively address the specific characteristics of Korean, even in noisy and non-standard contexts, with minimal resources. A compact syllable-level vocabulary was created, and a syllable-based language model was developed by reducing the embedding and hidden layer sizes of existing models. Experimental results show that, despite having approximately four times fewer parameters than subword-based models, the SyllaBERT model outperforms them in natural language understanding tasks on real-world conversational Korean data that includes noise.
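Because every precomposed Hangul syllable occupies a single Unicode codepoint (U+AC00 through U+D7A3), syllable-level tokenization for Korean can be sketched as simple character splitting. This is an illustrative sketch, not the paper's code; the function names are assumptions:

```python
HANGUL_START, HANGUL_END = 0xAC00, 0xD7A3

def is_hangul_syllable(ch):
    """True if ch is a precomposed Hangul syllable codepoint."""
    return HANGUL_START <= ord(ch) <= HANGUL_END

def syllable_tokenize(text):
    """Split text into syllable-level tokens: every non-space
    character (Hangul syllable or otherwise) becomes one token,
    so unseen neologisms and typos never fall out of vocabulary."""
    return [ch for ch in text if not ch.isspace()]
```

A vocabulary built this way is bounded by the 11,172 possible Hangul syllables plus a small set of other symbols, which is why a syllable model can use a far smaller embedding table than a subword model.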

Explainable Video Search System using Token Space-based Representation

Jiyeol Park, Dooyoung Kim, Youngjoong Ko

http://doi.org/10.5626/JOK.2024.51.12.1068

Query-video retrieval is the task of finding the video most relevant to a user's query. Existing studies represent the query and the video in a shared latent vector space; however, relevance between the two is computed simply as the dot product of their vectors, which conveys neither semantic meaning nor explainability. In this paper, we propose a model that converts the query and video into embeddings located in a token-based space, retrieves the video as if it were a document, and computes semantic similarity. Experimental results show that the proposed final model improves Recall@1, Recall@5, and Recall@10 over the baseline on the MSVD dataset. Furthermore, the proposed model is approximately 3.33 times faster than CLIP4Clip, and when BM25 is applied with minimal modifications, it achieves a speedup of about 208.11 times. Qualitative evaluations also demonstrate that tokens extracted from videos are as relevant as subtitles, confirming the explainability of the proposed structure.
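To make the "search the video like a document" idea concrete, a minimal BM25 scorer over token lists looks as follows. This is a generic sketch of standard BM25 with hypothetical inputs, not the paper's implementation:

```python
import math
from collections import Counter

def bm25_score(query_tokens, doc_tokens, corpus, k1=1.5, b=0.75):
    """BM25 relevance of one document (here: tokens extracted
    from a video) to a query, treating the video as text.
    corpus is the full list of token lists, used for IDF and
    average document length."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    tf = Counter(doc_tokens)
    score = 0.0
    for t in set(query_tokens):
        df = sum(1 for d in corpus if t in d)          # document frequency
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)
        freq = tf[t]
        denom = freq + k1 * (1 - b + b * len(doc_tokens) / avgdl)
        score += idf * freq * (k1 + 1) / denom
    return score
```

Because the scoring reduces to sparse term matching, it can reuse inverted-index machinery built for text search, which is where the reported speedup over dense dot-product retrieval comes from.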

Korean Dependency Parsing Using Sequence Labeling

Keunha Kim, Youngjoong Ko

http://doi.org/10.5626/JOK.2024.51.12.1053

Dependency parsing is a crucial step in language analysis that identifies the relationships between words within a sentence. Recently, models based on pre-trained transformers have shown impressive performance across natural language processing research, and they have also been applied to dependency parsing. Traditional approaches to dependency parsing with pre-trained models consist of two main stages: 1) merging the token-level embeddings generated by the pre-trained model into word-level embeddings; and 2) analyzing dependency relations by comparing or classifying the merged embeddings. However, due to the large number of parameters and the additional layers required for embedding construction, comparison, and classification, these models can be inefficient in terms of time and memory usage. This paper proposes a dependency parsing technique based on sequence labeling that improves the efficiency of training and inference by defining dependency parsing units and simplifying the model layers. The proposed model eliminates the word-level embedding merging step by using special tokens to define parsing units, and it effectively reduces the number of parameters by simplifying the model layers, significantly shortening training and inference time. With these optimizations, the proposed model still maintains meaningful dependency parsing performance.
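One common way to cast dependency parsing as sequence labeling is to give each word a label encoding the relative offset of its head; the encoding below is a standard illustration of the idea, not necessarily the paper's exact scheme:

```python
def heads_to_labels(heads):
    """Encode each word's head index (1-based; 0 = root) as a
    per-word label: 'ROOT' for the root, otherwise the signed
    offset to the head, e.g. '+2' = head is two words right."""
    labels = []
    for i, h in enumerate(heads, start=1):
        labels.append("ROOT" if h == 0 else f"{h - i:+d}")
    return labels

def labels_to_heads(labels):
    """Invert the encoding to recover head indices."""
    return [0 if lab == "ROOT" else i + int(lab)
            for i, lab in enumerate(labels, start=1)]
```

With such an encoding, the parser becomes an ordinary token classifier, so no biaffine comparison layers or embedding-merging machinery are needed on top of the encoder.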

Topic-Aware Cross-Attention for Dialogue Summarization

Suyoung Min, Youngjoong Ko

http://doi.org/10.5626/JOK.2024.51.11.1011

Unlike general document summarization, dialogue summarization frequently involves informal and colloquial language, and it requires an understanding of the context and flow of the dialogue as well as of its topics. This study proposes a Topic-Aware Cross-Attention mechanism that incorporates topic-distribution information into cross-attention to reflect these characteristics of dialogue. The mechanism extracts the topic distributions of the dialogue and the summary and applies the similarity between these distributions to the cross-attention mechanism in the BART model's decoder during summarization. The degree to which topic distribution similarity influences cross-attention can be adjusted through a topic-ratio parameter. Experimental results on the DialogSum and SAMSum datasets demonstrate the suitability of the method for dialogue summarization.
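A minimal sketch of the idea, assuming the topic similarity is added to the scaled dot-product scores before the softmax and weighted by a topic-ratio hyperparameter (the function and argument names are illustrative, not the paper's):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def topic_aware_cross_attention(Q, K, V, topic_sim, topic_ratio=0.5):
    """Cross-attention whose scores are biased by the similarity
    between decoder-side and encoder-side topic distributions.
    topic_sim: (n_query, n_key) similarity matrix; topic_ratio
    scales how strongly it shifts the attention scores."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)          # standard scaled dot product
    scores = scores + topic_ratio * topic_sim
    weights = softmax(scores, axis=-1)
    return weights @ V, weights
```

Setting topic_ratio to zero recovers ordinary cross-attention, which makes the contribution of the topic signal easy to ablate.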

Task-Oriented Dialogue System Using a Fusion Module between Knowledge Graphs

Jinyoung Kim, Hyunmook Cha, Youngjoong Ko

http://doi.org/10.5626/JOK.2024.51.10.882

The field of task-oriented dialogue systems focuses on using natural language processing to help users accomplish specific tasks through conversation. Recently, transformer-based pre-trained language models have been employed to enhance the performance of task-oriented dialogue systems. This paper proposes a response generation model based on Graph Attention Networks (GAT) that integrates external knowledge into transformer-based language models to produce more specialized responses. We further extend this approach with a fusion module that leverages information from two or more knowledge graphs. To evaluate the proposed model, we also collected and refined dialogue data based on a music-domain knowledge base; the resulting dataset consists of 2,076 dialogues and 226,823 triples. In experiments on this dataset, the proposed model improved on the baseline KoBART model by 13.83%p in ROUGE-1, 8.26%p in ROUGE-2, and 13.5%p in ROUGE-L.
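For reference, a single attention head of a standard GAT layer over an adjacency matrix can be written in a few lines of NumPy. This sketches the generic GAT formulation that such models build on, not the paper's fusion module itself:

```python
import numpy as np

def gat_layer(H, A, W, a_src, a_dst, leaky=0.2):
    """One single-head Graph Attention Network layer.
    H: (n, d_in) node features; A: (n, n) adjacency with
    self-loops; W: (d_in, d_out) projection; a_src, a_dst:
    (d_out,) attention parameters (the split halves of the
    usual concatenated attention vector a)."""
    Z = H @ W                                         # project node features
    e = (Z @ a_src)[:, None] + (Z @ a_dst)[None, :]   # pairwise scores
    e = np.where(e > 0, e, leaky * e)                 # LeakyReLU
    e = np.where(A > 0, e, -1e9)                      # mask non-edges
    e = e - e.max(axis=1, keepdims=True)              # stable softmax
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)  # attention weights
    return alpha @ Z                                  # aggregate neighbors
```

Each node's output is an attention-weighted average of its neighbors' projected features, which is what lets graph facts flow into the response generator.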

Generating Relation Descriptions with Large Language Model for Link Prediction

Hyunmook Cha, Youngjoong Ko

http://doi.org/10.5626/JOK.2024.51.10.908

The Knowledge Graph is a network consisting of entities and the relations between them, and it is used for various natural language processing tasks. One task related to the Knowledge Graph is Knowledge Graph Completion, which reasons over known facts in the graph to automatically infer missing links. To tackle this task, studies have been conducted on both link prediction and relation prediction. Recently, there has been significant interest in dual-encoder architectures that utilize textual information. However, link prediction datasets provide descriptions only for entities, not for relations, so such models rely heavily on entity descriptions. To address this issue, we used the large language model GPT-3.5-turbo to generate relation descriptions, allowing the baseline model to be trained with more comprehensive relation information. The relation descriptions generated by our method are also expected to improve other language-model-based link prediction models. Evaluation results for link prediction show that our method outperforms the baseline model on the Korean ConceptNet, WN18RR, FB15k-237, and YAGO3-10 datasets, with improvements of 0.34%p, 0.11%p, 0.12%p, and 0.41%p in Mean Reciprocal Rank (MRR), respectively.

Expected Addressee and Target Utterance Prediction for Construction of Multi-Party Dialogue Systems

Yoonjin Jang, Keunha Kim, Youngjoong Ko

http://doi.org/10.5626/JOK.2024.51.10.918

As the number of communication channels between people has increased in recent years, multi-party conversations have become as common as one-to-one conversations, and research on analyzing them has been active. Previous models for analyzing such dialogues typically predicted the addressee of the final response from the preceding responses. However, this differs from multi-party dialogue response generation, in which the speaker must first select the addressee to respond to. In this paper, we propose a new task for predicting the addressee of a multi-party dialogue that does not rely on response information: predicting the expected target utterance and matching it with the expected addressee in a real multi-party dialogue. To accomplish this, we introduce a transformer encoder-based model trained with masked token prediction, which predicts the expected target utterance and the expected addressee of the current speaker from the previous dialogue context, without seeing the final response. The proposed model achieves an accuracy of 82% in predicting the expected addressee and 68% in predicting the expected target utterance on the Ubuntu IRC dataset. These results demonstrate the potential of our model for use in multi-party dialogue systems, as it can accurately predict the target utterance to respond to. Moving forward, we plan to expand this research by creating additional multi-party dialogue datasets and applying the model to real-world multi-party dialogue response generation systems.

Multi-task Learning Based Re-ranker for External Knowledge Retrieval in Document-grounded Dialogue Systems

Honghee Lee, Youngjoong Ko

http://doi.org/10.5626/JOK.2023.50.7.606

Document-grounded dialogue systems retrieve external passages related to the dialogue and use them to generate an appropriate response to the user's utterance. However, retrievers based on the dual-encoder architecture perform poorly at finding relevant passages, and the re-ranker that complements the retriever is not sufficiently optimized. In this paper, to solve these problems and retrieve external passages effectively, we propose a re-ranker based on multi-task learning. The proposed model is a cross-encoder that, in the fine-tuning stage, simultaneously learns contrastive-learning-based ranking, Masked Language Modeling (MLM), and Posterior Differential Regularization (PDR); the auxiliary MLM and PDR tasks enhance the model's language understanding and robustness. Evaluation results on the Multidoc2dial dataset show that the proposed model outperforms the baseline model in Recall@1, Recall@5, and Recall@10.
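The fine-tuning objective of such a multi-task re-ranker can be sketched as a contrastive ranking loss plus weighted auxiliary terms; the function names and weights below are illustrative assumptions, not the paper's exact formulation:

```python
import math

def contrastive_ranking_loss(scores, positive_idx=0):
    """Softmax cross-entropy over cross-encoder relevance scores
    for one positive passage and several in-batch negatives."""
    m = max(scores)                                   # log-sum-exp trick
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[positive_idx]

def total_loss(rank, mlm, pdr, w_mlm=1.0, w_pdr=1.0):
    """Multi-task objective: ranking loss plus weighted MLM and
    PDR auxiliary losses, all optimized jointly in fine-tuning."""
    return rank + w_mlm * mlm + w_pdr * pdr
```

Raising the positive passage's score relative to the negatives drives the ranking term toward zero, while the MLM and PDR terms act as regularizers on the shared encoder.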

Video Retrieval System Using One-to-One Relation Between Clip-Sentence Sequence

Dooyoung Kim, Youngjoong Ko

http://doi.org/10.5626/JOK.2023.50.6.476

Video retrieval is a research field that finds videos related to a text query among candidate videos. Previous studies on video retrieval have used learning methods that force the embeddings of a text and its paired video to be similar to each other, without considering the structures of the video and the text. In this paper, we propose a novel video retrieval model and a training technique that focus on pairs of clip sequences and sentence sequences with a one-to-one relationship. Experimental results show that the proposed model improves on baseline models by 0.3%p in R@1 for sentence-clip retrieval and 5.4%p in R@1 for paragraph-video retrieval on the YouCook2 dataset.

Entity Graph Based Dialogue State Tracking Model with Data Collection and Augmentation for Spoken Conversation

Haeun Yu, Youngjoong Ko

http://doi.org/10.5626/JOK.2022.49.10.891

As part of a task-oriented dialogue system, dialogue state tracking is the task of understanding the dialogue and extracting the user's needs in slot-value form. Recently, the Dialog System Technology Challenge (DSTC) 10 Track 2 initiated a challenge to measure the robustness of dialogue state tracking models in a spoken conversation setting. The released evaluation dataset has three characteristics: a new multiple-value scenario, three times more entities, and utterances produced by an automatic speech recognition module. In this paper, to ensure robust performance, we introduce an extraction-based dialogue state tracking model with an entity graph, and we propose a data collection and template-based data augmentation method. Evaluation results show that the proposed method improves the performance of the extraction-based dialogue state tracking model by 1.7% in JGA and 0.57% in slot accuracy compared to the baseline model.
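Template-based augmentation of this kind can be sketched as slot filling over a template bank; the template text and slot names below are hypothetical examples, not items from the paper's dataset:

```python
import random

def augment(templates, slot_values, n=3, seed=0):
    """Create synthetic training utterances by sampling a
    template and filling each {slot} placeholder with a sampled
    entity value, multiplying training coverage of rare entities."""
    rng = random.Random(seed)   # fixed seed for reproducibility
    out = []
    for _ in range(n):
        filled = rng.choice(templates)
        for slot, values in slot_values.items():
            filled = filled.replace("{" + slot + "}", rng.choice(values))
        out.append(filled)
    return out
```

Sampling entity values from the knowledge base is what lets the tracker see the enlarged entity inventory during training rather than only at evaluation time.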


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr