Digital Library [Search Results]
Tailored Sentiment Analysis of Economic News Based on a Mixture of Quotation and Attribute Encoders
Seo-In Choi, Dae-Min Park, Byung-Won On
http://doi.org/10.5626/JOK.2025.52.4.319
News articles provide information on various topics such as politics, economics, society, and culture, but their neutral tone often limits the ability of traditional sentiment analysis models to capture emotion effectively. To address this issue, we proposed a novel sentiment analysis model that combines quotations with article attribute values. For sentiment analysis, we employed deep learning models such as BERT, KoBERT (optimized for Korean), and KLUE. The embeddings produced by these models were integrated through a Mixture of Experts (MoE) structure to learn the emotional information in quotations and the attribute information of articles simultaneously. Experimental results demonstrated that the proposed models, including the attribute-based phrase and attribute-group embedding models, achieved higher accuracy and reliability than conventional quotation-only analysis and traditional machine learning models. In particular, the KLUE model, optimized for Korean data, showed improved performance, and incorporating diverse attribute information significantly enhanced the predictive accuracy of the sentiment analysis models. These findings suggest that effectively combining quotation data with article attribute information enables more sophisticated sentiment analysis, even for neutral news articles.
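A minimal sketch of the MoE fusion idea follows, assuming the quotation and attribute embeddings have already been produced by encoders such as BERT, KoBERT, or KLUE; the class names, dimensions, and gating design are illustrative, not the paper's exact architecture.

```python
# Hypothetical MoE fusion layer over precomputed quotation/attribute embeddings.
import torch
import torch.nn as nn

class MoEFusion(nn.Module):
    def __init__(self, dim=768, n_experts=3, n_classes=3):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Linear(dim * 2, dim) for _ in range(n_experts)]
        )
        self.gate = nn.Linear(dim * 2, n_experts)   # softmax gating over experts
        self.classifier = nn.Linear(dim, n_classes)  # e.g., pos / neutral / neg

    def forward(self, quote_emb, attr_emb):
        x = torch.cat([quote_emb, attr_emb], dim=-1)       # fuse quote + attributes
        weights = torch.softmax(self.gate(x), dim=-1)      # (batch, n_experts)
        outs = torch.stack([e(x) for e in self.experts], dim=1)
        mixed = (weights.unsqueeze(-1) * outs).sum(dim=1)  # weighted expert mix
        return self.classifier(mixed)

model = MoEFusion()
quote_emb = torch.randn(4, 768)  # e.g., [CLS] embeddings of quotations
attr_emb = torch.randn(4, 768)   # e.g., embeddings of article attribute values
logits = model(quote_emb, attr_emb)  # (4, 3) sentiment logits
```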
A Study on Improving the Accuracy of Korean Speech Recognition Texts Using KcBERT
Donguk Min, Seungsoo Nam, Daeseon Choi
http://doi.org/10.5626/JOK.2024.51.12.1115
In the field of speech recognition, models such as Whisper, Wav2Vec2.0, and Google STT are widely utilized. However, Korean speech recognition faces challenges because complex phonological rules and diverse pronunciation variations hinder performance improvements. To address these issues, this study proposed a method that combined the Whisper model with a post-processing approach using KcBERT. By applying KcBERT’s bidirectional contextual learning to text generated by the Whisper model, the proposed method could enhance contextual coherence and refine the text for greater naturalness. Experimental results showed that post-processing reduced the Character Error Rate (CER) from 5.12% to 1.88% in clean environments and from 22.65% to 10.17% in noisy environments. Furthermore, the Word Error Rate (WER) was significantly improved, decreasing from 13.29% to 2.71% in clean settings and from 38.98% to 11.15% in noisy settings. BERTScore also exhibited overall improvement. These results demonstrate that the proposed approach is effective in addressing complex phonological rules and maintaining text coherence within Korean speech recognition.
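The two-stage pipeline can be sketched roughly as below, using the public Hugging Face checkpoints openai/whisper-small and beomi/kcbert-base as stand-ins; the paper's post-processing is more sophisticated than this single fill-mask pass, and masking whitespace-separated words is a simplification of Korean subword tokenization.

```python
# Sketch: Whisper transcription followed by KcBERT-based token correction.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
corrector = pipeline("fill-mask", model="beomi/kcbert-base")

text = asr("sample.wav")["text"]  # raw Whisper transcript (path is illustrative)
tokens = text.split()

# Re-predict each token from bidirectional context; keep KcBERT's suggestion
# only when the model is confident and it differs from the transcript.
for i in range(len(tokens)):
    masked = " ".join(tokens[:i] + [corrector.tokenizer.mask_token] + tokens[i + 1:])
    best = corrector(masked, top_k=1)[0]
    if best["score"] > 0.9 and best["token_str"] != tokens[i]:
        tokens[i] = best["token_str"]

print(" ".join(tokens))
```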
A Study on Development Method for BERT-based False Alarm Classification Model in Weapon System Software Static Test
Hyoju Nam, Insub Lee, Namhoon Jung, Seongyun Jeong, Kyutae Cho, Sungkyu Noh
http://doi.org/10.5626/JOK.2024.51.7.620
Recently, as the size and complexity of software in weapon systems have increased, securing reliability and stability has become essential. To achieve this, developers perform static and dynamic reliability testing during development. However, static testing produces many false alarms, and reviewing them wastes resources such as time and cost. Recent studies have tried to solve this problem with models such as SVM and LSTM, but these models have a critical limitation: because they use Word2Vec-based code embeddings or code information alone, they do not reflect the correlation between the defective code line and the other lines. A BERT-based model, by contrast, learns the relationships between sentences through its bidirectional Transformer, so it can classify false alarms by analyzing the relationships between lines of code. In this paper, we propose a method for developing a false alarm classification model based on BERT to analyze static test results efficiently. We demonstrate that the proposed method can generate a dataset in a development environment, and we show the superiority of our model.
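A hedged sketch of the core classification step: the flagged line is paired with its surrounding code so that BERT's self-attention can relate the defect line to the other lines. The checkpoint, label scheme, and example snippet are illustrative assumptions, not the paper's exact setup.

```python
# Sketch: sentence-pair encoding of (flagged line, surrounding code) for
# binary false-alarm classification with a BERT classifier head.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # 0: true alarm, 1: false alarm
)

defect_line = "memcpy(dst, src, len);"
context = "char dst[16]; size_t len = strlen(src); memcpy(dst, src, len);"

# Pair encoding lets self-attention relate the flagged line to the snippet.
inputs = tokenizer(defect_line, context, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print("false alarm" if logits.argmax(-1).item() == 1 else "true alarm")
```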
Gender Classification Model Based on Colloquial Text in Korean for Author Profiling of Messenger Data
Jihye Kang, Minho Kim, Hyuk-Chul Kwon
http://doi.org/10.5626/JOK.2023.50.12.1063
With the explosive growth of social network services (SNS), vast amounts of text data are generated through messenger services, and thanks to recent advances in natural language processing, various applications such as sentiment analysis, abusive text detection, and chatbots have been developed. However, there has been little work on classifying author characteristics, such as the gender and age of speakers, in Korean colloquial texts. In this study, we propose a gender classification model for author profiling using Korean colloquial texts. Based on Kakao Talk data for speaker gender classification, domain adaptation is performed by further training KcBERT (Korean Comments BERT), a model pre-trained on Korean comments, with 'Nate Pan' data. Experiments with a model that additionally combines external lexical information show improved performance, achieving an accuracy of approximately 95%. In this study, the self-collected 'Nate Pan' data and the "daily conversation" data provided by the National Institute of the Korean Language were used for domain adaptation, and the 'Korean SNS' data from AI HUB was used for model training and evaluation.
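The domain-adaptation step can be sketched as continued masked-language-model pretraining of KcBERT on colloquial text before fine-tuning the gender classifier; the corpus contents below are placeholders, and the hyperparameters are illustrative.

```python
# Sketch: domain-adaptive MLM pretraining of KcBERT on colloquial text.
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import Dataset

tokenizer = AutoTokenizer.from_pretrained("beomi/kcbert-base")
model = AutoModelForMaskedLM.from_pretrained("beomi/kcbert-base")

# Placeholder corpus; in practice, colloquial posts and chat-style text.
corpus = Dataset.from_dict({"text": ["...colloquial post...", "...chat log..."]})
tokenized = corpus.map(lambda b: tokenizer(b["text"], truncation=True),
                       batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="kcbert-adapted", num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
trainer.train()  # adapted weights are then fine-tuned for gender classification
```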
Review-based Personalized Recommendation System using Effective Personalized Fusion and BERT
http://doi.org/10.5626/JOK.2023.50.8.646
Review texts generally contain personal information about users, and reviews written by different users can convey different meanings even when they use the exact same wording. These features of reviews can compensate for the weakness of collaborative filtering, which is vulnerable to data sparsity, and can serve as information for personalized recommendation systems. Despite the success of pre-trained language models in natural language processing, there has been little research on personalized recommendation systems that leverage BERT to enrich individual user features from reviews. In this work, we propose a rating prediction model that uses BERT to learn detailed user- and item-specific features from reviews and tightly combines them with user and item IDs to build personalized user and item representations. Experimental results show that the proposed model achieves improved performance over the baselines on the Amazon benchmark dataset.
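A minimal sketch of the fusion idea: BERT encodes the review text, and the resulting feature is concatenated with learned user and item ID embeddings before rating regression. The checkpoint, dimensions, and names are assumptions for illustration.

```python
# Sketch: rating prediction from a BERT review feature fused with ID embeddings.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class ReviewRatingModel(nn.Module):
    def __init__(self, n_users, n_items, dim=768, id_dim=32):
        super().__init__()
        self.bert = AutoModel.from_pretrained("bert-base-uncased")
        self.user_emb = nn.Embedding(n_users, id_dim)   # personalized user ID
        self.item_emb = nn.Embedding(n_items, id_dim)   # personalized item ID
        self.regressor = nn.Linear(dim + 2 * id_dim, 1)

    def forward(self, input_ids, attention_mask, user_id, item_id):
        review_feat = self.bert(input_ids=input_ids,
                                attention_mask=attention_mask).pooler_output
        x = torch.cat([review_feat,
                       self.user_emb(user_id),
                       self.item_emb(item_id)], dim=-1)
        return self.regressor(x).squeeze(-1)  # predicted rating

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = ReviewRatingModel(n_users=100, n_items=200)
enc = tokenizer("Great battery life, fast shipping.", return_tensors="pt")
rating = model(enc["input_ids"], enc["attention_mask"],
               torch.tensor([3]), torch.tensor([7]))
```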
A Model for Topic Classification and Extraction of Sentimental Expression using a Lexical Semantic Network
JiEun Park, JuSang Lee, JoonChoul Shin, ChoelYoung Ock
http://doi.org/10.5626/JOK.2023.50.8.700
Most previous sentiment analysis studies classified a single sentence or document into only one sentiment, but more than one sentiment can exist in a sentence. In this paper, we propose a method that extracts sentiment expressions at the word level. The proposed model is a UBERT model that takes morphologically analyzed sentences as input and adds layers to predict the topic class and the sentiment expressions. The model uses the topic features of a sentence predicted with a topic dictionary, which is built before training: the learning module collects topic words from the training corpus and expands them using a lexical semantic network. Evaluation is performed with the word-level F1 score. The proposed model achieves an F1 score of 58.19%, an improvement of 0.97 percentage points over the baseline.
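The added-layer design can be sketched as a shared encoder with two heads, one classifying the sentence topic from the [CLS] token and one tagging sentiment expressions per token; since UBERT itself is not available here, a generic multilingual BERT stands in, and the label sizes are illustrative.

```python
# Sketch: shared encoder with a sentence-level topic head and a
# token-level sentiment-expression tagging head.
import torch.nn as nn
from transformers import AutoModel

class TopicSentimentTagger(nn.Module):
    def __init__(self, n_topics=10, n_tags=3, dim=768):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("bert-base-multilingual-cased")
        self.topic_head = nn.Linear(dim, n_topics)  # sentence-level topic class
        self.tag_head = nn.Linear(dim, n_tags)      # per-token BIO sentiment tags

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        topic_logits = self.topic_head(out.last_hidden_state[:, 0])  # [CLS]
        tag_logits = self.tag_head(out.last_hidden_state)            # all tokens
        return topic_logits, tag_logits
```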
Multi-Document Summarization Using Semantic Similarity and Information Quantity of Sentences
Yeon-Soo Lim, Sunggoo Kwon, Bong-Min Kim, Seong-Bae Park
http://doi.org/10.5626/JOK.2023.50.7.561
Document summarization has recently emerged as an important natural language processing task because of the need to deliver concise information. However, suitable multi-document summarization datasets are difficult to obtain. In this paper, rather than training on a multi-document summarization dataset, we propose to use a single-document summarization dataset: a multi-document summarization model that generates a summary for each document with a single-document summarization model and then post-processes these summaries. The proposed model consists of three modules: a summary module, a similarity module, and an information module. When multiple documents are entered, the summary module generates a summary of every single document, the similarity module clusters similar summaries by measuring semantic similarity, and the information module selects the most informative summary from each group of similar summaries and collects the selected summaries into the final multi-document summary. Experimental results show that the proposed model outperforms the baseline models and generates high-quality multi-document summaries; each module also shows meaningful performance on its own.
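The post-processing stages might look roughly like the sketch below: embed the single-document summaries, cluster similar ones, and keep the most informative summary per cluster. The embedding checkpoint, the clustering threshold, and the length-based proxy for information quantity are assumptions, not the paper's exact measures.

```python
# Sketch: cluster single-document summaries and keep one per cluster.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import AgglomerativeClustering

summaries = ["Summary of doc A ...", "Summary of doc B ...", "Summary of doc C ..."]
embedder = SentenceTransformer("all-MiniLM-L6-v2")
emb = embedder.encode(summaries, normalize_embeddings=True)

# Group semantically similar summaries (threshold is illustrative).
clusters = AgglomerativeClustering(
    n_clusters=None, distance_threshold=0.6, metric="cosine", linkage="average"
).fit_predict(emb)

final = []
for c in set(clusters):
    members = [s for s, k in zip(summaries, clusters) if k == c]
    # Crude proxy for information quantity: keep the longest summary.
    final.append(max(members, key=len))
print(" ".join(final))
```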
Korean Coreference Resolution through BERT Embedding at the Morpheme Level
Kyeongbin Jo, Yohan Choi, Changki Lee, Jihee Ryu, Joonho Lim
http://doi.org/10.5626/JOK.2023.50.6.495
Coreference resolution is a natural language processing task that identifies mentions in a given document and groups the mentions that refer to the same entity. Korean coreference resolution has mainly been studied with end-to-end methods, which must consider all spans as potential mentions, increasing memory usage and time complexity. In this paper, a word-level coreference resolution model, which resolves coreference by mapping sub-tokens back to word units, is applied to Korean; its token representations are computed with CorefBERT to reflect the characteristics of Korean, and named-entity and dependency-parsing features are then added. In experiments on the ETRI Q&A domain evaluation set, the model achieved an F1 of 70.68%, a 1.67% improvement over the existing end-to-end coreference resolution model, while reducing memory usage by a factor of 2.4 and increasing speed by a factor of 1.82.
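The word-level trick can be sketched as pooling subtoken embeddings back to word units before mention scoring, which shrinks the space of candidate spans; mean pooling here is an illustrative choice.

```python
# Sketch: pool subtoken embeddings back to word-level vectors.
import torch

def subtokens_to_words(hidden, word_ids):
    """hidden: (seq_len, dim) subtoken embeddings;
    word_ids: word index per subtoken (None for special tokens)."""
    words = {}
    for i, w in enumerate(word_ids):
        if w is not None:
            words.setdefault(w, []).append(hidden[i])
    # Mean-pool the subtokens of each word into one vector.
    return torch.stack([torch.stack(v).mean(dim=0) for _, v in sorted(words.items())])

hidden = torch.randn(8, 768)
word_ids = [None, 0, 0, 1, 2, 2, 2, None]  # e.g., from a tokenizer's word_ids()
word_emb = subtokens_to_words(hidden, word_ids)  # (3, 768) word-level vectors
```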
PGB: Permutation and Grouping for BERT Pruning
http://doi.org/10.5626/JOK.2023.50.6.503
Recently, pre-trained Transformer-based models have been widely used for artificial intelligence tasks such as natural language processing and image recognition. However, these models have billions of parameters, require significant computation for inference, and face many limitations in resource-constrained environments. To address this problem, we propose PGB (Permutation Grouped BERT pruning), a new group-based structured pruning method for Transformer models. PGB finds an optimal permutation of attention heads under resource constraints and prunes unnecessary heads based on head importance, minimizing the model's information loss. In comparative experiments, PGB shows better inference speed and lower accuracy loss than existing structured pruning methods for the pre-trained BERT model.
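A hedged sketch of importance-based head pruning on a pre-trained BERT follows; PGB additionally permutes and groups heads before pruning, which is omitted here, and the per-layer budget and the norm-based importance score are stand-ins for the paper's criteria.

```python
# Sketch: prune the least "important" attention heads per layer.
import torch
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
n_heads = model.config.num_attention_heads
head_dim = model.config.hidden_size // n_heads

to_prune = {}
for l, layer in enumerate(model.encoder.layer):
    # Stand-in importance: L2 norm of each head's slice of the output projection.
    w = layer.attention.output.dense.weight  # (hidden, hidden)
    scores = torch.stack([
        w[:, h * head_dim:(h + 1) * head_dim].norm() for h in range(n_heads)
    ])
    # Prune the 4 least important heads in every layer (budget is illustrative).
    to_prune[l] = scores.argsort()[:4].tolist()

model.prune_heads(to_prune)  # smaller attention blocks, faster inference
```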
PatentQ&A: Proposal of a Patent Q&A Neural Search System Using a Transformer Model
Yoonmin Lee, Taewook Hwang, Sangkeun Jung, Hyein Seo, Yoonhyung Roh
http://doi.org/10.5626/JOK.2023.50.4.306
Recent neural search enables semantic search beyond statistics-based retrieval and finds accurate results even in the presence of typos. This paper proposes a neural patent Q&A search system that provides the answer closest to the user's intention when members of the general public, without patent expertise, search for patent information in general terms. A patent dataset was constructed from patent customer consultation data posted on the Korean Intellectual Property Office website. Patent-KoBERT (Triplet) and Patent-KoBERT (CrossEntropy) were fine-tuned on this dataset to extract questions similar to the user's question and to re-rank them. In experiments, both Mean Reciprocal Rank (MRR) and Mean Average Precision (MAP) were 0.96, confirming that the answers most similar to the intent of the user input were well selected.
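The retrieve-then-re-rank flow can be sketched with public sentence-transformers checkpoints standing in for the fine-tuned Patent-KoBERT (Triplet) retriever and Patent-KoBERT (CrossEntropy) re-ranker; the FAQ entries and query are placeholders.

```python
# Sketch: dense retrieval of similar questions, then cross-encoder re-ranking.
from sentence_transformers import SentenceTransformer, CrossEncoder, util

faq_questions = ["How do I renew a patent?", "What is the filing fee?", "..."]
retriever = SentenceTransformer("all-MiniLM-L6-v2")
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "patent renewal procedure"
q_emb = retriever.encode(query, convert_to_tensor=True)
d_emb = retriever.encode(faq_questions, convert_to_tensor=True)

# Stage 1: dense retrieval of the top-k candidate questions.
hits = util.semantic_search(q_emb, d_emb, top_k=2)[0]
candidates = [faq_questions[h["corpus_id"]] for h in hits]

# Stage 2: cross-encoder re-ranking of the candidates.
scores = reranker.predict([(query, c) for c in candidates])
best = candidates[int(scores.argmax())]
print(best)
```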
Journal of KIISE
- ISSN : 2383-630X(Print)
- ISSN : 2383-6296(Electronic)
- KCI Accredited Journal