Digital Library [Search Result]
KcBert-based Movie Review Corpus Emotion Analysis Using Emotion Vocabulary Dictionary
Yeonji Jang, Jiseon Choi, Hansaem Kim
http://doi.org/10.5626/JOK.2022.49.8.608
Emotion analysis classifies the human emotions expressed in text data into emotional types such as joy, sadness, anger, surprise, and fear. In this study, an emotion vocabulary dictionary was used to classify the emotions expressed in a movie review corpus into nine categories (joy, sadness, fear, anger, disgust, surprise, interest, boredom, and pain) and thereby construct an emotion corpus. The performance of the model was then evaluated by training KcBert on this emotion corpus. The emotion vocabulary dictionary used to build the corpus is based on a psychological model; each review was checked for words matching entries in the dictionary, and the review was tagged with the emotion type of the matching word that appears last in the review. When the emotion analysis corpus constructed in this way was used to train KcBert pre-trained with NSMC, KcBert showed excellent performance on the nine-class classification task.
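As an illustration of the tagging and fine-tuning pipeline described above, the sketch below assumes a hypothetical dictionary format that maps Korean emotion words to one of the nine emotion types, along with the publicly released KcBert checkpoint beomi/kcbert-base and the Hugging Face transformers library; the paper's actual dictionary, matching rules, and training setup may differ.

    # Sketch: tag each review with the emotion type of the dictionary word that
    # appears last in the review, then set up KcBert for nine-class classification.
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    EMOTIONS = ["joy", "sadness", "fear", "anger", "disgust",
                "surprise", "interest", "boredom", "pain"]

    # Hypothetical emotion vocabulary dictionary: word -> emotion type.
    emotion_dict = {"기쁘다": "joy", "슬프다": "sadness", "무섭다": "fear"}

    def tag_review(review):
        """Return the emotion type of the dictionary word appearing last in the review."""
        last_pos, label = -1, None
        for word, emotion in emotion_dict.items():
            pos = review.rfind(word)
            if pos > last_pos:
                last_pos, label = pos, emotion
        return label  # None if no dictionary word occurs in the review

    # KcBert fine-tuning setup for the nine emotion classes (training loop omitted).
    tokenizer = AutoTokenizer.from_pretrained("beomi/kcbert-base")
    model = AutoModelForSequenceClassification.from_pretrained(
        "beomi/kcbert-base", num_labels=len(EMOTIONS))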
PrefixLM for Korean Text Summarization
Kun-Hui Lee, Seung-Hoon Na, Joon-Ho Lim, Tae-Hyeong Kim, Du-Seong Chang
http://doi.org/10.5626/JOK.2022.49.6.475
In this paper, we examine the effectiveness of PrefixLM, which uses only half of the parameters of T5's encoder-decoder architecture, for Korean text generation tasks. Unlike T5, where the input and output sequences are provided separately, the transformer block of PrefixLM takes a single sequence that concatenates the input and output sequences. Through the design of its attention mask, PrefixLM applies bidirectional attention over the input sequence and unidirectional attention over the output sequence, allowing a single transformer block to play the two roles of encoder and decoder. Experimental results on Korean abstractive document summarization show that PrefixLM improves the Rouge-F1 score by 2.17 and 2.78 points over BART and T5, respectively, suggesting that PrefixLM is promising for Korean text generation tasks.
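As a sketch of the attention-mask design described above (assumed to follow the standard prefix-LM scheme, not the authors' released code), the snippet below builds a boolean mask for a concatenated input/output sequence using PyTorch, where True marks a position the query token may attend to.

    # Sketch: prefix-LM attention mask for a concatenated [input ; output] sequence.
    # Input (prefix) positions attend bidirectionally; output positions attend
    # causally, seeing the whole prefix plus earlier output tokens.
    import torch

    def prefix_lm_mask(input_len, output_len):
        total = input_len + output_len
        mask = torch.tril(torch.ones(total, total)).bool()  # causal part: j <= i
        mask[:, :input_len] = True  # every position can see the full prefix
        return mask  # mask[i, j] == True -> position i may attend to position j

    # prefix_lm_mask(3, 2) as 0/1, rows = query positions, columns = key positions:
    # 1 1 1 0 0
    # 1 1 1 0 0
    # 1 1 1 0 0
    # 1 1 1 1 0
    # 1 1 1 1 1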
Analyzing the Effect of the Twitter Corpus Selection on the Accuracy of Smartwatch Text Entry
http://doi.org/10.5626/JOK.2022.49.4.321
A statistical decoder makes fast and accurate text entry possible on a smartwatch. In this paper, we analyzed how the corpus used to build the language model behind the autocorrect function affects the accuracy of text entry. Two language models were built: one from the Brown corpus, which consists of text from various genres, and one from the Twitter corpus, extracted from tweet messages. We constructed a statistical decoder for autocorrection with each of the two language models and simulated user touch input on the smartwatch keyboard with the dual Gaussian distribution model to enter the Enron mobile phrases, a set of phrases written on real mobile devices. The test results show average character error rates (CER) of 8.35% for the Brown corpus and 6.44% for the Twitter corpus, a statistically significant difference.
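To make the evaluation setup concrete, the sketch below simulates noisy touch input with a simplified per-key Gaussian touch model (standing in for the dual Gaussian distribution model), decodes each touch to the nearest key without a language model, and computes the character error rate with Levenshtein distance; the key layout, noise parameter, and test phrase are assumptions, not values from the paper.

    # Sketch: simulate noisy touches on a toy one-row keyboard, decode each touch
    # to the nearest key, and compute the character error rate (CER).
    import random

    KEYS = {ch: (i * 10.0, 0.0) for i, ch in enumerate("abcdefghij")}  # key centres
    SIGMA = 4.0  # assumed standard deviation of the touch-point noise

    def simulate_touch(ch):
        cx, cy = KEYS[ch]
        return random.gauss(cx, SIGMA), random.gauss(cy, SIGMA)

    def nearest_key(x, y):
        return min(KEYS, key=lambda ch: (KEYS[ch][0] - x) ** 2 + (KEYS[ch][1] - y) ** 2)

    def levenshtein(a, b):
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
            prev = cur
        return prev[-1]

    phrase = "bad cafe"  # stand-in for an Enron mobile test phrase
    typed = "".join(nearest_key(*simulate_touch(c)) if c in KEYS else c for c in phrase)
    print(f"decoded: {typed!r}  CER: {levenshtein(typed, phrase) / len(phrase):.2%}")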

Journal of KIISE
- ISSN : 2383-630X(Print)
- ISSN : 2383-6296(Electronic)
- KCI Accredited Journal
Editorial Office
- Tel. +82-2-588-9240
- Fax. +82-2-521-1352
- E-mail. chwoo@kiise.or.kr