Abstractive Summarization Corpus Construction of National Assembly Minutes and Model Development
Younggyun Hahm, Yejee Kang, Seoyoon Park, Yongbin Jeong, Hyunbin Seo, Yiseul Lee, Hyejin Seo, Saetbyol Seo, Hansaem Kim
http://doi.org/10.5626/JOK.2024.51.3.218
Summarization research has mainly targeted documents, but interest in meeting summarization has recently increased significantly. As part of the National Institute of Korean Language’s big data construction project, a study was conducted on summarizing the National Assembly minutes, which had not yet been studied in Korea, and a summarization dataset for the minutes was constructed. A qualitative intrinsic human evaluation was conducted to verify the quality of the constructed dataset. In addition, quantitative and qualitative evaluations using a generative summarization model were carried out to assess the National Assembly Minutes Summarization dataset and to explore future research directions for generative summarization and minutes summarization.
Automatic Generation of Custom Advertisement Messages based on Literacy Styles of Classified Personality Types
Jimin Seong, Yunjong Choi, Doyeon Kwak, Hansaem Kim
http://doi.org/10.5626/JOK.2024.51.1.23
This study introduces a novel framework that defines marketing styles based on the MBTI personality types and presents a machine learning technique for generating customized advertising messages aligned with these types. We use the BART algorithm to synthesize customized advertising content by training on advertisement texts that incorporate personality type prefixes. Our experiments confirm the model’s efficacy in transforming generic advertising copy, via prefix manipulation, into custom messages that embody the distinct style characteristics of each personality type. Theoretically, our research establishes the relationship between style characteristics and personality types; practically, it provides a technique for fine-tuning a language model to generate advertising messages that align with specific personality types. Moreover, this research serves as a foundational work for systematizing and replicating stylistic differences across various languages and regions.
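The prefix-based conditioning described in this abstract can be illustrated with a minimal sketch of how one training pair might be formatted. The `<TYPE>` prefix format, the field names, and the function `make_example` are assumptions made for illustration; the abstract does not specify the actual data format.

```python
from typing import Dict

# The sixteen MBTI type labels, e.g. "ENFP" or "ISTJ".
MBTI_TYPES = {
    a + b + c + d
    for a in "EI" for b in "SN" for c in "TF" for d in "JP"
}


def make_example(generic_copy: str, styled_copy: str, mbti_type: str) -> Dict[str, str]:
    """Build one source/target pair for a sequence-to-sequence model.

    The personality type is prepended to the source text as a prefix token
    (hypothetical format "<ENFP>"), so that changing the prefix at inference
    time can steer the style of the generated message.
    """
    if mbti_type not in MBTI_TYPES:
        raise ValueError(f"unknown MBTI type: {mbti_type}")
    return {
        "source": f"<{mbti_type}> {generic_copy}",
        "target": styled_copy,
    }
```

Pairs built this way could then be fed to any sequence-to-sequence fine-tuning loop; the model and training setup themselves are outside this sketch.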
KcBert-based Movie Review Corpus Emotion Analysis Using Emotion Vocabulary Dictionary
Yeonji Jang, Jiseon Choi, Hansaem Kim
http://doi.org/10.5626/JOK.2022.49.8.608
Emotion analysis classifies the human emotions expressed in text data into emotional types such as joy, sadness, anger, surprise, and fear. In this study, an emotion vocabulary dictionary was used to classify the emotions expressed in a movie review corpus into nine categories (joy, sadness, fear, anger, disgust, surprise, interest, boredom, and pain) in order to construct an emotion corpus. The performance of the model was then evaluated by training KcBert on the emotion corpus. To build the emotion analysis corpus, an emotion vocabulary dictionary based on a psychological model was used. The emotion vocabulary appearing in the movie review corpus was checked against the vocabulary of the dictionary, and each review was tagged with the emotion type of the matching vocabulary that appeared last in the review. When KcBert pre-trained with NSMC was trained on the emotion analysis corpus constructed in this way, it showed excellent performance on the nine-type classification.
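The dictionary-matching step described in this abstract can be sketched roughly as follows. The toy English lexicon, the whitespace tokenization, and the function name `tag_review` are assumptions for illustration only; the actual study works on Korean reviews with a psychologically grounded dictionary.

```python
from typing import Dict, Optional

# Hypothetical toy lexicon mapping emotion words to emotion types.
EMOTION_LEXICON: Dict[str, str] = {
    "funny": "joy",
    "sad": "sadness",
    "scary": "fear",
    "boring": "boredom",
}


def tag_review(review: str, lexicon: Dict[str, str] = EMOTION_LEXICON) -> Optional[str]:
    """Return the emotion type of the last lexicon word found in the review.

    Tokens are lowercased and stripped of surrounding punctuation, then
    scanned from the end of the review backwards. Returns None when no
    token matches the lexicon.
    """
    tokens = [t.strip(".,!?").lower() for t in review.split()]
    for token in reversed(tokens):
        if token in lexicon:
            return lexicon[token]
    return None
```

For example, `tag_review("Funny at first, but the ending was sad.")` returns `"sadness"`, because the last matching word decides the label.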
Korean Abstract Meaning Representation (AMR) Guidelines and Corpus for Graph-structured Meaning Representations
Hyonsu Choe, Jiyoon Han, Hyejin Park, Taehwan Oh, Seokwon Park, Hansaem Kim
http://doi.org/10.5626/JOK.2020.47.12.1134
This paper introduces the Korean Abstract Meaning Representation (AMR) Guideline v1.0. AMR is a graph-based meaning representation system and one of the most significant frameworks for meaning representation. The Korean AMR Guideline is the product of a study that analyzed AMR Guideline 1.2.6 and localized it on the basis of the features of the Korean language. The Korean AMR corpus can be used to implement a semantic parser, which is the core of Natural Language Understanding (NLU) technology, and for NLU/NLG tasks such as Machine Reading Comprehension and Automatic Summarization. The Korean AMR Corpus built according to this guideline currently comprises 896 sentences, or 10,414 words (eojeol).
Unified Methodology of Multiple POS Taggers for Large-scale Korean Linguistic GS Set Construction
Tae-Young Kim, Pum-Mo Ryu, Hansaem Kim, Hyo-Jung Oh
http://doi.org/10.5626/JOK.2020.47.6.596
In recent years, there has been national support for constructing, sharing, and spreading a large-scale Korean linguistic GS set for Korean information processing. As part of the corpus construction project, this study proposes a methodology for constructing the Korean linguistic GS set using various Korean language analysis modules developed in Korea. To build a large-scale training set, we referred to automatically tagged candidate answers from N modules. We then minimized manual effort by classifying the error types in the candidate answers and semi-automatically correcting the major error types. In this study, we normalized the results of the morphological analysis and constructed a large-scale Korean linguistic GS set based on the unified format U-POS. As a result, a Korean linguistic GS set of 348,229 sentences, totaling 9,455,930 words, was constructed. It can later be applied in practice as a basic training resource for Korean information processing.
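One simple way to combine candidate answers from several taggers is sketched below: take the most common tag for each token and flag any token on which the taggers disagree for later correction. This agreement-based merge and the function name `merge_candidates` are assumptions for illustration; the abstract does not state how the candidates were actually reconciled.

```python
from collections import Counter
from typing import List, Tuple


def merge_candidates(candidates: List[List[str]]) -> List[Tuple[str, bool]]:
    """Merge per-token tags from several taggers.

    candidates[i][j] is the tag that tagger i assigned to token j; all
    taggers are assumed to share the same tokenization and tag set.
    Returns one (tag, needs_review) pair per token, where tag is the most
    common candidate and needs_review is True unless all taggers agree.
    """
    merged = []
    for tags in zip(*candidates):
        tag, count = Counter(tags).most_common(1)[0]
        merged.append((tag, count < len(tags)))
    return merged
```

Only the tokens flagged for review would then need error-type classification and manual or semi-automatic correction.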
Journal of KIISE
- ISSN : 2383-630X(Print)
- ISSN : 2383-6296(Electronic)
- KCI Accredited Journal