Digital Library [Search Result]
The Dataset and a Pretrained Language Model for Sentence Classification in Korean Science and Technology Abstracts
Hongbi Ahn, Soyoung Park, Yuchul Jung
http://doi.org/10.5626/JOK.2023.50.6.468
Classifying each sentence by its role or function is a critical task, particularly for science and technology papers, whose abstracts contain various types of research-related content. This requires careful content curation and appropriate semantic tags, which are difficult to provide because of the complexity and diversity of the research. For instance, in non-Korean biomedical abstract data such as PubMed, the sentences in an abstract typically follow a consistent semantic sequence, such as background-purpose-method-result-conclusion. In Korean paper abstracts, however, the order of sentences varies from author to author. To address this, we constructed a dataset (PubKorSci-1k) in which each sentence of Korean-language science and technology abstracts is tagged according to its role. We also propose a learning technique for sentence classification based on this dataset.
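The abstract's key observation, that Korean abstracts do not follow a fixed role order, can be illustrated with a toy position-only baseline. It labels sentences by stretching the canonical background-purpose-method-result-conclusion sequence over the abstract, so it only works when authors keep that order. This is a hypothetical sketch: the role names come from the abstract, but PubKorSci-1k's actual tag set, the example labels, and the function names are assumptions, and the authors' pretrained-model approach is not reproduced here.

```python
from typing import List

# Role sequence that non-Korean biomedical abstracts (e.g. PubMed) typically follow.
CANONICAL_ORDER = ["background", "purpose", "method", "result", "conclusion"]


def positional_labels(num_sentences: int) -> List[str]:
    """Label sentences by position alone, spreading the canonical order evenly."""
    return [CANONICAL_ORDER[i * len(CANONICAL_ORDER) // num_sentences]
            for i in range(num_sentences)]


def accuracy(predicted: List[str], gold: List[str]) -> float:
    """Fraction of sentences whose predicted role matches the gold role."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    # Hypothetical gold labels for two five-sentence abstracts.
    canonical_abstract = ["background", "purpose", "method", "result", "conclusion"]
    reordered_abstract = ["purpose", "method", "background", "result", "conclusion"]
    print(accuracy(positional_labels(5), canonical_abstract))  # 1.0
    print(accuracy(positional_labels(5), reordered_abstract))  # 0.4
```

Position alone carries little signal once the order varies, which is why a classifier must look at the content of each sentence.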
Solving Korean Math Word Problems Using the Graph and Tree Structure
Kwang Ho Bae, Sang Yeop Yeo, Yu Chul Jung
http://doi.org/10.5626/JOK.2022.49.11.972
Previous studies have made various efforts to solve math word problems written in English. Many of them improved performance by going beyond Sequence-to-Sequence (Seq2seq) approaches and introducing structures such as trees and graphs. However, no prior work on solving math word problems written in Korean has used tree or graph structures. In this paper, we therefore examine whether Korean math word problems can be solved by models that combine tree structures, graph structures, and Korean pretrained language models. In our experiments, introducing the graph and tree structures improved accuracy by approximately 20% over a model with a Seq2seq structure. Additionally, using a Korean pretrained language model improved accuracy by 4.66% to 5.96%.
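The abstract does not describe the decoder in detail. A common pattern in tree-structured math word problem solvers is to emit the equation as a prefix-order sequence of operators and quantity slots, rebuild it into a binary expression tree, and evaluate that tree with the numbers found in the problem text. The sketch assumes that pattern; the slot names `N0`, `N1`, ..., the helper functions, and the example problem are hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import List, Optional

# Binary operators the hypothetical decoder may emit.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


@dataclass
class Node:
    token: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(prefix: List[str]) -> Node:
    """Rebuild a binary expression tree from prefix-order tokens."""
    pos = 0

    def parse() -> Node:
        nonlocal pos
        if pos >= len(prefix):
            raise ValueError("incomplete prefix expression")
        token = prefix[pos]
        pos += 1
        if token in OPS:
            left = parse()
            right = parse()
            return Node(token, left, right)
        return Node(token)  # leaf: a quantity slot or a constant

    root = parse()
    if pos != len(prefix):
        raise ValueError("extra tokens after a complete expression")
    return root


def evaluate(node: Node, quantities: List[float]) -> float:
    """Evaluate the tree, mapping slot tokens such as 'N0' to problem quantities."""
    if node.token in OPS:
        return OPS[node.token](evaluate(node.left, quantities),
                               evaluate(node.right, quantities))
    if node.token.startswith("N"):
        return quantities[int(node.token[1:])]
    return float(node.token)  # a constant such as '1' or '100'


if __name__ == "__main__":
    # Hypothetical problem: "Minsu has 3 apples and buys 5 bags of 4 apples each."
    tree = build_tree(["+", "N0", "*", "N1", "N2"])
    print(evaluate(tree, [3, 5, 4]))  # 23
```

Decoding a tree rather than a flat token string guarantees that every operator receives exactly two operands, which is one reason tree decoders tend to produce well-formed equations.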
Aspect Summarization for Product Reviews based on Attention-based Aspect Extraction
Jun-Nyeong Jeong, Sang-Young Kim, Seong-Tae Kim, Jeong-Jae Lee, Yuchul Jung
http://doi.org/10.5626/JOK.2021.48.12.1318
Recently, research on summarizing documents such as articles and papers with machine learning, as well as research on summarizing online reviews, has been active. Unlike existing work that simply summarizes content, this study develops a technique that generates aspect summaries by considering the various aspects of product reviews. To build the training data, we crawled earphone product reviews and refined them, obtaining 40,000 reviews. We also manually constructed 4,000 aspect summaries for training and evaluation. In particular, we propose a model that summarizes aspects using only text data, based on attention-based aspect extraction (ABAE). To judge the effectiveness of the proposed technique, we ran experiments that varied whether aspect-related words were used and the masking ratio during training. The model that randomly masked 25% of the aspect-related words performed best, achieving a ROUGE score of 0.696 and a BERTScore of 0.879 on validation.
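The abstract does not say exactly how the 25% is counted. A minimal sketch, assuming the ratio applies to the occurrences of aspect-related words within a single review and that each chosen word is replaced by a placeholder token, could look like this; the aspect word list, the `[MASK]` token, and the function name are hypothetical.

```python
import random
from typing import List, Set

MASK = "[MASK]"  # hypothetical placeholder token


def mask_aspect_words(tokens: List[str], aspect_words: Set[str],
                      ratio: float = 0.25, seed: int = 0) -> List[str]:
    """Replace a random `ratio` of the aspect-related tokens with MASK."""
    rng = random.Random(seed)
    # Positions of tokens that belong to the (hypothetical) aspect vocabulary.
    candidates = [i for i, tok in enumerate(tokens) if tok in aspect_words]
    k = int(len(candidates) * ratio)  # rounds down, so short reviews may get no mask
    chosen = set(rng.sample(candidates, k))
    # Non-aspect tokens are always kept as-is.
    return [MASK if i in chosen else tok for i, tok in enumerate(tokens)]


if __name__ == "__main__":
    review = ("the sound is clear and the bass is deep "
              "but the battery dies fast and the fit hurts").split()
    aspects = {"sound", "bass", "battery", "fit"}
    print(" ".join(mask_aspect_words(review, aspects)))
```

Masking only aspect-related words, rather than arbitrary tokens, forces the model to recover aspect information from context during training.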
A Malicious Traffic Detection Method Using X-means Clustering
Myoungji Han, Jihyuk Lim, Junyong Choi, Hyunjoon Kim, Jungjoo Seo, Cheol Yu, Sung-Ryul Kim, Kunsoo Park
Malicious traffic, such as DDoS attacks and botnet communication, is traffic generated to disrupt the Internet or to harm specific networks, servers, or hosts. Because malicious traffic keeps evolving in both quality and quantity, much research has been devoted to countering it. In this paper, we propose an effective malicious traffic detection method that exploits the X-means clustering algorithm. We also describe how to analyze the statistical characteristics of malicious traffic and how to define the metrics used for clustering. Finally, we verify the effectiveness of our method through experiments on two publicly released traffic datasets.
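X-means extends k-means by choosing the number of clusters automatically: after a k-means pass, each cluster is tentatively split in two, and the split is kept only if it improves the Bayesian Information Criterion (BIC). The abstract gives neither the traffic features nor the metrics the authors define, so this is a simplified, standard-library sketch under stated assumptions: points are hypothetical numeric feature vectors (for example, per-flow statistics), and the BIC uses a spherical Gaussian model with one shared variance, which is one common formulation and not necessarily the paper's.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def mean(points: Sequence[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dims))


def sqdist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], centers: List[Point], iters: int = 100):
    """Lloyd's k-means starting from the given initial centers."""
    clusters: List[List[Point]] = []
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda c: sqdist(p, centers[c]))
            clusters[j].append(p)
        new_centers = [mean(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return centers, clusters


def bic(centers: List[Point], clusters: List[List[Point]]) -> float:
    """BIC of a spherical Gaussian mixture with one shared variance (higher is better).

    Assumes every cluster is non-empty and there are more points than clusters.
    """
    r = sum(len(c) for c in clusters)
    k, m = len(centers), len(centers[0])
    sse = sum(sqdist(p, mu) for mu, c in zip(centers, clusters) for p in c)
    var = max(sse / (m * (r - k)), 1e-12)  # per-dimension variance estimate
    log_l = sum(len(c) * math.log(len(c) / r) for c in clusters)  # mixing weights
    log_l -= r * m / 2 * math.log(2 * math.pi * var)
    log_l -= sse / (2 * var)
    n_params = (k - 1) + k * m + 1  # weights, centers, shared variance
    return log_l - n_params / 2 * math.log(r)


def xmeans(points: List[Point], k_min: int = 1, k_max: int = 10, seed: int = 0):
    """Grow k from k_min, splitting a cluster only when its local BIC improves."""
    rng = random.Random(seed)
    centers, clusters = kmeans(points, rng.sample(points, k_min))
    while len(centers) < k_max:
        budget = k_max - len(centers)
        proposed: List[Point] = []
        split_any = False
        for members in clusters:
            if not members:
                continue  # drop centers that lost all their points
            parent = mean(members)
            if budget > 0 and len(members) >= 4:
                kids, kid_clusters = kmeans(members, rng.sample(members, 2))
                if all(kid_clusters) and bic(kids, kid_clusters) > bic([parent], [members]):
                    proposed.extend(kids)
                    budget -= 1
                    split_any = True
                    continue
            proposed.append(parent)
        if not split_any:
            break  # no cluster improved its BIC by splitting
        centers, clusters = kmeans(points, proposed)  # refine globally
    return centers, clusters


if __name__ == "__main__":
    # Two hypothetical, well-separated groups of 2-D traffic features.
    grid = [(i * 0.1, j * 0.1) for i in range(5) for j in range(5)]
    data = grid + [(10 + x, 10 + y) for x, y in grid]
    centers, _ = xmeans(data, k_min=1, k_max=8)
    print(len(centers), sorted(centers))
```

Accepting a split only when it raises the BIC is what lets the number of clusters follow the data instead of being fixed in advance, which matters when the mix of traffic types is not known beforehand.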
Journal of KIISE
- ISSN: 2383-630X (Print)
- ISSN: 2383-6296 (Electronic)
- KCI Accredited Journal
Editorial Office
- Tel. +82-2-588-9240
- Fax. +82-2-521-1352
- E-mail. chwoo@kiise.or.kr