Digital Library [Search Result]
Multi-Document Summarization Using Semantic Similarity and Information Quantity of Sentence
Yeon-Soo Lim, Sunggoo Kwon, Bong-Min Kim, Seong-Bae Park
http://doi.org/10.5626/JOK.2023.50.7.561
Document summarization has recently emerged as an important task in natural language processing because of the need to deliver concise information. However, it is difficult to obtain a suitable multi-document summarization dataset. In this paper, rather than training with a multi-document summarization dataset, we propose to use a single-document summarization dataset. That is, we propose a multi-document summarization model that generates multiple single-document summaries with a single-document summarization model and then post-processes them. The proposed model consists of three modules: a summary module, a similarity module, and an information module. When multiple documents are entered into the proposed model, the summary module generates a summary of each single document. The similarity module clusters similar summaries by measuring semantic similarity. The information module selects the most informative summary from each group of similar summaries and collects the selected summaries into the final multi-document summary. Experimental results show that the proposed model outperforms the baseline models and can generate a high-quality multi-document summary. In addition, each module individually shows meaningful performance.
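The three-module pipeline described above can be sketched as follows. This is a toy illustration only: the similarity measure (cosine over bag-of-words vectors), the greedy clustering threshold, and the "most informative" criterion (largest vocabulary) are stand-ins for the paper's actual semantic-similarity and information-quantity measures.

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity over bag-of-words count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def multi_doc_summary(single_summaries, sim_threshold=0.5):
    """Similarity module: greedily cluster similar single-document summaries.
    Information module: keep the most informative one (here: most distinct
    words) from each cluster as part of the final multi-document summary."""
    vecs = [Counter(s.lower().split()) for s in single_summaries]
    clusters = []  # each cluster is a list of summary indices
    for i, v in enumerate(vecs):
        for c in clusters:
            if cosine(v, vecs[c[0]]) >= sim_threshold:
                c.append(i)
                break
        else:
            clusters.append([i])
    picks = [max(c, key=lambda i: len(vecs[i])) for c in clusters]
    return [single_summaries[i] for i in sorted(picks)]

# Summary-module output (one summary per input document), then post-process:
summaries = [
    "the team won the championship game",
    "the team won the final championship game yesterday",
    "new vaccine approved by regulators",
]
result = multi_doc_summary(summaries)
```

The two near-duplicate summaries fall into one cluster, and the longer, more informative one survives alongside the unrelated summary.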
Topic Centric Korean Text Summarization using Attribute Model
Su-Hwan Yoon, A-Yeong Kim, Seong-Bae Park
http://doi.org/10.5626/JOK.2021.48.6.688
Abstractive summarization takes an original text as input and generates a summary containing its core information. Abstractive summarization models are mainly designed as Sequence-to-Sequence models. To improve the quality and coherence of summaries, topic-centric methods that reflect the core information of the original text have recently been proposed. However, previous methods require additional training steps, which makes it difficult to take advantage of a pre-trained language model. This paper proposes a topic-centric summarizer that reflects topic words in a summary while retaining the characteristics of the pre-trained language model by using PPLM. The proposed method does not require any additional training. To prove the effectiveness of the proposed summarizer, this paper performs summarization experiments on Korean newspaper data.
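The core idea of steering a frozen language model toward topic words at decoding time, without any retraining, can be illustrated with a deliberately simplified sketch. Note that real PPLM nudges the model's hidden states with gradients from an attribute model; the logit-bonus version below is a much cruder stand-in, and all names here are illustrative.

```python
import math

def topic_steered_step(logits, vocab, topic_words, boost=2.0):
    """One decoding step: add a bonus to the logits of topic-related tokens,
    then renormalize with a softmax. A simplified stand-in for PPLM, which
    instead updates the LM's hidden states using gradients from a
    bag-of-words attribute model, leaving the LM weights untouched."""
    steered = [l + (boost if vocab[i] in topic_words else 0.0)
               for i, l in enumerate(logits)]
    m = max(steered)                      # subtract max for numerical stability
    exps = [math.exp(l - m) for l in steered]
    z = sum(exps)
    return [e / z for e in exps]

# With uniform logits, topic tokens become the most probable next tokens.
vocab = ["economy", "market", "weather", "game"]
probs = topic_steered_step([1.0, 1.0, 1.0, 1.0], vocab, {"economy", "market"})
```

Because no model parameter changes, the base model's fluency is preserved, which is the property the paper exploits.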
Autoencoder-based Learning Contribution Measurement Method for Training Data Selection
Yuna Jeong, Myunggwon Hwang, Wonkyung Sung
http://doi.org/10.5626/JOK.2021.48.2.195
Despite recent significant performance improvements, the iterative nature of machine-learning algorithms makes development and utilization difficult and time-consuming. In this paper, we present a data-selection method that reduces the time required by providing an approximate solution. First, data are mapped to feature vectors in a latent space by an autoencoder. Each sample is then weighted, with high weights given to samples with a high learning contribution, i.e., those that are relatively difficult to learn. Finally, the data are ranked by weight, and the top-ranked samples are selected and used for training. Experimental results showed that the proposed method selects data that achieve higher performance than random sampling.
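The ranking-and-selection step can be sketched as below. Using autoencoder reconstruction error as the learning-contribution weight is an assumption for illustration; the paper's exact weighting may differ, and the function name is hypothetical.

```python
def select_training_data(samples, recon_errors, keep_ratio=0.5):
    """Rank samples by autoencoder reconstruction error (used here as a
    proxy for learning contribution: hard-to-reconstruct samples score
    higher) and keep the top fraction for training."""
    ranked = sorted(zip(samples, recon_errors),
                    key=lambda pair: pair[1], reverse=True)
    k = max(1, int(len(samples) * keep_ratio))
    return [s for s, _ in ranked[:k]]

# Keep the half of the dataset that the autoencoder reconstructs worst.
subset = select_training_data(["a", "b", "c", "d"],
                              [0.1, 0.9, 0.5, 0.2],
                              keep_ratio=0.5)
```

The selected subset then replaces the full dataset in the training loop, trading a small accuracy margin for a large reduction in training time.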
Space Efficient Top-k Query Encoding Based on Data Distribution
Wooyoung Park, Srinivasa Rao Satti
http://doi.org/10.5626/JOK.2020.47.3.235
We consider an encoding that supports range top-k queries on a two-dimensional array without accessing the original array. We propose a more space-efficient encoding for top-k queries with better average-case query time. Our experiments also show that our encoding is more space-efficient than earlier ones. In addition, we propose applying learning-based data structures to succinct data structures.
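For readers unfamiliar with the query being encoded, the following naive baseline shows what a 2-D range top-k query computes. The point of the proposed encoding is to answer exactly this query without storing or reading `arr` at all; the sketch below is only the brute-force reference.

```python
import heapq

def range_top_k(arr, r1, c1, r2, c2, k):
    """Naive baseline for a 2-D range top-k query: scan the subarray
    arr[r1..r2][c1..c2] (inclusive bounds) and return the k largest
    values in descending order."""
    vals = [arr[r][c]
            for r in range(r1, r2 + 1)
            for c in range(c1, c2 + 1)]
    return heapq.nlargest(k, vals)

# Top-2 values in the subarray covering rows 0..1 and columns 0..2.
top2 = range_top_k([[3, 1, 4], [1, 5, 9], [2, 6, 5]], 0, 0, 1, 2, 2)
```

An encoding typically returns the *positions* of the top-k entries rather than the values, since the values themselves are not stored; the baseline above is simplified on that point.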
A Study on Two-dimensional Array-based Technology to Identify Obfuscated Malware
Seonbin Hwang, Hogyeong Kim, Junho Hwang, Taejin Lee
http://doi.org/10.5626/JOK.2018.45.8.769
More than 1.6 million types of malware emerge on average per day, and most cyber attacks are generated by malware. Moreover, malware obfuscation techniques are becoming more intelligent, using packing or encryption to prevent reverse-engineering analysis. Static analysis is limited when the file under analysis is obfuscated, so a countermeasure is needed. In this paper, we propose an approach based on strings, symbols, and entropy as a way to identify malware even when it is obfuscated. Two-dimensional arrays were applied for both fixed and non-fixed feature-set processing, and 15,000 malware/benign samples were tested using a deep neural network. This study is expected to operate in a complementary manner with various malicious-code detection methods in the future and to be useful in the analysis of obfuscated malware variants.
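One way a two-dimensional array can absorb a non-fixed feature set (such as a variable number of extracted strings) into a fixed-size DNN input is sketched below. The matrix shape and the hashing scheme are illustrative assumptions, not the paper's actual feature design.

```python
def strings_to_feature_matrix(strings, rows=8, cols=16):
    """Fold a variable-length list of extracted strings into a fixed
    rows x cols count matrix: each string is assigned a row by a simple
    deterministic hash, and each of its characters increments a column
    bucket. The result always has the same shape, regardless of how
    many strings were extracted from the binary."""
    m = [[0] * cols for _ in range(rows)]
    for s in strings:
        r = sum(map(ord, s)) % rows  # deterministic stand-in for a real hash
        for ch in s:
            m[r][ord(ch) % cols] += 1
    return m

# A single extracted string "ab" lands in one row, two column buckets.
matrix = strings_to_feature_matrix(["ab"], rows=2, cols=4)
```

The fixed-shape matrix can then be flattened or fed directly to a convolutional or dense network, alongside fixed features such as section entropy.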

Journal of KIISE
- ISSN : 2383-630X(Print)
- ISSN : 2383-6296(Electronic)
- KCI Accredited Journal
Editorial Office
- Tel. +82-2-588-9240
- Fax. +82-2-521-1352
- E-mail. chwoo@kiise.or.kr