Digital Library [Search Result]
Multi-Document Summarization Use Semantic Similarity and Information Quantity of Sentence
Yeon-Soo Lim, Sunggoo Kwon, Bong-Min Kim, Seong-Bae Park
http://doi.org/10.5626/JOK.2023.50.7.561
Document summarization has recently emerged as an important task in natural language processing because of the need to deliver concise information. However, suitable multi-document summarization datasets are difficult to obtain. In this paper, rather than training on a multi-document summarization dataset, we propose to use a single-document summarization dataset. That is, we propose a multi-document summarization model that generates multiple single-document summaries with a single-document summarization model and then post-processes them. The proposed model consists of three modules: a summary module, a similarity module, and an information module. When multiple documents are entered into the proposed model, the summary module generates a summary of each document. The similarity module clusters similar summaries by measuring their semantic similarity. The information module selects the most informative summary from each cluster of similar summaries and collects the selected summaries into the final multi-document summary. Experimental results show that the proposed model outperforms the baseline models and can generate a high-quality multi-document summary. In addition, the performance of each module also shows meaningful results.
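The post-processing pipeline described in the abstract (cluster single-document summaries by semantic similarity, then pick the most informative one per cluster) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it uses bag-of-words cosine similarity in place of a learned semantic similarity module, greedy threshold clustering, and summary length as a stand-in for the information module's information-quantity score.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster_and_select(summaries, threshold=0.5):
    """Greedily group summaries whose similarity to a cluster's first
    member exceeds the threshold, then keep the longest summary per
    cluster as a crude proxy for information quantity."""
    bags = [Counter(s.lower().split()) for s in summaries]
    clusters = []  # each cluster is a list of summary indices
    for i, bag in enumerate(bags):
        for cluster in clusters:
            if cosine(bag, bags[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    # one representative per cluster forms the multi-document summary
    return [max((summaries[i] for i in c), key=len) for c in clusters]
```

In the paper, the similarity and information modules are learned components; the threshold-and-length heuristics above merely show where those scores plug into the cluster-then-select flow.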
Noise Injection for Natural Language Sentence Generation from Knowledge Base
http://doi.org/10.5626/JOK.2020.47.10.965
Generating a natural language sentence from a knowledge base means taking a triple from the knowledge base as input and producing a natural language sentence that expresses the relationship between the triple's entities. Solving this task with a deep neural network requires training data consisting of many pairs of triples and natural language sentences. However, no such training data has yet been released for Korean, which makes the model difficult to train. To address this lack of training data, this paper proposes an unsupervised learning method that extracts keywords from Korean Wikipedia sentences and generates training data using a noise injection technique. To evaluate the proposed method, we used a gold-standard dataset of triple-sentence pairs. The proposed noise injection method showed superior performance over plain unsupervised learning on various evaluation metrics, including both automatic and human evaluations.
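The data-generation idea in the abstract (extract keywords from raw sentences, then inject noise to build pseudo input-output pairs) can be sketched as below. This is an illustrative assumption of how such a scheme might look, not the paper's method: the stopword list, drop probability, and adjacent-swap shuffle are all made up for the example.

```python
import random

# toy stopword list; a real system would use a Korean morphological analyzer
STOPWORDS = {"the", "a", "an", "is", "of", "in", "to", "and"}

def extract_keywords(sentence):
    """Keep content words as a pseudo triple-like input sequence."""
    return [w for w in sentence.lower().split() if w not in STOPWORDS]

def inject_noise(tokens, drop_prob=0.2, rng=None):
    """Randomly drop tokens and swap adjacent ones, so a model trained on
    (noisy keywords -> original sentence) learns to reconstruct fluent
    text from degraded, triple-like input."""
    rng = rng or random.Random(0)
    kept = [t for t in tokens if rng.random() > drop_prob] or tokens[:1]
    for i in range(len(kept) - 1):
        if rng.random() < 0.3:
            kept[i], kept[i + 1] = kept[i + 1], kept[i]
    return kept

def make_training_pair(sentence, rng=None):
    """Noisy keyword sequence (input) paired with the original sentence (target)."""
    return inject_noise(extract_keywords(sentence), rng=rng), sentence
```

Each Wikipedia sentence thus yields a supervised-looking pair without any manually aligned triple-sentence data, which is the core of the unsupervised setup.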
Multi-sense Word Embedding to Improve Performance of a CNN-based Relation Extraction Model
Sangha Nam, Kijong Han, Eun-kyung Kim, Sunggoo Kwon, Yoosung Jung, Key-Sun Choi
http://doi.org/10.5626/JOK.2018.45.8.816
The relation extraction task is to classify the relation between two entities in an input sentence and is important in natural language processing and knowledge extraction. Many studies have designed relation extraction models using distant supervision. Recently, deep-learning-based relation extraction models such as CNNs and RNNs have become mainstream. However, existing studies do not solve the homograph problem in the word embeddings used as the model's input: training proceeds with a single embedding for homographs that have different meanings, so the relation extraction model is trained without accurately grasping each word's sense. In this paper, we propose a relation extraction model that uses multi-sense word embeddings. To learn the multi-sense word embeddings, we used a word sense disambiguation module based on the CoreNet concept, and the relation extraction model used CNN and PCNN models to learn key words in sentences.
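The homograph problem the abstract describes, and the multi-sense remedy, can be illustrated with a toy embedding table keyed by (word, sense) pairs. This is only a sketch under assumed names: sense IDs are supposed to come from a word sense disambiguation step, and the random vectors stand in for learned embeddings.

```python
import random

class MultiSenseEmbedding:
    """Toy multi-sense embedding: each (word, sense) pair gets its own
    vector, so a homograph such as 'bank' (river bank vs. financial
    bank) no longer shares a single representation."""

    def __init__(self, dim=8, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.table = {}  # (word, sense_id) -> vector

    def lookup(self, word, sense=0):
        key = (word, sense)
        if key not in self.table:
            # stand-in for a trained vector
            self.table[key] = [self.rng.uniform(-1, 1) for _ in range(self.dim)]
        return self.table[key]

def embed_sentence(emb, tagged_tokens):
    """tagged_tokens: list of (word, sense_id) pairs, as produced by a
    word sense disambiguation module; the vectors would then feed a
    CNN/PCNN relation classifier."""
    return [emb.lookup(w, s) for w, s in tagged_tokens]
```

The point of the example is the key structure: because the lookup key includes the sense ID, two occurrences of the same surface form can map to different vectors, which is exactly what a single-embedding input layer cannot do.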
Journal of KIISE
- ISSN : 2383-630X(Print)
- ISSN : 2383-6296(Electronic)
- KCI Accredited Journal
Editorial Office
- Tel. +82-2-588-9240
- Fax. +82-2-521-1352
- E-mail. chwoo@kiise.or.kr