Search : [ keyword: Sentence Generation ] (4)

Noise Injection for Natural Language Sentence Generation from Knowledge Base

Sunggoo Kwon, Seyoung Park

http://doi.org/10.5626/JOK.2020.47.10.965

Generating a natural language sentence from a knowledge base means taking a triple from the knowledge base as input and producing a natural language sentence that expresses the relationship between its entities. Solving this triple-to-sentence task with a deep neural network requires training data consisting of many pairs of triples and natural language sentences. However, no such Korean dataset has been released yet, which makes it difficult to train such a model. To address this shortage of training data, this paper proposes an unsupervised learning method that extracts keywords from Korean Wikipedia sentences and builds training data using a noise injection technique. To evaluate the proposed method, we used a gold-standard dataset of triple and sentence pairs. The proposed noise injection method outperformed plain unsupervised learning on various evaluation metrics, including both automatic and human evaluations.
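The unsupervised setup described in the abstract can be pictured as a denoising pipeline: extract keywords from a monolingual sentence, corrupt them with noise, and train a model to reconstruct the original sentence from the noisy keywords. A minimal sketch of that idea (the keyword extractor and noise operations here are hypothetical simplifications, not the authors' exact method):

```python
import random

def extract_keywords(sentence, stopwords):
    # Hypothetical keyword extractor: keep content words only.
    return [w for w in sentence.split() if w not in stopwords]

def inject_noise(keywords, drop_prob=0.2, seed=0):
    # Noise injection: randomly drop and shuffle keywords so the
    # model cannot simply copy its input when reconstructing.
    rng = random.Random(seed)
    kept = [w for w in keywords if rng.random() > drop_prob] or keywords
    rng.shuffle(kept)
    return kept

def make_training_pair(sentence, stopwords):
    # Pseudo-parallel pair: (noisy keyword sequence -> original sentence)
    return inject_noise(extract_keywords(sentence, stopwords)), sentence

src, tgt = make_training_pair("the model generates a fluent sentence",
                              stopwords={"the", "a"})
print(src, "->", tgt)
```

Pairs built this way stand in for the missing human-annotated triple/sentence data during training.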

Sentence Generation from Knowledge Base Triples Using Attention Mechanism Encoder-decoder

Garam Choi, Sung-Pil Choi

http://doi.org/10.5626/JOK.2019.46.9.934

In this paper, we investigate the generation of natural language sentences from structured Knowledge Base Triples data. To generate a sentence that expresses a triple, an LSTM (Long Short-Term Memory) encoder-decoder is used together with an attention mechanism. On the test data, the BLEU scores were 42.264 (BLEU-1), 32.441 (BLEU-2), 26.820 (BLEU-3), and 24.446 (BLEU-4), with a ROUGE score of 47.341, a 0.8% improvement (based on BLEU-1) over the comparison model. In addition, the average BLEU score of the top 10 test samples was 99.393 (BLEU-1).
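BLEU-n, the metric reported above, is built on modified n-gram precision between a candidate and a reference sentence. A minimal single-sentence sketch of that core computation (full BLEU also combines several n-gram orders geometrically and applies a brevity penalty, which this sketch omits):

```python
from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-grams of a token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n=1):
    # Clip each candidate n-gram count by its count in the reference,
    # so repeating a matching word cannot inflate the score.
    cand, ref = Counter(ngrams(candidate, n)), Counter(ngrams(reference, n))
    overlap = sum(min(c, ref[g]) for g, c in cand.items())
    total = sum(cand.values())
    return overlap / total if total else 0.0

cand = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
# 5 of the 6 candidate unigrams are covered by the reference.
print(modified_precision(cand, ref, 1))
```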

Application of Improved Variational Recurrent Auto-Encoder for Korean Sentence Generation

Sangchul Hahn, Seokjin Hong, Heeyoul Choi

http://doi.org/10.5626/JOK.2018.45.2.157

Due to revolutionary advances in deep learning, the performance of pattern recognition has increased significantly in many applications such as speech recognition and image recognition, and some systems outperform human-level intelligence in specific domains. Unlike pattern recognition, in this paper we focus on generating Korean sentences from a few given Korean sentences. We apply a variational recurrent auto-encoder (VRAE) and modify the model to account for some characteristics of Korean sentences. To reduce the number of words in the model, we apply a word spacing model. Also, many Korean sentences have the same meaning but different word order, and may even omit subjects or objects; we therefore replace the unidirectional encoder of the VRAE with a bidirectional encoder. In addition, we apply an interpolation method to the vectors encoded from the given sentences, so that we can generate new sentences similar to the given ones. In experiments, we confirm that our proposed method generates better sentences that are semantically more similar to the given sentences.
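The interpolation step mentioned in the abstract takes convex combinations of two encoded sentence vectors and decodes the intermediate points into new sentences. A minimal sketch of just the vector arithmetic (the VRAE encoder and decoder are omitted; z1 and z2 stand for latent codes they would produce):

```python
def interpolate(z1, z2, alpha):
    # Linear interpolation between two latent codes:
    # alpha = 0 recovers z1, alpha = 1 recovers z2.
    return [(1 - alpha) * a + alpha * b for a, b in zip(z1, z2)]

z1, z2 = [0.0, 2.0], [4.0, 0.0]
for alpha in (0.0, 0.5, 1.0):
    print(alpha, interpolate(z1, z2, alpha))
# Decoding the intermediate codes yields sentences similar to both inputs.
```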

Character-based Subtitle Generation by Learning of Multimodal Concept Hierarchy from Cartoon Videos

Kyung-Min Kim, Jung-Woo Ha, Beom-Jin Lee, Byoung-Tak Zhang

http://doi.org/

Previous multimodal learning methods focus on problem-solving aspects, such as image and video search and tagging, rather than on knowledge acquisition via content modeling. In this paper, we propose the Multimodal Concept Hierarchy (MuCH), a content modeling method for cartoon video datasets, together with a character-based subtitle generation method built on the learned model. The MuCH model has a multimodal hypernetwork layer, in which patterns of words and image patches are represented, and a concept layer, in which each concept variable is represented by a probability distribution over words and image patches. The model can learn the characteristics of the characters as concepts from video subtitles and scene images using a Bayesian learning method, and can generate character-based subtitles from the learned model when given text queries. As an experiment, the MuCH model learned concepts from ‘Pororo’ cartoon videos totaling 268 minutes in length and generated character-based subtitles. Finally, we compare the results with those of other multimodal learning models. The experimental results indicate that, given the same text query, our model generates more accurate and more character-specific subtitles than the other models.
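The concept layer described above represents each character concept as a probability distribution over words (and image patches), and subtitle generation conditions on the queried character. A toy sketch of that generation idea (the distributions here are invented for illustration only, not learned from the Pororo data, and real MuCH also conditions on image patches):

```python
import random

# Hypothetical per-character word distributions, standing in for the
# learned concept layer: concept -> P(word | concept).
concepts = {
    "Pororo": {"fly": 0.5, "play": 0.3, "snow": 0.2},
    "Crong": {"egg": 0.6, "play": 0.4},
}

def generate_subtitle(character, length=3, seed=1):
    # Sample subtitle words from the queried character's distribution,
    # so the output vocabulary is character-specific.
    rng = random.Random(seed)
    words, probs = zip(*concepts[character].items())
    return " ".join(rng.choices(words, weights=probs, k=length))

print(generate_subtitle("Pororo"))
```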



Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr