
KMSS: Korean Media Script Dataset for Dialogue Summarization

Bong-Su Kim, Ji-Yoon Kim, Seung-ho Choi, Hyun-Kyu Jeon, Hye-Jin Jun, Hye-In Jung, Jung-Hoon Jang

http://doi.org/10.5626/JOK.2024.51.4.311

Dialogue summarization extracts or generates the key content of a multi-turn document made up of utterances by multiple speakers. Dialogue summarization models are useful for analyzing content and service records to support recommendations in conversational systems. However, the Korean dialogue summarization datasets needed to build such models do not exist. This paper proposes a dataset for generative dialogue summarization. The source data were collected from the large-volume content of domestic broadcasters and labeled manually by annotators. The dataset comprises approximately 100,000 entries across 6 categories, with summaries annotated as single sentences, three sentences, or two-and-a-half sentences. The paper also introduces a dialogue summary labeling guide for internalizing and controlling the characteristics of the data, and presents a method for selecting a decoding model structure to verify model suitability. Through experiments, we highlight some characteristics of the constructed data and present benchmark results for future research.


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal
