KMSS: Korean Media Script Dataset for Dialogue Summarization
Bong-Su Kim, Ji-Yoon Kim, Seung-ho Choi, Hyun-Kyu Jeon, Hye-Jin Jun, Hye-In Jung, Jung-Hoon Jang
http://doi.org/10.5626/JOK.2024.51.4.311
Dialogue summarization is the task of extracting or generating the key content of multi-turn documents made up of utterances by multiple speakers. Dialogue summarization models are useful for analyzing content and service records to support recommendations in conversational systems. However, the Korean dialogue summarization datasets needed to build such models do not yet exist. This paper proposes a dataset for generative dialogue summarization. Source data were collected from the large-volume content of domestic broadcasters and labeled manually by annotators. The dataset comprises approximately 100,000 entries across 6 categories, with summaries annotated as single sentences, three sentences, or two-and-a-half sentences. In addition, the paper introduces a dialogue summary labeling guide to internalize and control data characteristics, and presents a method for selecting a decoding model structure to verify model suitability. Through experiments, we highlight some characteristics of the constructed data and report benchmark performance for future research.

Journal of KIISE
- ISSN : 2383-630X(Print)
- ISSN : 2383-6296(Electronic)
- KCI Accredited Journal