Semi-automatic Expansion for a Chatting Corpus Based on a K-means Clustering Method And Similarity Measure 


Vol. 46, No. 5, pp. 440-447, May 2019
10.5626/JOK.2019.46.5.440



  Abstract

In this paper, we propose a semi-automatic method for expanding a chatting corpus using a large amount of utterance data from movie subtitles and drama scripts. To expand the chatting corpus, the proposed system uses a previously constructed chatting corpus and a similarity measure: if the similarity between the previously constructed chatting corpus and an input utterance is greater than a threshold value set in the experiments, the input utterance is selected as a new chatting utterance, that is, it is judged to form a correct chatting pair. We used morpheme-unit word embeddings and a convolutional neural network (CNN) to compute utterance embeddings, so that the similarity between utterances can be calculated efficiently. To speed up the semi-automatic expansion process, we reduce the amount of computation by clustering the chatting corpus with the K-means clustering algorithm. Experimental results showed that the precision, recall, and F1 score of the proposed system were 61.28%, 53.19%, and 56.94%, respectively, which were 5.16%p, 6.09%p, and 5.73%p higher than those of the baseline system. The proposed system was also about one hundred times faster.
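The selection step described in the abstract (compare each candidate utterance with the existing chatting corpus, accept it when the similarity exceeds a threshold, and use K-means so that a candidate is not compared with every corpus entry) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: utterances are assumed to be already embedded as fixed-length vectors (the paper derives them from morpheme-unit word embeddings and a CNN, which are not reproduced here), cosine similarity is assumed as the similarity measure, and each candidate is compared only with corpus utterances in its nearest cluster. The names `ClusteredCorpus`, `best_match`, and `select_new_utterances` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# Assumption: each utterance is already a fixed-length embedding vector.
# The paper builds these with morpheme-unit word embeddings and a CNN,
# which this sketch does not reproduce.
Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (assumed similarity measure); 0.0 for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance, used for K-means assignment."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 20, seed: int = 0):
    """Plain Lloyd's K-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


class ClusteredCorpus:
    """Hypothetical index over the existing chatting corpus."""

    def __init__(self, corpus_vecs: List[Vector], k: int, seed: int = 0):
        self.vecs = corpus_vecs
        self.centroids, labels = kmeans(corpus_vecs, k, seed=seed)
        # Corpus indices belonging to each cluster.
        self.members = [[i for i, lab in enumerate(labels) if lab == c]
                        for c in range(k)]

    def best_match(self, query: Vector) -> Tuple[int, float]:
        """Most similar corpus index and its score, searching one cluster only."""
        c = min(range(len(self.centroids)),
                key=lambda j: sq_dist(query, self.centroids[j]))
        if not self.members[c]:
            return -1, 0.0
        best = max(self.members[c], key=lambda i: cosine(query, self.vecs[i]))
        return best, cosine(query, self.vecs[best])


def select_new_utterances(candidates: List[Vector], index: ClusteredCorpus,
                          threshold: float) -> List[int]:
    """Indices of candidates whose best similarity exceeds the threshold."""
    return [n for n, vec in enumerate(candidates)
            if index.best_match(vec)[1] > threshold]


# Toy usage with two well-separated groups of 2-D "embeddings".
corpus = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
index = ClusteredCorpus(corpus, k=2)
print(select_new_utterances([[0.95, 0.05], [-1.0, 0.0]], index, 0.9))  # [0]
```

With n corpus utterances split into k roughly balanced clusters, each candidate costs about k + n/k similarity computations instead of n, which is the kind of saving that motivates clustering; the roughly hundredfold speed-up reported in the abstract comes from the authors' setup, not from this sketch. Assigning clusters by Euclidean distance while scoring by cosine similarity is a simplification; L2-normalizing the vectors first makes the two criteria agree more closely.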




  Cite this article

[IEEE Style]

J. An and Y. Ko, "Semi-automatic Expansion for a Chatting Corpus Based on a K-means Clustering Method And Similarity Measure," Journal of KIISE, JOK, vol. 46, no. 5, pp. 440-447, 2019. DOI: 10.5626/JOK.2019.46.5.440.


[ACM Style]

Jaehyun An and Youngjoong Ko. 2019. Semi-automatic Expansion for a Chatting Corpus Based on a K-means Clustering Method And Similarity Measure. Journal of KIISE, JOK, 46, 5, (2019), 440-447. DOI: 10.5626/JOK.2019.46.5.440.


[KCI Style]

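Jaehyun An, Youngjoong Ko, "A Semi-automatic Expansion Method for a Chatting Corpus Based on the K-means Clustering Method and Similarity Measurement," Journal of KIISE, Vol. 46, No. 5, pp. 440-447, 2019. DOI: 10.5626/JOK.2019.46.5.440.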











Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr