An Efficient Document Clustering Method using Space Transformation based on LDA and WMD 


Vol. 48, No. 9, pp. 1052-1060, Sep. 2021
10.5626/JOK.2021.48.9.1052



  Abstract

Existing TF-IDF-based document clustering methods do not properly exploit the contextual information of documents, i.e., word co-occurrence and word order, and their performance tends to degrade due to the curse of dimensionality. To overcome these problems, techniques such as a weighted average of word embedding vectors and Word Mover's Distance (WMD) have been proposed. These techniques perform well at document classification but not at document clustering, whose goal is to group documents. In this study, we use LDA to define a topic document, a representative document for each document group, and address these problems by computing the WMD between each document and the topic documents. However, since WMD is computationally expensive, we propose a space transformation method that reduces the computation cost while maintaining good performance: each document is mapped to a low-dimensional space in which each axis represents the document's WMD to one topic document.
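The space transformation described in the abstract can be sketched in Python. This is a minimal sketch under assumptions that do not come from the paper: the 2-D word vectors and the two topic documents are hypothetical toy data (in the paper, topic documents are derived from LDA, and the exact construction is not given here), and, to avoid needing an optimal-transport solver, `rwmd` computes the relaxed WMD (RWMD) lower bound, in which each word moves all its weight to its nearest word in the other document, instead of exact WMD. The helper names `nbow`, `one_sided`, `rwmd`, and `transform` are illustrative. The sketch shows why the transformation saves work: with N documents and K topic documents, building the space takes N·K distance computations, rather than the up to N(N-1)/2 pairwise WMD computations needed to cluster on raw WMD.

```python
import math
from collections import Counter

# Hypothetical toy 2-D word embeddings (a real system would use pretrained
# vectors such as word2vec); the values are made up for illustration.
EMB = {
    "stock": (0.9, 0.1), "market": (0.8, 0.2), "price": (0.85, 0.15),
    "game": (0.1, 0.9), "team": (0.2, 0.8), "score": (0.15, 0.85),
}

# Hypothetical topic documents: each is a word -> weight map, e.g. the top
# words of an LDA topic with renormalized probabilities (assumed construction).
TOPIC_DOCS = [
    {"stock": 0.5, "market": 0.3, "price": 0.2},  # "finance" topic
    {"game": 0.4, "team": 0.4, "score": 0.2},     # "sports" topic
]


def nbow(tokens):
    """Normalized bag-of-words over in-vocabulary tokens (weights sum to 1).

    Assumes at least one token is in EMB.
    """
    counts = Counter(t for t in tokens if t in EMB)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def one_sided(a, b):
    """Relaxed transport cost: every word in a moves all its weight
    to its nearest word (Euclidean distance) in b."""
    return sum(
        weight * min(math.dist(EMB[x], EMB[y]) for y in b)
        for x, weight in a.items()
    )


def rwmd(a, b):
    """Relaxed WMD: the larger of the two one-sided costs, which is a lower
    bound on exact WMD. Stands in for exact WMD in this sketch."""
    return max(one_sided(a, b), one_sided(b, a))


def transform(doc, topic_docs):
    """Space transformation: axis k holds the document's distance to
    topic document k, giving a K-dimensional vector."""
    return [rwmd(doc, topic) for topic in topic_docs]


docs = [["stock", "price", "market"], ["team", "game", "game"], ["score", "team"]]
vectors = [transform(nbow(d), TOPIC_DOCS) for d in docs]
# Illustrative grouping: assign each document to its nearest topic axis.
labels = [min(range(len(v)), key=v.__getitem__) for v in vectors]
print(labels)  # [0, 1, 1]
```

The last lines simply assign each document to its closest topic for illustration; a standard clustering algorithm such as k-means could instead be run on the K-dimensional vectors, and the clustering step used in the paper may differ. Swapping `rwmd` for an exact WMD implementation leaves the rest of the sketch unchanged.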




  Cite this article

[IEEE Style]

Y. Kim and S. Jung, "An Efficient Document Clustering Method using Space Transformation based on LDA and WMD," Journal of KIISE, JOK, vol. 48, no. 9, pp. 1052-1060, 2021. DOI: 10.5626/JOK.2021.48.9.1052.


[ACM Style]

Yongdam Kim and Sungwon Jung. 2021. An Efficient Document Clustering Method using Space Transformation based on LDA and WMD. Journal of KIISE, JOK, 48, 9, (2021), 1052-1060. DOI: 10.5626/JOK.2021.48.9.1052.


[KCI Style]

Yongdam Kim, Sungwon Jung, "An Effective Document Clustering Method using Space Transformation based on LDA and WMD," Journal of KIISE, Vol. 48, No. 9, pp. 1052-1060, 2021. DOI: 10.5626/JOK.2021.48.9.1052.









Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal
