Categorization of Korean News Articles Based on Convolutional Neural Network Using Doc2Vec and Word2Vec 


Vol. 44,  No. 7, pp. 742-747, Jul.  2017
10.5626/JOK.2017.44.7.742



  Abstract

In this paper, we propose an approach that improves the document-classification performance of a Convolutional Neural Network (CNN) model built on word2vec word embeddings by additionally using document vectors generated by doc2vec. We first show experimentally that the Word Piece Model (WPM) outperforms other tokenization methods, such as phrase units and a part-of-speech tagger (classification rate: 79.5%). We then classify Korean news articles into ten categories by feeding the word and document vectors produced from WPM-tokenized text to both the baseline and the proposed model. The proposed model achieves a higher classification rate (89.88%) than the baseline (86.89%), a 22.80% relative reduction in classification error. The results show that applying doc2vec to document classification is effective because doc2vec generates similar vector representations for documents that belong to the same category.
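The 22.80% figure is not the absolute gap between the two classification rates (89.88 − 86.89 = 2.99 percentage points). It appears to correspond to the relative reduction in classification error rate, from 13.11% to 10.12%. The short sketch below checks that interpretation; the function names are illustrative only and are not taken from the paper.

```python
def error_rate(accuracy_pct: float) -> float:
    """Classification error rate in percent, given a classification rate in percent."""
    return 100.0 - accuracy_pct


def relative_error_reduction(baseline_acc: float, proposed_acc: float) -> float:
    """Percentage by which the proposed model's error rate is lower than the baseline's.

    Assumes the reported improvement is measured on error rate, not on accuracy.
    """
    base_err = error_rate(baseline_acc)
    new_err = error_rate(proposed_acc)
    return (base_err - new_err) / base_err * 100.0


if __name__ == "__main__":
    baseline, proposed = 86.89, 89.88  # classification rates reported in the abstract
    print(f"absolute gain: {proposed - baseline:.2f} percentage points")
    print(f"relative error reduction: {relative_error_reduction(baseline, proposed):.3f}%")
```

Running it gives an absolute gain of 2.99 points and a relative error reduction of about 22.807%, which matches the reported 22.80% if the value was truncated rather than rounded.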




  Cite this article

[IEEE Style]

D. Kim and M. Koo, "Categorization of Korean News Articles Based on Convolutional Neural Network Using Doc2Vec and Word2Vec," Journal of KIISE, JOK, vol. 44, no. 7, pp. 742-747, 2017. DOI: 10.5626/JOK.2017.44.7.742.


[ACM Style]

Dowoo Kim and Myoung-Wan Koo. 2017. Categorization of Korean News Articles Based on Convolutional Neural Network Using Doc2Vec and Word2Vec. Journal of KIISE, JOK, 44, 7, (2017), 742-747. DOI: 10.5626/JOK.2017.44.7.742.


[KCI Style]

Dowoo Kim, Myoung-Wan Koo, "Korean Newspaper Article Classification Based on Convolutional Neural Network Using Doc2Vec and Word2Vec," Journal of KIISE, Vol. 44, No. 7, pp. 742-747, 2017. DOI: 10.5626/JOK.2017.44.7.742.









Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal
