Ensemble of Sentence Interaction and Graph Based Models for Document Pair Similarity Estimation

Seonghwan Choi, Donghyun Son, Hochang Lee

http://doi.org/10.5626/JOK.2021.48.11.1184

Deriving the similarity between two documents, such as news articles, is one of the most important factors in clustering documents. Sequence similarity models, one of the existing deep-learning-based approaches to document clustering, do not reflect the entire context of documents. To address this issue, this paper uses interaction-based and graph-based approaches to construct document pair similarity models suitable for news clustering. This paper proposes four interaction-based models that measure the similarity between two documents by aggregating similarity information from the interactions between sentences. The experimental results demonstrated that two of these four proposed models outperformed SVM and HAN. Ablation studies were conducted on the graph-based model through experiments on the depth of the model’s neural network and its input features. Through error analysis and an ensemble of the interaction-based and graph-based models, this paper showed that these two approaches could be complementary due to the differences in their prediction tendencies.
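
The general idea of aggregating sentence-level interactions and then ensembling two scores can be illustrated with a minimal sketch. This is not the paper's models: the abstract does not specify the sentence encoder, the aggregation, or the ensemble rule, so the bag-of-words cosine similarity, the best-match averaging, and the weighted average below are all stand-in assumptions, and every function name is hypothetical.

```python
import math
from collections import Counter


def bow(sentence):
    """Hypothetical sentence encoder: lowercase bag-of-words counts."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def interaction_matrix(doc_a, doc_b):
    """Similarity of every sentence in doc_a against every sentence in doc_b."""
    vecs_a = [bow(s) for s in doc_a]
    vecs_b = [bow(s) for s in doc_b]
    return [[cosine(va, vb) for vb in vecs_b] for va in vecs_a]


def interaction_score(doc_a, doc_b):
    """Assumed aggregation: best match per sentence in both directions, averaged."""
    m = interaction_matrix(doc_a, doc_b)
    if not m or not m[0]:
        return 0.0
    row_best = [max(row) for row in m]
    col_best = [max(col) for col in zip(*m)]
    return (sum(row_best) + sum(col_best)) / (len(row_best) + len(col_best))


def ensemble(score_interaction, score_graph, weight=0.5):
    """Assumed ensemble rule: weighted average of two model scores."""
    return weight * score_interaction + (1.0 - weight) * score_graph
```

Here a document is a list of sentence strings, and the graph-based score passed to `ensemble` would come from a separate model that this sketch does not attempt to reproduce.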


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr