Ensemble of Sentence Interaction and Graph Based Models for Document Pair Similarity Estimation 


Vol. 48, No. 11, pp. 1184-1193, Nov. 2021
10.5626/JOK.2021.48.11.1184



  Abstract

Deriving the similarity between two documents, such as news articles, is one of the most important factors in document clustering. Sequence similarity models, one of the existing deep-learning-based approaches to document clustering, do not reflect the entire context of documents. To address this issue, this paper uses interaction-based and graph-based approaches to construct document pair similarity models suitable for news clustering. This paper proposes four interaction-based models that measure the similarity between two documents by aggregating similarity information from the interactions between their sentences. The experimental results demonstrated that two of these four proposed models outperformed SVM and HAN. Ablation studies were conducted on the graph-based model through experiments on the depth of the model's neural network and its input features. Through error analysis and an ensemble of the interaction-based and graph-based models, this paper showed that the two approaches can be complementary because of differences in their prediction tendencies.
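
The general idea of aggregating sentence-level interactions into a document pair score, and then combining two model scores in an ensemble, can be illustrated with a minimal sketch. This is not one of the paper's four proposed models: the bag-of-words sentence vectors, the naive sentence splitter, the max-then-mean aggregation, and the weighted-average ensemble are all assumptions made for illustration, and every function name below is hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(document):
    """Naive splitter on '.', '!' and '?' (an illustrative assumption)."""
    parts = re.split(r"[.!?]+", document)
    return [p.strip() for p in parts if p.strip()]


def bag_of_words(sentence):
    """Lowercased word counts; a stand-in for a learned sentence encoder."""
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def interaction_matrix(doc_a, doc_b):
    """Similarity of every sentence in doc_a to every sentence in doc_b."""
    vecs_a = [bag_of_words(s) for s in split_sentences(doc_a)]
    vecs_b = [bag_of_words(s) for s in split_sentences(doc_b)]
    return [[cosine(u, v) for v in vecs_b] for u in vecs_a]


def interaction_score(doc_a, doc_b):
    """Take the best match per sentence in both directions, then average."""
    matrix = interaction_matrix(doc_a, doc_b)
    if not matrix or not matrix[0]:
        return 0.0
    row_best = [max(row) for row in matrix]
    col_best = [max(col) for col in zip(*matrix)]
    mean_rows = sum(row_best) / len(row_best)
    mean_cols = sum(col_best) / len(col_best)
    return (mean_rows + mean_cols) / 2


def ensemble_score(interaction_prob, graph_prob, weight=0.5):
    """Weighted average of two model scores (one simple ensemble choice)."""
    return weight * interaction_prob + (1 - weight) * graph_prob
```

In this sketch, `interaction_score` would play the role of the interaction-based score, and `graph_prob` would come from a separately trained graph-based model, which is not shown here. A learned aggregation or a learned ensemble weight could replace the fixed averaging.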




  Cite this article

[IEEE Style]

S. Choi, D. Son, H. Lee, "Ensemble of Sentence Interaction and Graph Based Models for Document Pair Similarity Estimation," Journal of KIISE, JOK, vol. 48, no. 11, pp. 1184-1193, 2021. DOI: 10.5626/JOK.2021.48.11.1184.


[ACM Style]

Seonghwan Choi, Donghyun Son, and Hochang Lee. 2021. Ensemble of Sentence Interaction and Graph Based Models for Document Pair Similarity Estimation. Journal of KIISE, JOK, 48, 11, (2021), 1184-1193. DOI: 10.5626/JOK.2021.48.11.1184.


[KCI Style]

Seonghwan Choi, Donghyun Son, Hochang Lee, "Ensemble of Sentence Interaction and Graph Based Models for Document Pair Similarity Estimation," Journal of KIISE, vol. 48, no. 11, pp. 1184-1193, 2021. DOI: 10.5626/JOK.2021.48.11.1184.


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal
