Multi-Level Sequence Alignment : An Adaptive Control Method Between Speed and Accuracy for Document Comparison 


Vol. 41, No. 9, pp. 728-743, Sep. 2014



  Abstract

Fingerprinting and sequence alignment are well-known approaches to document similarity comparison. Fingerprinting is simple and fast, but it cannot locate the specific regions that are similar. String alignment identifies regions of similarity by aligning the sequences of the strings; it can locate those regions, but it requires more computing time. Multi-Level Alignment (MLA) is a new method designed to combine the advantages of both. MLA divides the input documents into blocks of uniform length, extracts fingerprints from each block, and calculates the similarity of each block pair by comparing their fingerprints, which produces a similarity table. Sequence alignment is then applied to the similarity table to identify the longest similar regions. MLA lets users change the block size to control the balance between the fingerprinting algorithm and sequence alignment. Because a document is divided into blocks, a similar region may be fragmented across two or more blocks. To solve this fragmentation problem, we propose a united block method. Experiments show that computing document similarity with the united block is more accurate than with the original MLA method, at a minor cost in time.
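The pipeline described in the abstract (split into blocks, fingerprint each block, build a similarity table, align over the table) can be illustrated with a minimal sketch. It is not the authors' implementation: the character k-gram fingerprints, the Jaccard similarity, the Smith-Waterman-style local alignment, and the `k`, `threshold`, and `gap` parameters are all assumptions chosen for illustration, and the united block method is not modeled.

```python
from typing import List, Set, Tuple


def split_blocks(text: str, block_size: int) -> List[str]:
    """Divide a document into uniform-length blocks (the last may be shorter)."""
    return [text[i:i + block_size] for i in range(0, len(text), block_size)]


def fingerprints(block: str, k: int = 3) -> Set[str]:
    """Assumed fingerprint: the set of character k-grams in the block."""
    if len(block) < k:
        return {block} if block else set()
    return {block[i:i + k] for i in range(len(block) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Assumed block similarity: Jaccard index of two fingerprint sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_table(doc_a: str, doc_b: str,
                     block_size: int, k: int = 3) -> List[List[float]]:
    """Table entry [i][j] is the similarity of block i of A and block j of B."""
    fa = [fingerprints(b, k) for b in split_blocks(doc_a, block_size)]
    fb = [fingerprints(b, k) for b in split_blocks(doc_b, block_size)]
    return [[jaccard(x, y) for y in fb] for x in fa]


def best_local_alignment(table: List[List[float]],
                         threshold: float = 0.5,
                         gap: float = 0.5) -> Tuple[float, Tuple[int, int]]:
    """Smith-Waterman-style local alignment over block-pair similarities.

    A block pair contributes (similarity - threshold), so only pairs more
    similar than the threshold extend an aligned region. Returns the best
    score and the 1-based (row, column) cell where that region ends.
    """
    n = len(table)
    m = len(table[0]) if n else 0
    h = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            h[i][j] = max(
                0.0,
                h[i - 1][j - 1] + table[i - 1][j - 1] - threshold,
                h[i - 1][j] - gap,
                h[i][j - 1] - gap,
            )
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)
    return best, best_pos


if __name__ == "__main__":
    table = similarity_table("abcdefgh", "abcdefgh", block_size=4)
    print(table)
    print(best_local_alignment(table))
```

In this sketch a larger `block_size` yields a smaller table, so less alignment work and coarser region boundaries, while a smaller `block_size` shifts the work toward alignment. That mirrors the speed and accuracy trade-off the abstract attributes to the block size.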




  Cite this article

[IEEE Style]

J. Seo, H. Tak, H. Cho, "Multi-Level Sequence Alignment : An Adaptive Control Method Between Speed and Accuracy for Document Comparison," Journal of KIISE, JOK, vol. 41, no. 9, pp. 728-743, 2014.


[ACM Style]

Jong-kyu Seo, Haesung Tak, and Hwan-Gue Cho. 2014. Multi-Level Sequence Alignment : An Adaptive Control Method Between Speed and Accuracy for Document Comparison. Journal of KIISE, JOK, 41, 9, (2014), 728-743.


[KCI Style]

Jong-kyu Seo, Haesung Tak, Hwan-Gue Cho, "A Multi-Level Document Comparison System with Adaptive Control of Computation Speed and Accuracy," Journal of KIISE, vol. 41, no. 9, pp. 728-743, 2014.


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal
