COVID-19 Virus Whole-genome Embedding Strategy through Density-based Clustering and Deep Learning Model 


Vol. 49, No. 4, pp. 261-270, Apr. 2022
10.5626/JOK.2022.49.4.261



  Abstract

The rapid worldwide spread of COVID-19 has made its causative virus, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), a major research target in fields such as genetics and vaccinology. Studies of its phylogeny and subtype properties are particularly important because of the variety of subtypes and the high variability of the virus. However, most computational approaches to studying the viral genome rely on the frequencies of single-nucleotide polymorphisms (SNPs), because the size of the genomic sequence makes it almost impossible to encode the information of the whole genome at once. In this study, we introduce an alternative embedding strategy that extracts information from the SARS-CoV-2 whole genome using the density-based clustering algorithm MUTCLUST and deep learning. We first reduced the size of the genome by using MUTCLUST to identify densely mutated clusters as important regions. We then learned subtype-specific embedding vectors from the extracted clusters using a sequence convolutional deep learning model. The learned embeddings contained information that could be used to discriminate known subtypes and to reconstruct phylogenetic trees.
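The first two steps of the strategy (finding densely mutated regions, then preparing them as input for a convolutional model) can be illustrated with a minimal sketch. This is not MUTCLUST and not the authors' code: the function names `dense_regions`, `extract`, and `one_hot`, the parameters `eps` and `min_count`, and the toy genome are all hypothetical. The grouping rule is a simplified gap-based stand-in for density-based clustering, chosen only to show the idea of shrinking the genome to mutation-dense windows.

```python
from typing import List, Tuple

ALPHABET = "ACGT"


def dense_regions(positions: List[int], eps: int = 50,
                  min_count: int = 3) -> List[Tuple[int, int]]:
    """Group 0-based mutation positions into dense regions.

    Consecutive positions at most `eps` apart join the same run; a run is
    kept as a region only if it holds at least `min_count` positions.
    (Hypothetical stand-in for a density-based method such as MUTCLUST.)
    """
    regions: List[Tuple[int, int]] = []
    run: List[int] = []
    for pos in sorted(set(positions)):
        if run and pos - run[-1] > eps:
            # Gap too large: close the current run and start a new one.
            if len(run) >= min_count:
                regions.append((run[0], run[-1]))
            run = []
        run.append(pos)
    if len(run) >= min_count:
        regions.append((run[0], run[-1]))
    return regions


def extract(genome: str, regions: List[Tuple[int, int]]) -> List[str]:
    """Cut the (inclusive) regions out of the genome string."""
    return [genome[start:end + 1] for start, end in regions]


def one_hot(seq: str) -> List[List[int]]:
    """One-hot encode a nucleotide string; unknown bases become all zeros."""
    return [[1 if base == letter else 0 for letter in ALPHABET]
            for base in seq.upper()]


# Toy data (assumed, not from the paper): a 1200-base genome and a handful
# of mutated positions pooled across samples.
genome = "ACGT" * 300
mutations = [10, 12, 15, 400, 1000, 1010, 1020, 1025]

regions = dense_regions(mutations, eps=20, min_count=3)
segments = extract(genome, regions)
encoded = [one_hot(segment) for segment in segments]
```

Each entry of `encoded` is a length-by-4 matrix, the usual input shape for a 1-D sequence convolution; the isolated mutation at position 400 is discarded because it does not belong to a dense run.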




  Cite this article

[IEEE Style]

M. Pak, S. Lee, I. Sung, Y. Shin, I. Jung, S. Kim, "COVID-19 Virus Whole-genome Embedding Strategy through Density-based Clustering and Deep Learning Model," Journal of KIISE, JOK, vol. 49, no. 4, pp. 261-270, 2022. DOI: 10.5626/JOK.2022.49.4.261.


[ACM Style]

Minwoo Pak, Sangseon Lee, Inyoung Sung, Yunyol Shin, Inuk Jung, and Sun Kim. 2022. COVID-19 Virus Whole-genome Embedding Strategy through Density-based Clustering and Deep Learning Model. Journal of KIISE, JOK, 49, 4, (2022), 261-270. DOI: 10.5626/JOK.2022.49.4.261.


[KCI Style]

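Minwoo Pak, Sangseon Lee, Inyoung Sung, Yunyol Shin, Inuk Jung, Sun Kim, "COVID-19 Virus Whole-genome Embedding Strategy through Density-based Clustering and Deep Learning Model," Journal of KIISE, vol. 49, no. 4, pp. 261-270, 2022. DOI: 10.5626/JOK.2022.49.4.261.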


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal
