COVID-19 Virus Whole-genome Embedding Strategy through Density-based Clustering and Deep Learning Model
Minwoo Pak, Sangseon Lee, Inyoung Sung, Yunyol Shin, Inuk Jung, Sun Kim
http://doi.org/10.5626/JOK.2022.49.4.261
The rapid spread of COVID-19 throughout the world has made its causative virus, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), a major research target in fields such as genetics and vaccinology. Studies of its phylogeny and subtype properties are particularly important because the virus has many subtypes and high variability. However, most computational approaches to studying the viral genome are based on the frequencies of single-nucleotide polymorphisms (SNPs), since the large size of the genomic sequence makes it almost impossible to encode the information of the whole genome at once. In this study, we introduce an alternative embedding strategy that extracts information from the SARS-CoV-2 whole genome using the density-based clustering algorithm MUTCLUST and deep learning. We first reduced the size of the genome by using MUTCLUST to identify densely mutated clusters as important regions. We then learned subtype-specific embedding vectors from the extracted clusters using a sequence convolutional deep learning model. We found that the learned embeddings contained information that could be used to discriminate known subtypes and reconstruct phylogenetic trees.
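The genome-reduction step can be illustrated with a minimal sketch. This is not MUTCLUST, whose details are not given in the abstract; it is a simplified one-dimensional density grouping in which mutated positions are chained together while the gap to the next position is at most `eps` bases, and chains with fewer than `min_pts` positions are discarded. The function names, the parameters `eps` and `min_pts`, and the toy data are all assumptions made for illustration.

```python
from typing import List, Tuple


def dense_regions(positions: List[int], eps: int = 10,
                  min_pts: int = 3) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) spans of densely mutated regions.

    Hypothetical stand-in for a density-based clustering step:
    neighbouring mutated positions (0-based) are chained while the gap
    between consecutive positions is at most `eps`.
    """
    regions: List[Tuple[int, int]] = []
    chain: List[int] = []
    for pos in sorted(set(positions)):
        # A gap larger than eps closes the current chain.
        if chain and pos - chain[-1] > eps:
            if len(chain) >= min_pts:
                regions.append((chain[0], chain[-1]))
            chain = []
        chain.append(pos)
    # Close the final chain.
    if len(chain) >= min_pts:
        regions.append((chain[0], chain[-1]))
    return regions


def extract_regions(genome: str,
                    regions: List[Tuple[int, int]]) -> List[str]:
    """Cut the selected spans out of the genome string."""
    return [genome[start:end + 1] for start, end in regions]


# Toy example: a made-up 400-base sequence and made-up mutation sites.
toy_genome = "ACGT" * 100
toy_mutations = [5, 8, 12, 100, 300, 304, 309, 311]

spans = dense_regions(toy_mutations, eps=10, min_pts=3)
segments = extract_regions(toy_genome, spans)
print(spans)     # [(5, 12), (300, 311)]
print(segments)  # ['CGTACGTA', 'ACGTACGTACGT']
```

In this sketch the isolated mutation at position 100 is dropped, and only the two dense spans would be passed on to a downstream sequence model, which is the sense in which clustering reduces the input size.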

Journal of KIISE
- ISSN: 2383-630X (Print)
- ISSN: 2383-6296 (Electronic)
- KCI Accredited Journal