Search: [ author: 이재구 ] (3)

Aspect-Based Comparative Summarization with Large Language Model

Yunseok Kang, Jaeseok Lee, Jaewoong Choi, Jaekoo Lee

http://doi.org/10.5626/JOK.2025.52.7.587

We proposed a strategy to mitigate the VRAM (Video Random Access Memory) shortage encountered when applying 3D Gaussian Splatting to large-scale 3D mapping environments derived from drone footage. To manage large-scale scenes efficiently, we partitioned the input data, optimized each partition independently, and subsequently merged the optimized scenes. Additionally, we introduced a technique that augments point data during optimization by exploiting specific characteristics of drone-captured footage. As a result, our method reduced VRAM usage by two-thirds compared to previous studies while achieving a 2.5% average improvement in quality as measured by PSNR (Peak Signal-to-Noise Ratio). Our approach emphasizes enhancing the accuracy and quality of 3D reconstruction while minimizing VRAM consumption.
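The partition-optimize-merge strategy the abstract describes can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: the function names (partition_inputs, optimize_partition, merge_scenes) and the Scene structure are hypothetical stand-ins, and the per-partition 3D Gaussian Splatting optimization is reduced to a placeholder.

# Hypothetical sketch of the partition-optimize-merge strategy;
# names and data structures are illustrative, not the paper's code.
from dataclasses import dataclass, field


@dataclass
class Scene:
    points: list = field(default_factory=list)  # optimized Gaussian parameters


def partition_inputs(images, n_parts):
    """Split input frames into partitions (here: simply by index)."""
    size = (len(images) + n_parts - 1) // n_parts
    return [images[i:i + size] for i in range(0, len(images), size)]


def optimize_partition(images):
    """Stand-in for per-partition 3D Gaussian Splatting optimization.

    Only this partition's data needs to reside in VRAM at a time,
    which is what bounds peak memory usage.
    """
    return Scene(points=[f"gaussian_from_{img}" for img in images])


def merge_scenes(scenes):
    """Concatenate the independently optimized partitions into one scene."""
    merged = Scene()
    for s in scenes:
        merged.points.extend(s.points)
    return merged


if __name__ == "__main__":
    frames = [f"frame_{i:03d}" for i in range(12)]   # drone-footage stand-in
    parts = partition_inputs(frames, n_parts=4)
    scene = merge_scenes([optimize_partition(p) for p in parts])
    print(len(scene.points))  # 12 Gaussians reconstructed from 4 partitions

Because only one partition is processed at a time, peak memory scales with partition size rather than with the full scene, which is the intuition behind the reported VRAM reduction.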

Enhancing Molecular Understanding in LLMs through Multimodal Graph-SMILES Representations

http://doi.org/10.5626/JOK.2025.52.5.379

Recent advancements in large language models (LLMs) have shown remarkable performance across various tasks, with increasing focus on multimodal research. Notably, BLIP-2 enhances performance by efficiently aligning images and text through a Q-Former, aided by an image encoder pre-trained on multimodal data. Inspired by this, the MolCA model extends BLIP-2 to the molecular domain to improve performance. However, the graph encoder in MolCA is pre-trained on unimodal data and must be updated during model training, which is a limitation. Therefore, this paper replaces it with a graph encoder that is pre-trained on multimodal data and kept frozen while training the model. Experimental results showed that using the graph encoder pre-trained on multimodal data generally enhanced performance. Additionally, unlike the graph encoder pre-trained on unimodal data, which performed better when updated, the graph encoder pre-trained on multimodal data achieved superior results across all metrics when frozen.
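The key design choice here, freezing a multimodally pre-trained graph encoder while training only the alignment module, can be sketched in PyTorch. This is a hedged illustration, not MolCA's actual code: ToyGraphEncoder and ToyQFormer are hypothetical stand-ins for the real graph encoder and Q-Former, and the objective is a dummy.

# Minimal PyTorch sketch of freezing a pre-trained graph encoder while
# the rest of the model trains; module names are placeholders, not MolCA's API.
import torch
import torch.nn as nn


class ToyGraphEncoder(nn.Module):
    """Stand-in for a graph encoder pre-trained on multimodal data."""
    def __init__(self, dim=32):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        return self.proj(x)


class ToyQFormer(nn.Module):
    """Stand-in for the trainable module aligning graph features with text."""
    def __init__(self, dim=32):
        super().__init__()
        self.align = nn.Linear(dim, dim)

    def forward(self, x):
        return self.align(x)


graph_encoder = ToyGraphEncoder()
qformer = ToyQFormer()

# Freeze the multimodally pre-trained encoder: it receives no gradient updates.
for p in graph_encoder.parameters():
    p.requires_grad = False
graph_encoder.eval()

# Only the Q-Former-like module is optimized.
optimizer = torch.optim.AdamW(qformer.parameters(), lr=1e-4)

features = torch.randn(8, 32)            # batch of molecular graph features
with torch.no_grad():                    # encoder stays fixed
    encoded = graph_encoder(features)
loss = qformer(encoded).pow(2).mean()    # dummy objective for illustration
loss.backward()
optimizer.step()

Keeping the encoder frozen avoids the update cost the paper identifies in MolCA's unimodal encoder, while the alignment module alone adapts to the downstream task.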

A Survey of Advantages of Self-Supervised Learning Models in Visual Recognition Tasks

Euihyun Yoon, Hyunjong Lee, Donggeon Kim, Joochan Park, Jinkyu Kim, Jaekoo Lee

http://doi.org/10.5626/JOK.2024.51.7.609

Recently, the field of supervised artificial intelligence (AI) has been advancing rapidly. However, supervised learning relies on labeled datasets, and obtaining these labels can be costly. To address this issue, self-supervised learning, which learns general features of images without requiring labels, is being actively researched. In this paper, various self-supervised learning models were classified by their learning methods and backbone networks, and their strengths, weaknesses, and performance were compared and analyzed. Image classification tasks were used for the performance comparison, and fine-grained prediction tasks were additionally analyzed to compare transfer learning performance. As a result, models that used only positive pairs achieved higher performance than models that used both positive and negative pairs, by minimizing noise. Furthermore, for fine-grained prediction, methods such as masking images during learning or utilizing multi-stage models achieved higher performance by additionally learning local information.
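As a concrete example of the "positive pairs only" family the survey refers to, the following PyTorch sketch shows a negative-free objective in the style of SimSiam, with a stop-gradient on the target branch; the network sizes and random inputs are illustrative stand-ins, not any surveyed model's code.

# Minimal PyTorch sketch of a negative-free (positive-pairs-only) objective
# in the style of SimSiam; sizes and data are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 32))
predictor = nn.Sequential(nn.Linear(32, 32))


def negative_free_loss(x1, x2):
    """Negative cosine similarity between one view's prediction and the
    other view's stop-gradient target, symmetrized over both views."""
    z1, z2 = encoder(x1), encoder(x2)
    p1, p2 = predictor(z1), predictor(z2)
    # stop-gradient (.detach) on the targets prevents representational collapse
    return -(F.cosine_similarity(p1, z2.detach(), dim=-1).mean()
             + F.cosine_similarity(p2, z1.detach(), dim=-1).mean()) / 2


# Two augmented "views" of the same batch of images (random stand-ins here).
view1, view2 = torch.randn(16, 64), torch.randn(16, 64)
loss = negative_free_loss(view1, view2)
loss.backward()
print(float(loss))

No negative pairs appear in the loss, which is precisely what distinguishes this family from contrastive methods that must sample and denoise negatives.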

