Hierarchical Semantic Prompt Design for Robust Open-Vocabulary Object Detection
http://doi.org/10.5626/JOK.2025.52.6.499
Open-Vocabulary Object Detection (OVOD) was proposed to overcome a limitation of traditional object detection methods, which can recognize only the categories seen during training. While conventional OVOD approaches generate classifiers from simple prompts such as “a {category}”, this paper incorporates the hierarchical structure of object categories into the prompts to enhance detection performance. Specifically, we apply prompt engineering techniques that reduce the use of lengthy connectives and place important keywords at the beginning of the sentence, which yields more effective prompts that capture the intrinsic meaning of the hierarchical information. Our method generates classifiers without additional computational resources or retraining. It also demonstrates strong generalizability and can be applied to other tasks such as image captioning and medical image analysis. By leveraging hierarchical expressions familiar to humans, our approach also contributes to improving the interpretability of model outputs.
Improved Performance of Multi-Modal Audio-Visual Segmentation with Noise
http://doi.org/10.5626/JOK.2025.52.2.101
Multi-modal object segmentation using audio and visual information is an actively studied topic in the field of computer vision. Audio-Visual Segmentation (AVS) is a multi-modal object segmentation method that uses additional audio information to segment, at the pixel level, only the objects in the visual input that produce sound. This technology is important for applications that require accurate object recognition, such as robot recognition and autonomous driving. Data collected from the real world can include unwanted information, and noise can also arise from mechanical defects; both can significantly degrade the performance of an AVS model. This paper confirms that adding noise to the audio and visual inputs reduces performance, and therefore that research on robust AVS is needed. To address this, the study adds a noise-removal network that mitigates the performance degradation even when noise is present.
Single-Modal Pedestrian Detection Leveraging Multimodal Knowledge for Blackout Situations
http://doi.org/10.5626/JOK.2024.51.1.86
Multispectral pedestrian detection using both visible and thermal data is an actively researched topic in the field of computer vision. However, most existing studies consider only scenarios in which the cameras operate without failure, which leads to a significant decline in performance when a camera blackout occurs. Recognizing the importance of addressing camera blackouts in multispectral pedestrian detection, this paper investigates models that remain robust even during a blackout. The proposed model uses the Feature Tracing Method during the training phase to apply knowledge from multiple modalities to single-modal pedestrian detection. Even if a camera experiences a blackout and only one modality is input, the model predicts and operates as if it were using multiple modalities. This approach improves pedestrian detection performance in blackout situations.

Journal of KIISE
- ISSN : 2383-630X(Print)
- ISSN : 2383-6296(Electronic)
- KCI Accredited Journal
Editorial Office
- Tel. +82-2-588-9240
- Fax. +82-2-521-1352
- E-mail. chwoo@kiise.or.kr